Automatic Detection Of Language Levels in L2 English Learners

dc.contributor.advisor: Levow, Gina-Anne [en_US]
dc.contributor.author: Podgornik, Stella M. [en_US]
dc.date.accessioned: 2012-08-10T20:37:34Z
dc.date.available: 2012-08-10T20:37:34Z
dc.date.issued: 2012-08-10
dc.date.submitted: 2012 [en_US]
dc.description: Thesis (Master's)--University of Washington, 2012 [en_US]
dc.description.abstract: This study analyzes features that would enable classifiers to detect language levels in adult second language (L2) English learners. Approximately 46 speech samples from speakers of 15 different native (L1) languages were selected from the Learning Prosody in a Foreign Language (LeaP) corpus (Gut 2004), collected in Germany. A Support Vector Machine (SVM) was trained on a variety of features selected from the spoken L2 English, and the speakers were classified into three categories: c1, c2, and s1. These categories correspond to beginner, intermediate, and advanced levels of the target L2, English. The chosen features are grouped into four categories: sentence, syllable, duration, and pitch. Count features, such as sentence word count and sentence article count, had the most influence on the system, while the sentence features had the second most influence. The duration features pushed the accuracy into the 60% range. Surprisingly, most of the pitch features used had no effect on the accuracy. A small list of common stop words was also used, which proved to be very helpful. The edit distance measures of the sentences with common words removed showed a measurable effect, and the spoken duration of those same words in the sentence helped push the accuracy for the test configuration above 60%. The test configuration was selected because its accuracy was close to the mean of a set of 50 randomly generated configurations. Due to the small size of the training and testing sets, it was found that the speaker's L1 had a significant effect on the accuracy of the classification predictions. The classification predictions show a variance of as much as 40%. [en_US]
dc.embargo.terms: No embargo [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.other: Podgornik_washington_0250O_10158.pdf [en_US]
dc.identifier.uri: http://hdl.handle.net/1773/20283
dc.language.iso: en_US [en_US]
dc.rights: Copyright is held by the individual authors. [en_US]
dc.subject: computational linguistics; second language learning [en_US]
dc.subject.other: Linguistics [en_US]
dc.subject.other: Language [en_US]
dc.title: Automatic Detection Of Language Levels in L2 English Learners [en_US]
dc.type: Thesis [en_US]
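
The abstract mentions edit distance measures computed on sentences with common stop words removed, but it does not say which strings are compared or which stop words were used. The following is a minimal Python sketch under stated assumptions: the comparison is taken to be between a reference sentence and a transcript of what the speaker said, the distance is a word-level Levenshtein distance, and `STOP_WORDS`, `content_words`, `edit_distance`, and `stopword_filtered_distance` are hypothetical names that do not come from the thesis.

```python
# Hypothetical stop word list; the list used in the thesis is not given in the abstract.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is"}


def content_words(sentence):
    """Lowercase, split on whitespace, strip edge punctuation, drop stop words."""
    tokens = (tok.strip(".,;:!?\"'") for tok in sentence.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def edit_distance(source, target):
    """Word-level Levenshtein distance with unit cost per insert/delete/substitute."""
    previous = list(range(len(target) + 1))
    for i, src_tok in enumerate(source, start=1):
        current = [i]
        for j, tgt_tok in enumerate(target, start=1):
            substitution = previous[j - 1] + (src_tok != tgt_tok)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def stopword_filtered_distance(reference, spoken):
    """Edit distance between two sentences after removing stop words (assumed setup)."""
    return edit_distance(content_words(reference), content_words(spoken))


if __name__ == "__main__":
    # Illustrative sentences only; these are not taken from the LeaP corpus.
    print(stopword_filtered_distance("The tiger is in the garden.",
                                     "Tiger in a big garden"))
```

A value like this could serve as one numeric feature per sentence alongside the count, duration, and pitch features; how the thesis actually normalizes or combines such features is not stated in the abstract.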

Files

Original bundle

Name: Podgornik_washington_0250O_10158.pdf
Size: 754.37 KB
Format: Adobe Portable Document Format