Automatic Detection of Language Levels in L2 English Learners
Abstract
This study analyzes features that could enable classifiers to detect the language level of adult second language (L2) English learners. Approximately 46 speech samples from speakers of 15 different native (L1) languages were selected from the Learning Prosody in a Foreign Language (LeaP) corpus (Gut 2004), collected in Germany. Using a variety of features extracted from the spoken L2 English, a Support Vector Machine (SVM) was trained to classify the speakers into three categories, c1, c2, and s1, which correspond to beginner, intermediate, and advanced levels of the target L2, English. The chosen features are grouped into four categories: sentence, syllable, duration, and pitch. Count features, such as sentence word count and sentence article count, had the most influence on the system, while the sentence features had the second most influence. The duration features raised accuracy into the 60% range. Surprisingly, most of the pitch features had no effect on accuracy. A small list of common stop words was also used and proved very helpful. Edit distance measures computed on the sentences with these common words removed showed a measurable effect, and the spoken duration of those same words helped raise the accuracy of the test configuration above 60%. The test configuration was selected because its accuracy was close to the mean of a set of 50 randomly generated configurations. Because the training and testing sets were small, the speaker's L1 was found to have a significant effect on the accuracy of the classification predictions, which varied by as much as 40%.
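The abstract does not state exactly how the count and edit distance features were computed. The following is a minimal sketch of one plausible reading, assuming a word-level Levenshtein distance between a learner's sentence and a reference sentence after stop-word removal, plus simple word and article counts. The stop-word list, the reference sentence, and the function names (`content_words`, `word_edit_distance`, `sentence_features`) are hypothetical illustrations, not the study's actual implementation.

```python
# Hypothetical small stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}
ARTICLES = {"a", "an", "the"}
PUNCT = ".,;:!?\"'"


def content_words(sentence: str) -> list[str]:
    """Lowercase, strip simple punctuation, and drop stop words."""
    tokens = [t.strip(PUNCT).lower() for t in sentence.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # delete wa
                          curr[j - 1] + 1,     # insert wb
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(b)]


def sentence_features(spoken: str, reference: str) -> dict[str, int]:
    """Hypothetical feature set: two count features plus a filtered edit distance."""
    words = spoken.split()
    article_count = sum(1 for w in words if w.strip(PUNCT).lower() in ARTICLES)
    return {
        "word_count": len(words),
        "article_count": article_count,
        "content_edit_distance": word_edit_distance(
            content_words(spoken), content_words(reference)
        ),
    }


if __name__ == "__main__":
    # Illustrative sentences only; not taken from the LeaP corpus.
    reference = "The dog ran to the park and played with a ball."
    spoken = "Dog run to the park and play with ball."
    print(sentence_features(spoken, reference))
```

For the illustrative pair above, removing stop words leaves two substituted content words ("ran"/"run", "played"/"play"), so the filtered edit distance is 2, while the counts are 9 words and 1 article. Vectors of such features would then be fed to a classifier such as an SVM; filtering stop words first keeps the distance focused on content-word errors rather than on function words a learner might omit.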