Fine-tuning ASR Models for Very Low-Resource Languages: A Study on Mvskoke
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Levow, Gina-Anne |
| dc.contributor.author | Mainzinger, Julia |
| dc.date.accessioned | 2024-10-16T03:15:23Z |
| dc.date.available | 2024-10-16T03:15:23Z |
| dc.date.issued | 2024-10-16 |
| dc.date.submitted | 2024 |
| dc.description | Thesis (Master's)--University of Washington, 2024 |
| dc.description.abstract | Recent advancements in multilingual models for automatic speech recognition (ASR) have significantly improved accuracy for languages with extremely limited resources. This study focuses on ASR modeling for Mvskoke, an Indigenous language of North America, by fine-tuning three multilingual wav2vec 2.0 models: XLSR-53, MMS-300M, and MMS-1B-l1107. Training data is prepared from language documentation resources, and two evaluation sets, one clean and one noisy, are designed to assess performance in different settings. The parameter efficiency of adapter training is compared with full-model fine-tuning, and the impact of the number of languages used during pre-training is examined. The study also investigates how performance varies with the amount of training data by testing models trained on 10, 60, 120, and 243 minutes of audio. A trigram language model is trained on cultural documents and interview transcripts, and the ASR models are evaluated with and without language model decoding. The findings show that both MMS models outperform XLSR-53 as the amount of training data increases. Notably, training an adapter for MMS-1B-l1107 proves both parameter-efficient and capable of high accuracy with a relatively small amount of data. ASR accuracy begins to converge at around 2-4 hours of training data. While language model decoding generally improves metrics such as word error rate, it can sometimes degrade the output. The study introduces the first ASR models successfully developed for the Mvskoke language. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Mainzinger_washington_0250O_27493.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/52546 |
| dc.language.iso | en_US |
| dc.rights | CC BY-NC-SA |
| dc.subject | ASR |
| dc.subject | Endangered languages |
| dc.subject | Low-resource languages |
| dc.subject | Speech Recognition |
| dc.subject | Linguistics |
| dc.subject | Native American studies |
| dc.subject.other | Linguistics |
| dc.title | Fine-tuning ASR Models for Very Low-Resource Languages: A Study on Mvskoke |
| dc.type | Thesis |
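The abstract highlights adapter training for MMS-1B-l1107 as parameter-efficient. For reference only, the following is a minimal sketch of how such adapter fine-tuning can be set up with the Hugging Face Transformers MMS API. The checkpoint name is the public `facebook/mms-1b-l1107` release; the vocabulary size is an illustrative assumption, not the thesis's actual configuration.

```python
# Minimal sketch of adapter fine-tuning for an MMS checkpoint with
# Hugging Face Transformers. The vocab_size below is an assumed value
# for a Mvskoke character vocabulary, not the thesis's configuration.
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/mms-1b-l1107",       # MMS checkpoint pre-trained on 1107 languages
    vocab_size=40,                 # assumed size of the target character vocabulary
    ignore_mismatched_sizes=True,  # re-initialize the CTC head for the new vocabulary
)

# Re-initialize the language-adapter layers and freeze everything else,
# so only the adapter weights (plus the new CTC head) receive gradients.
model.init_adapter_layers()
model.freeze_base_model()
for param in model._get_adapters().values():
    param.requires_grad = True
```

Training then proceeds as with any CTC fine-tuning run; because the base model stays frozen, only a small fraction of the model's weights is updated, which is what makes the adapter approach attractive at this data scale.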
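The abstract also evaluates the ASR models with and without trigram language model decoding. One common way to combine a CTC model with an n-gram LM is beam-search decoding via pyctcdecode and a KenLM ARPA file; the sketch below assumes a hypothetical `mvskoke_trigram.arpa` file and a placeholder character vocabulary, since the thesis's LM corpus is not distributed with this record.

```python
# Minimal sketch of KenLM trigram decoding for CTC output using
# pyctcdecode. The label list and file path are hypothetical
# placeholders; index 0 ("") is the CTC blank token.
from pyctcdecode import build_ctcdecoder

labels = ["", " ", "a", "c", "e", "f", "h", "i", "k", "l",
          "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y"]

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="mvskoke_trigram.arpa",  # hypothetical KenLM trigram file
    alpha=0.5,  # LM weight, normally tuned on a development set
    beta=1.0,   # word-insertion bonus
)

# `logits` would be a (time, vocab) array of frame-level CTC scores
# from the fine-tuned wav2vec 2.0 model:
# text = decoder.decode(logits)
```

The `alpha` and `beta` weights control how strongly the LM influences the beam search; a poorly tuned weighting is one way LM decoding can degrade output, consistent with the abstract's observation.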
Files
Original bundle
- Name: Mainzinger_washington_0250O_27493.pdf
- Size: 960.3 KB
- Format: Adobe Portable Document Format
