Fine-tuning ASR Models for Very Low-Resource Languages: A Study on Mvskoke
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Levow, Gina-Anne |
| dc.contributor.author | Mainzinger, Julia |
| dc.date.accessioned | 2024-10-16T03:15:23Z |
| dc.date.available | 2024-10-16T03:15:23Z |
| dc.date.issued | 2024-10-16 |
| dc.date.submitted | 2024 |
| dc.description | Thesis (Master's)--University of Washington, 2024 |
| dc.description.abstract | Recent advancements in multilingual models for automatic speech recognition (ASR) have significantly improved accuracy for languages with extremely limited resources. This study focuses on ASR modeling for Mvskoke, an Indigenous language of North America, by fine-tuning three multilingual wav2vec 2.0 models: XLSR-53, MMS-300M, and MMS-1B-l1107. Training data is prepared from language documentation resources, and two evaluation sets, one clean and one noisy, are designed to assess performance in different settings. The parameter efficiency of adapter training is compared with full-model fine-tuning, and the impact of the number of languages used during pre-training is examined. The study also investigates how performance varies with the amount of training data by testing models trained on 10, 60, 120, and 243 minutes of audio. A trigram language model is trained on cultural documents and interview transcripts, and the ASR models are evaluated with and without language model decoding. The findings show that both MMS models outperform XLSR-53 as the amount of training data increases. Notably, training an adapter for MMS-1B-l1107 proves both parameter-efficient and capable of high accuracy with a relatively small amount of data. ASR accuracy begins to converge at around 2-4 hours of training data. While language model decoding generally improves metrics such as word error rate, it can sometimes degrade the output. The study introduces the first ASR models successfully developed for the Mvskoke language. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Mainzinger_washington_0250O_27493.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/52546 |
| dc.language.iso | en_US |
| dc.rights | CC BY-NC-SA |
| dc.subject | ASR |
| dc.subject | Endangered languages |
| dc.subject | Low-resource languages |
| dc.subject | Speech Recognition |
| dc.subject | Linguistics |
| dc.subject | Native American studies |
| dc.subject.other | Linguistics |
| dc.title | Fine-tuning ASR Models for Very Low-Resource Languages: A Study on Mvskoke |
| dc.type | Thesis |
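The abstract highlights adapter training for MMS-1B-l1107 as parameter-efficient. For reference only, the following is a minimal sketch of how such adapter fine-tuning can be set up with the Hugging Face Transformers MMS API. The checkpoint name is the public `facebook/mms-1b-l1107` release; the vocabulary size is an illustrative assumption, not the thesis's actual configuration.

```python
# Minimal sketch of adapter fine-tuning for an MMS checkpoint with
# Hugging Face Transformers. The vocab_size below is an assumed value
# for a Mvskoke character vocabulary, not the thesis's configuration.
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/mms-1b-l1107",       # MMS checkpoint pre-trained on 1107 languages
    vocab_size=40,                 # assumed size of the target character vocabulary
    ignore_mismatched_sizes=True,  # re-initialize the CTC head for the new vocabulary
)

# Re-initialize the language-adapter layers and freeze everything else,
# so only the adapter weights (plus the new CTC head) receive gradients.
model.init_adapter_layers()
model.freeze_base_model()
for param in model._get_adapters().values():
    param.requires_grad = True
```

Training then proceeds as with any CTC fine-tuning run; because the base model stays frozen, only a small fraction of the model's weights is updated, which is what makes the adapter approach attractive at this data scale.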
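The abstract also evaluates the ASR models with and without trigram language model decoding. One common way to combine a CTC model with an n-gram LM is beam-search decoding via pyctcdecode and a KenLM ARPA file; the sketch below assumes a hypothetical `mvskoke_trigram.arpa` file and a placeholder character vocabulary, since the thesis's LM corpus is not distributed with this record.

```python
# Minimal sketch of KenLM trigram decoding for CTC output using
# pyctcdecode. The label list and file path are hypothetical
# placeholders; index 0 ("") is the CTC blank token.
from pyctcdecode import build_ctcdecoder

labels = ["", " ", "a", "c", "e", "f", "h", "i", "k", "l",
          "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y"]

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="mvskoke_trigram.arpa",  # hypothetical KenLM trigram file
    alpha=0.5,  # LM weight, normally tuned on a development set
    beta=1.0,   # word-insertion bonus
)

# `logits` would be a (time, vocab) array of frame-level CTC scores
# from the fine-tuned wav2vec 2.0 model:
# text = decoder.decode(logits)
```

The `alpha` and `beta` weights control how strongly the LM influences the beam search; a poorly tuned weighting is one way LM decoding can degrade output, consistent with the abstract's observation.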
Files
Original bundle
- Name: Mainzinger_washington_0250O_27493.pdf
- Size: 960.3 KB
- Format: Adobe Portable Document Format
