Bridging the Gap: Adaptation Approaches for Under-Resourced Language Families
| dc.contributor.advisor | Steinert-Threlkeld, Shane | |
| dc.contributor.author | Parikh, Dwija | |
| dc.date.accessioned | 2025-08-01T22:26:18Z | |
| dc.date.issued | 2025-08-01 | |
| dc.date.submitted | 2025 | |
| dc.description | Thesis (Master's)--University of Washington, 2025 | |
| dc.description.abstract | Multilingual large language models have demonstrated remarkable success across a variety of natural language processing (NLP) tasks. However, their performance on low- and under-resourced languages remains significantly limited, primarily due to disparities in data availability. This thesis investigates adaptation strategies to improve multilingual model performance on low-resource languages. Focusing on the Turkic language family, we assess how effectively a pre-trained model can be adapted using data from related languages. We examine language-family-specific adaptation techniques, including language-adaptive pre-training (LAPT) and vocabulary specialization, and evaluate their impact in both zero-shot and few-shot settings. Our results highlight the potential of targeted multilingual adaptation to bridge performance gaps in low-resource settings and reinforce best practices for multilingual model adaptation. | |
| dc.embargo.lift | 2027-07-22T22:26:18Z | |
| dc.embargo.terms | Restrict to UW for 2 years -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Parikh_washington_0250O_28645.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/53683 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | computational linguistics | |
| dc.subject | low-resource languages | |
| dc.subject | Linguistics | |
| dc.subject | Computer science | |
| dc.subject | Artificial intelligence | |
| dc.subject.other | Linguistics | |
| dc.title | Bridging the Gap: Adaptation Approaches for Under-Resourced Language Families | |
| dc.type | Thesis | |
