Bridging the Gap: Adaptation Approaches for Under-Resourced Language Families

dc.contributor.advisor: Steinert-Threlkeld, Shane
dc.contributor.author: Parikh, Dwija
dc.date.accessioned: 2025-08-01T22:26:18Z
dc.date.issued: 2025-08-01
dc.date.submitted: 2025
dc.description: Thesis (Master's)--University of Washington, 2025
dc.description.abstract: Multilingual large language models have demonstrated remarkable success across a variety of natural language processing (NLP) tasks. However, their performance on low- and under-resourced languages remains significantly limited, primarily due to disparities in data availability. This thesis investigates adaptation strategies to improve multilingual model performance on low-resource languages. Focusing on the Turkic language family, we investigate the effectiveness of adapting a pre-trained model using data from related languages. We examine language-family-specific adaptation techniques, including language-adaptive pre-training (LAPT) and vocabulary specialization, and evaluate their impact in both zero-shot and few-shot scenarios. Our results highlight the potential of targeted multilingual adaptation to bridge performance gaps in low-resource settings and reinforce best practices for multilingual model adaptation.
dc.embargo.lift: 2027-07-22T22:26:18Z
dc.embargo.terms: Restrict to UW for 2 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Parikh_washington_0250O_28645.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53683
dc.language.iso: en_US
dc.rights: none
dc.subject: computational linguistics
dc.subject: low-resource languages
dc.subject: Linguistics
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject.other: Linguistics
dc.title: Bridging the Gap: Adaptation Approaches for Under-Resourced Language Families
dc.type: Thesis
