Bridging the Gap: Adaptation Approaches for Under-Resourced Language Families
Abstract
Multilingual large language models have demonstrated remarkable success across a variety of natural language processing (NLP) tasks. However, their performance on under-resourced languages remains significantly limited, primarily due to disparities in data availability. This thesis investigates adaptation strategies to improve multilingual model performance on low-resource languages. Focusing on the Turkic language family, we study whether adapting a pre-trained model with data from related languages improves performance. We evaluate language-family-specific adaptation techniques, including language-adaptive pre-training (LAPT) and vocabulary specialization, in both zero-shot and few-shot settings. Our results highlight the potential of targeted multilingual adaptation to bridge performance gaps in low-resource settings and reinforce best practices for multilingual model adaptation.
Description
Thesis (Master's)--University of Washington, 2025
