Multilingual Language Models: Analysis and Algorithms

dc.contributor.advisor: Zettlemoyer, Luke
dc.contributor.author: Blevins, Terra
dc.date.accessioned: 2024-10-16T03:12:02Z
dc.date.available: 2024-10-16T03:12:02Z
dc.date.issued: 2024-10-16
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: While large language models (LLMs) continue to grow in scale and gain new zero-shot capabilities, their performance on languages beyond English increasingly lags behind. This gap stems from the curse of multilinguality: multilingual language models perform worse on individual languages than a monolingual model trained on that language, because languages compete for representational capacity. These issues are further compounded by the disparate amounts and quality of training data available for different languages, leading to increasingly degraded performance on lower-resource languages. However, because training new large language models for individual languages is compute- and data-intensive, multilingual language models remain the de facto approach for most of the world's languages. It therefore remains an open question how we can alleviate the curse of multilinguality and build multilingual models that model many languages fairly. This dissertation investigates how current language models do and do not capture multiple languages and examines how multilingual language models differ from monolingual ones. We first present an analysis method, structural probing, used for many of this work's analyses. We then examine the unexpected ability of monolingual language models to exhibit cross-lingual behavior, finding that this phenomenon stems from inherent language contamination in pretraining data collected at scale. This shows that LMs can learn languages from surprisingly small subsets of their training data and implies that all language models trained at scale are multilingual. We next characterize the pretraining dynamics of multilingual language models, showing that while they learn information about individual languages early on, cross-lingual transfer is acquired throughout the pretraining process. This analysis also demonstrates how the curse of multilinguality develops during pretraining, causing the model to forget previously learned information. Inspired by these insights, we propose a sparse language modeling approach for training Cross-Lingual Expert Language Models (X-ELM), which explicitly allocates parameters to different languages to reduce inter-language competition for model capacity. X-ELMs improve performance on all languages we consider while also providing efficiency and model-adaptation benefits over prior methods. Through these characteristics, X-ELM broadens access to multilingual NLP by providing better-performing and more usable models for all languages.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Blevins_washington_0250E_27285.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52472
dc.language.iso: en_US
dc.rights: CC BY-NC
dc.subject: Computer science
dc.subject: Linguistics
dc.subject.other: Computer science and engineering
dc.title: Multilingual Language Models: Analysis and Algorithms
dc.type: Thesis

Files

Original bundle

Name: Blevins_washington_0250E_27285.pdf
Size: 12.31 MB
Format: Adobe Portable Document Format