Multilingual Language Models: Analysis and Algorithms

dc.contributor.advisor: Zettlemoyer, Luke
dc.contributor.author: Blevins, Terra
dc.date.accessioned: 2024-10-16T03:12:02Z
dc.date.available: 2024-10-16T03:12:02Z
dc.date.issued: 2024-10-16
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: While large language models (LLMs) continue to grow in scale and gain new zero-shot capabilities, their performance on languages beyond English increasingly lags behind. This gap stems from the curse of multilinguality: multilingual language models perform worse on individual languages than a monolingual model trained on that language, because languages compete for representational capacity. These issues are further compounded by the disparate amounts and quality of training data available for different languages, leading to increasingly degraded performance on lower-resource languages. However, because training new large language models for individual languages is compute- and data-intensive, multilingual language models remain the de facto approach for most of the world's languages. It therefore remains an open question how we can alleviate the curse of multilinguality and build multilingual models that model many languages fairly. This dissertation investigates how current language models do and do not capture multiple languages and examines how multilingual language models differ from monolingual ones. We first present an analysis method, structural probing, used for many of this work's analyses. We then examine the unexpected ability of monolingual language models to exhibit cross-lingual behavior, finding that this phenomenon stems from inherent language contamination in pretraining data collected at scale. This shows that LMs can learn languages from surprisingly small subsets of their training data and implies that all language models trained at scale are multilingual. We next characterize the pretraining dynamics of multilingual language models, showing that while they learn information about individual languages early on, cross-lingual transfer is acquired throughout the pretraining process. This analysis also demonstrates how the curse of multilinguality develops during pretraining, causing the model to forget previously learned information. Inspired by these insights, we propose a sparse language modeling approach for training Cross-Lingual Expert Language Models (X-ELM), which explicitly allocates parameters to different languages to reduce inter-language competition for model capacity. X-ELMs improve performance on all languages we consider while also providing efficiency and model-adaptation benefits over prior methods. Through these characteristics, X-ELM broadens access to multilingual NLP by providing better-performing and more usable models for all languages.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Blevins_washington_0250E_27285.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52472
dc.language.iso: en_US
dc.rights: CC BY-NC
dc.subject: Computer science
dc.subject: Linguistics
dc.subject.other: Computer science and engineering
dc.title: Multilingual Language Models: Analysis and Algorithms
dc.type: Thesis

Files

Original bundle

Name: Blevins_washington_0250E_27285.pdf
Size: 12.31 MB
Format: Adobe Portable Document Format