Towards Large Language Models for Everyone: Instruction Following, Knowledge Retrieval and Multilingualism
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Zettlemoyer, Luke |
| dc.contributor.author | Lin, Xi |
| dc.date.accessioned | 2024-09-09T23:06:25Z |
| dc.date.available | 2024-09-09T23:06:25Z |
| dc.date.issued | 2024-09-09 |
| dc.date.submitted | 2024 |
| dc.description | Thesis (Ph.D.)--University of Washington, 2024 |
| dc.description.abstract | Large language models (LLMs) have significantly advanced the field of Natural Language Processing and demonstrated the potential to fuel a wide variety of AI applications. Nonetheless, building them in a way that maximally benefits the wide range of everyday use cases is challenging. Firstly, LLMs are pre-trained with the next-token prediction objective, which does not align well with specific user requests. Secondly, LLMs suffer from knowledge cutoffs and tend to hallucinate on long-tail facts. Lastly, popular LLMs are trained almost exclusively on English text, making it difficult for non-English speakers to adopt them. This thesis presents methodologies addressing all three challenges. We begin by studying the Instruction Meta-Learning (IML) approach, which enables LLMs to perform a broad array of tasks by fine-tuning them on pairs of natural language instructions and responses. Our study highlights the efficacy of scaling IML along three axes: fine-tuning task diversity, language diversity, and model parameters. Next, we propose integrating LLMs with an external data store during IML (retrieval-augmented dual instruction tuning, RA-DIT). RA-DIT significantly improves LLM performance in scenarios that require access to large, external knowledge sources (e.g., answering information-seeking questions). Finally, we introduce a family of cross-lingual generative language models (XGLMs) pre-trained on a multilingual corpus with a heavy-tailed distribution of languages. XGLMs demonstrate enhanced cross-lingual capabilities and few-shot generalization across medium- and low-resource languages. Together, these research strands provide core strategies for advancing the boundaries of LLM capabilities and pave the way toward real-world deployment. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Lin_washington_0250E_26641.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/51871 |
| dc.language.iso | en_US |
| dc.rights | CC BY |
| dc.subject | foundation model |
| dc.subject | knowledge retrieval |
| dc.subject | large language model |
| dc.subject | multilingualism |
| dc.subject | Computer science |
| dc.subject | Artificial intelligence |
| dc.subject.other | Computer science and engineering |
| dc.title | Towards Large Language Models for Everyone: Instruction Following, Knowledge Retrieval and Multilingualism |
| dc.type | Thesis |
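
The first strand described in the abstract, instruction tuning on instruction-response pairs, can be made concrete with a small sketch. The snippet below is illustrative only and is not the thesis's implementation: it assumes a toy character-level vocabulary and a tiny LSTM stand-in for the LLM, and the names `TinyCausalLM` and `make_example` are hypothetical. It shows the core mechanic the abstract refers to: concatenating each instruction with its response and applying the next-token-prediction loss to the response tokens alone.

```python
# Minimal sketch of instruction tuning: train a causal LM on
# (instruction, response) pairs, masking the loss over prompt tokens.
import torch
import torch.nn as nn

# Toy character-level vocabulary (real systems use subword tokenizers).
VOCAB = {ch: i for i, ch in enumerate(sorted(set("abcdefghijklmnopqrstuvwxyz .?:")))}

def encode(text):
    return [VOCAB[c] for c in text.lower() if c in VOCAB]

class TinyCausalLM(nn.Module):
    """Stand-in causal LM: embedding -> LSTM -> vocabulary logits."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h)

def make_example(instruction, response):
    """Concatenate prompt and response; mask loss over the prompt span."""
    prompt_ids = encode(instruction + " : ")
    ids = torch.tensor(prompt_ids + encode(response))
    labels = ids.clone()
    labels[: len(prompt_ids)] = -100  # ignored by the loss below
    return ids, labels

pairs = [("name a color", "blue"), ("name an animal", "cat")]
model = TinyCausalLM(len(VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

for step in range(100):
    for instr, resp in pairs:
        ids, labels = make_example(instr, resp)
        logits = model(ids.unsqueeze(0))
        # Next-token prediction: logits at position t predict token t+1,
        # but only response positions contribute to the loss.
        loss = loss_fn(logits[0, :-1], labels[1:])
        opt.zero_grad()
        loss.backward()
        opt.step()
```

On a toy setup like this the model simply memorizes the two responses; at LLM scale, the same masked next-token loss applied over a large, diverse set of instruction-response pairs is what turns a pre-trained model into one that can follow unseen instructions.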
Files

Original bundle (1 file):
- Lin_washington_0250E_26641.pdf (4.03 MB, Adobe Portable Document Format)
