Towards Large Language Models for Everyone: Instruction Following, Knowledge Retrieval and Multilingualism
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Zettlemoyer, Luke |
| dc.contributor.author | Lin, Xi |
| dc.date.accessioned | 2024-09-09T23:06:25Z |
| dc.date.available | 2024-09-09T23:06:25Z |
| dc.date.issued | 2024-09-09 |
| dc.date.submitted | 2024 |
| dc.description | Thesis (Ph.D.)--University of Washington, 2024 |
| dc.description.abstract | Large language models (LLMs) have significantly advanced the field of Natural Language Processing and demonstrated the potential to fuel a wide variety of AI applications. Nonetheless, building them in a way that maximally benefits the wide range of everyday use cases is challenging. Firstly, LLMs are pre-trained with the next-token prediction objective, which does not align well with specific user requests. Secondly, LLMs suffer from knowledge cutoffs and tend to hallucinate on long-tail facts. Lastly, popular LLMs are trained almost exclusively on English text, making it difficult for non-English speakers to adopt them. This thesis presents methodologies addressing all three challenges. We begin by studying the Instruction Meta-Learning (IML) approach, which enables LLMs to perform a broad array of tasks by fine-tuning them on pairs of natural language instructions and responses. Our study highlights the efficacy of scaling IML along three axes: fine-tuning task diversity, language diversity, and model parameters. Next, we propose integrating LLMs with an external data store during IML (retrieval-augmented dual instruction tuning, RA-DIT). RA-DIT significantly improves LLM performance in scenarios that require access to large, external knowledge sources (e.g., answering information-seeking questions). Finally, we introduce a family of cross-lingual generative language models (XGLMs) pre-trained on a multilingual corpus with a heavy-tailed distribution of languages. XGLMs demonstrate enhanced cross-lingual capabilities and few-shot generalization across medium- and low-resource languages. Together, these research strands provide core strategies for advancing the boundaries of LLM capabilities and pave the way toward real-world deployment. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Lin_washington_0250E_26641.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/51871 |
| dc.language.iso | en_US |
| dc.rights | CC BY |
| dc.subject | foundation model |
| dc.subject | knowledge retrieval |
| dc.subject | large language model |
| dc.subject | multilingualism |
| dc.subject | Computer science |
| dc.subject | Artificial intelligence |
| dc.subject.other | Computer science and engineering |
| dc.title | Towards Large Language Models for Everyone: Instruction Following, Knowledge Retrieval and Multilingualism |
| dc.type | Thesis |
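
The first strand described in the abstract, instruction tuning on instruction-response pairs, can be made concrete with a small sketch. The snippet below is illustrative only and is not the thesis's implementation: it assumes a toy character-level vocabulary and a tiny LSTM stand-in for the LLM, and the names `TinyCausalLM` and `make_example` are hypothetical. It shows the core mechanic the abstract refers to: concatenating each instruction with its response and applying the next-token-prediction loss to the response tokens alone.

```python
# Minimal sketch of instruction tuning: train a causal LM on
# (instruction, response) pairs, masking the loss over prompt tokens.
import torch
import torch.nn as nn

# Toy character-level vocabulary (real systems use subword tokenizers).
VOCAB = {ch: i for i, ch in enumerate(sorted(set("abcdefghijklmnopqrstuvwxyz .?:")))}

def encode(text):
    return [VOCAB[c] for c in text.lower() if c in VOCAB]

class TinyCausalLM(nn.Module):
    """Stand-in causal LM: embedding -> LSTM -> vocabulary logits."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h)

def make_example(instruction, response):
    """Concatenate prompt and response; mask loss over the prompt span."""
    prompt_ids = encode(instruction + " : ")
    ids = torch.tensor(prompt_ids + encode(response))
    labels = ids.clone()
    labels[: len(prompt_ids)] = -100  # ignored by the loss below
    return ids, labels

pairs = [("name a color", "blue"), ("name an animal", "cat")]
model = TinyCausalLM(len(VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

for step in range(100):
    for instr, resp in pairs:
        ids, labels = make_example(instr, resp)
        logits = model(ids.unsqueeze(0))
        # Next-token prediction: logits at position t predict token t+1,
        # but only response positions contribute to the loss.
        loss = loss_fn(logits[0, :-1], labels[1:])
        opt.zero_grad()
        loss.backward()
        opt.step()
```

On a toy setup like this the model simply memorizes the two responses; at LLM scale, the same masked next-token loss applied over a large, diverse set of instruction-response pairs is what turns a pre-trained model into one that can follow unseen instructions.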
Files

Original bundle (1 file):
- Lin_washington_0250E_26641.pdf (4.03 MB, Adobe Portable Document Format)
