Towards Large Language Models for Everyone: Instruction Following, Knowledge Retrieval and Multilingualism
Abstract
Large language models (LLMs) have significantly advanced the field of Natural Language Processing and demonstrated the potential to fuel a variety of AI applications. Nonetheless, building them in a way that maximally benefits the very wide range of everyday use cases is challenging. Firstly, LLMs are pre-trained with the next-token prediction objective, which does not align well with following specific user requests. Secondly, LLMs suffer from a knowledge cut-off and tend to hallucinate about long-tail facts. Lastly, popular LLMs are trained almost exclusively on English text, making it difficult for non-English speakers to adopt them. This thesis presents methodologies addressing all three challenges. We begin by studying the Instruction Meta-Learning (IML) approach, which enables LLMs to perform a wide array of tasks by fine-tuning them on pairs of natural language instructions and responses. Our study highlights the efficacy of scaling IML along three axes: fine-tuning task diversity, language diversity, and the number of model parameters. Next, we propose integrating LLMs with an external data store during IML (retrieval-augmented dual instruction tuning, RA-DIT). RA-DIT significantly improves LLM performance in scenarios that require access to large, external knowledge sources (e.g., answering information-seeking questions). Finally, we introduce a family of cross-lingual generative language models (XGLMs) pre-trained on a multilingual corpus exhibiting a heavy-tailed distribution of languages. XGLMs demonstrate enhanced cross-lingual capabilities and few-shot generalization in medium- and low-resource languages. Together, these research strands provide core strategies for advancing the boundaries of LLM capabilities and paving the way toward real-world deployment.
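To make the instruction-tuning recipe described above concrete, the following minimal sketch (PyTorch, not drawn from the thesis) illustrates fine-tuning on a single instruction-response pair, with the next-token loss computed only over the response tokens. The toy model, character-level tokenizer, and example pair are hypothetical placeholders for illustration only.

```python
# Illustrative sketch (not the thesis's implementation): supervised fine-tuning
# on an instruction-response pair, with loss restricted to the response span.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "tokenizer": map characters to integer ids (placeholder for a real tokenizer).
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set("abcdefghijklmnopqrstuvwxyz ?.")))}
PAD = 0
def encode(text):
    return [vocab.get(ch, PAD) for ch in text.lower()]

# One instruction-response pair (hypothetical example data).
instruction = "what is the capital of france?"
response = "paris."
prompt_ids = encode(instruction)
response_ids = encode(response)
input_ids = torch.tensor([prompt_ids + response_ids])            # (1, seq_len)

# Labels: next-token targets, with instruction positions masked out (-100)
# so the loss is taken only over the response tokens.
labels = torch.tensor([[-100] * len(prompt_ids) + response_ids])

# Tiny causal-LM stand-in: embedding + linear head (placeholder for an LLM).
class ToyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)
    def forward(self, ids):
        return self.head(self.embed(ids))                         # (1, seq_len, vocab)

model = ToyLM(vocab_size=len(vocab) + 1)
logits = model(input_ids)

# Shift so position t predicts token t+1, then cross-entropy ignoring masked labels.
shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
shift_labels = labels[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
loss.backward()  # one gradient step of instruction tuning on this single pair
print(float(loss))
```

The masking is the key difference from plain next-token pre-training: the model is optimized to produce the response conditioned on the instruction rather than to model the instruction text itself.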
Description
Thesis (Ph.D.)--University of Washington, 2024
