Rethinking Data Use in Large Language Models
| dc.contributor.advisor | Hajishirzi, Hannaneh |
| dc.contributor.advisor | Zettlemoyer, Luke |
| dc.contributor.author | Min, Sewon |
| dc.date.accessioned | 2024-09-09T23:06:18Z |
| dc.date.available | 2024-09-09T23:06:18Z |
| dc.date.issued | 2024-09-09 |
| dc.date.submitted | 2024 |
| dc.description | Thesis (Ph.D.)--University of Washington, 2024 |
| dc.description.abstract | Large language models (LMs) such as ChatGPT have revolutionized natural language processing and artificial intelligence more broadly. In this thesis, I discuss my research on understanding and advancing these models, centered around how they use the very large text corpora they are trained on. First, I describe our efforts to understand how these models learn to perform new tasks after training, demonstrating that their so-called in-context learning capabilities are almost entirely determined by what they learn from the training data. Next, I introduce a new class of LMs—nonparametric LMs—that repurpose this training data as a data store from which they retrieve information for improved accuracy and updatability. I describe my work on establishing the foundations of such models, including one of the first broadly used neural retrieval models and an approach that simplifies a traditional, two-stage pipeline into one. I also discuss how nonparametric models open up new avenues for responsible data use, e.g., by segregating permissive and copyrighted text and using them differently. Finally, I envision the next generation of LMs we should build, focusing on efficient scaling, improved factuality, and decentralization. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Min_washington_0250E_27058.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/51864 |
| dc.language.iso | en_US |
| dc.rights | CC BY-SA |
| dc.subject | Computer science |
| dc.subject.other | Computer science and engineering |
| dc.title | Rethinking Data Use in Large Language Models |
| dc.type | Thesis |
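The abstract's central idea — a nonparametric LM that repurposes training text as a datastore and retrieves from it at inference time — can be illustrated with a toy k-nearest-neighbor next-token predictor. This is a minimal sketch under assumed, made-up data (the random embeddings, dimensions, and softmax-distance weighting are illustrative only, not the thesis's actual models or retrieval systems):

```python
import numpy as np

# Toy datastore (illustrative only): each entry pairs a stored context
# embedding with the id of the token that followed it in the corpus.
rng = np.random.default_rng(0)
datastore_keys = rng.normal(size=(1000, 64))      # context embeddings
datastore_vals = rng.integers(0, 100, size=1000)  # next-token ids

def knn_next_token_probs(query, k=8, vocab_size=100):
    """Nonparametric next-token distribution: retrieve the k stored
    contexts nearest to the query embedding, then let their recorded
    next tokens vote with softmax-of-negative-distance weights."""
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]      # indices of k closest entries
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()             # normalize to a distribution
    probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        probs[datastore_vals[idx]] += w  # aggregate votes per token
    return probs

query = rng.normal(size=64)              # stand-in for a context encoding
p = knn_next_token_probs(query)          # a valid distribution over tokens
```

Because predictions come from explicit datastore lookups rather than model weights alone, entries can be added, removed, or segregated by license without retraining — the property the abstract highlights for updatability and responsible data use.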
Files
Original bundle
- Name: Min_washington_0250E_27058.pdf
- Size: 5.38 MB
- Format: Adobe Portable Document Format
