Rethinking Data Use in Large Language Models
| dc.contributor.advisor | Hajishirzi, Hannaneh |
| dc.contributor.advisor | Zettlemoyer, Luke |
| dc.contributor.author | Min, Sewon |
| dc.date.accessioned | 2024-09-09T23:06:18Z |
| dc.date.available | 2024-09-09T23:06:18Z |
| dc.date.issued | 2024-09-09 |
| dc.date.submitted | 2024 |
| dc.description | Thesis (Ph.D.)--University of Washington, 2024 |
| dc.description.abstract | Large language models (LMs) such as ChatGPT have revolutionized natural language processing and artificial intelligence more broadly. In this thesis, I discuss my research on understanding and advancing these models, centered around how they use the very large text corpora they are trained on. First, I describe our efforts to understand how these models learn to perform new tasks after training, demonstrating that their so-called in-context learning capabilities are almost entirely determined by what they learn from the training data. Next, I introduce a new class of LMs—nonparametric LMs—that repurpose this training data as a data store from which they retrieve information for improved accuracy and updatability. I describe my work on establishing the foundations of such models, including one of the first broadly used neural retrieval models and an approach that simplifies a traditional, two-stage pipeline into one. I also discuss how nonparametric models open up new avenues for responsible data use, e.g., by segregating permissive and copyrighted text and using them differently. Finally, I envision the next generation of LMs we should build, focusing on efficient scaling, improved factuality, and decentralization. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Min_washington_0250E_27058.pdf |
| dc.identifier.uri | https://hdl.handle.net/1773/51864 |
| dc.language.iso | en_US |
| dc.rights | CC BY-SA |
| dc.subject | Computer science |
| dc.subject.other | Computer science and engineering |
| dc.title | Rethinking Data Use in Large Language Models |
| dc.type | Thesis |
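The abstract's central idea — a nonparametric LM that repurposes training text as a datastore and retrieves from it at inference time — can be illustrated with a toy k-nearest-neighbor next-token predictor. This is a minimal sketch under assumed, made-up data (the random embeddings, dimensions, and softmax-distance weighting are illustrative only, not the thesis's actual models or retrieval systems):

```python
import numpy as np

# Toy datastore (illustrative only): each entry pairs a stored context
# embedding with the id of the token that followed it in the corpus.
rng = np.random.default_rng(0)
datastore_keys = rng.normal(size=(1000, 64))      # context embeddings
datastore_vals = rng.integers(0, 100, size=1000)  # next-token ids

def knn_next_token_probs(query, k=8, vocab_size=100):
    """Nonparametric next-token distribution: retrieve the k stored
    contexts nearest to the query embedding, then let their recorded
    next tokens vote with softmax-of-negative-distance weights."""
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]      # indices of k closest entries
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()             # normalize to a distribution
    probs = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        probs[datastore_vals[idx]] += w  # aggregate votes per token
    return probs

query = rng.normal(size=64)              # stand-in for a context encoding
p = knn_next_token_probs(query)          # a valid distribution over tokens
```

Because predictions come from explicit datastore lookups rather than model weights alone, entries can be added, removed, or segregated by license without retraining — the property the abstract highlights for updatability and responsible data use.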
Files
Original bundle
- Name: Min_washington_0250E_27058.pdf
- Size: 5.38 MB
- Format: Adobe Portable Document Format
