Towards reliability and interactive debugging for large language models

dc.contributor.advisor: Zettlemoyer, Luke
dc.contributor.advisor: Hajishirzi, Hannaneh
dc.contributor.author: Paranjape, Bhargavi
dc.date.accessioned: 2024-04-26T23:19:30Z
dc.date.available: 2024-04-26T23:19:30Z
dc.date.issued: 2024-04-26
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: Large language models (LLMs) have permeated our everyday lives and are used in critical decision-making scenarios that can affect millions of people. Despite the impressive progress of these models, their deficiencies may exacerbate harmful biases or lead to catastrophic failures. In this thesis, we present and advance a series of important considerations for reliable model deployment. Beyond improved accuracy on new and complex tasks, users want more transparent models that explain their predictions and are robust to data biases or distributional shifts. They also want to be equipped to interact with these models to better understand and debug them. We present a variety of training and inference techniques toward building these aspects of reliability into models. We particularly focus on techniques that address challenges of scale and lack of human supervision, for models ranging from classifiers with limited interaction potential to massive LLMs that can communicate with humans and external tools. In the first part of this thesis, on advancing explainability for LLMs, we introduce a novel information-theoretic objective to train models to generate explanations that are concise, comprehensible, and faithful to model predictions. We also introduce a contrastive prompt-based approach to explaining model predictions on common-sense reasoning tasks, which users can also leverage to probe model behavior. In the second part, we focus on distributional robustness for LLMs. We develop a novel optimization technique that discovers error-prone data slices for users to examine and trains a robust classifier to improve performance on rare data slices. We also develop an open-source framework for fine-grained attribution of hallucinations in model-generated text to the underlying pre-training data. In the third part, we present a framework for automatically decomposing unseen composite tasks that require multi-step reasoning and external-system interaction, and examine how the framework supports user debugging. Overall, this thesis presents a range of optimization, inference, and evaluation methods that make progress toward better explainability, robustness, and interactive debugging of large language models.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Paranjape_washington_0250E_26573.pdf
dc.identifier.uri: http://hdl.handle.net/1773/51339
dc.language.iso: en_US
dc.rights: CC BY-SA
dc.subject: interactive agents
dc.subject: interpretability and robustness
dc.subject: large language models
dc.subject: machine learning
dc.subject: Natural language processing
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject.other: Computer science and engineering
dc.title: Towards reliability and interactive debugging for large language models
dc.type: Thesis

Files

Original bundle

Name: Paranjape_washington_0250E_26573.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format