Towards reliability and interactive debugging for large language models

Paranjape, Bhargavi


dc.contributor.advisor	Zettlemoyer, Luke
dc.contributor.advisor	Hajishirzi, Hannaneh
dc.contributor.author	Paranjape, Bhargavi
dc.date.accessioned	2024-04-26T23:19:30Z
dc.date.available	2024-04-26T23:19:30Z
dc.date.issued	2024-04-26
dc.date.submitted	2024
dc.description	Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract	Large language models (LLMs) have permeated our everyday lives and are used in critical decision-making scenarios that can affect millions of people. Despite the impressive progress of these models, their deficiencies may exacerbate harmful biases or lead to catastrophic failures. In this thesis, we present and advance a series of important considerations for reliable model deployment. Beyond improved accuracy on new and complex tasks, users want more transparent models that explain their predictions and are robust to data biases or distributional shifts. They also want to be equipped to interact with these models to better understand and debug them. We present a variety of training and inference techniques toward building these aspects of reliability into models. We particularly focus on techniques that address the challenges of scale and the lack of human supervision, for models ranging from classifiers with limited interaction potential to massive LLMs that can communicate with humans and external tools. In the first part of this thesis, on advancing explainability for LLMs, we introduce a novel information-theoretic objective to train models to generate explanations that are concise, comprehensible, and faithful to model predictions. We also introduce a contrastive prompt-based approach to explain model predictions on commonsense reasoning tasks, which users can also leverage to probe model behavior. We focus on distributional robustness for LLMs in the second part of this thesis. We develop a novel optimization technique that discovers error-prone data slices for users to examine and trains a robust classifier to improve performance on rare data slices. We also develop an open-source framework for fine-grained attribution of hallucinations in model-generated text to the underlying pre-training data.
In the third part, we present a framework for automatically decomposing unseen composite tasks that require multi-step reasoning and external-system interaction, and delve into how the framework supports user debugging. Overall, this thesis presents a range of optimization, inference, and evaluation methods that make progress toward better explainability, robustness, and interactive debugging of large language models.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Paranjape_washington_0250E_26573.pdf
dc.identifier.uri	http://hdl.handle.net/1773/51339
dc.language.iso	en_US
dc.rights	CC BY-SA
dc.subject	interactive agents
dc.subject	interpretability and robustness
dc.subject	large language models
dc.subject	machine learning
dc.subject	Natural language processing
dc.subject	Computer science
dc.subject	Artificial intelligence
dc.subject.other	Computer science and engineering
dc.title	Towards reliability and interactive debugging for large language models
dc.type	Thesis

Files

Original bundle


Name: Paranjape_washington_0250E_26573.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format


Collections

Computer science and engineering