Interactive AI Model Debugging and Correction
| dc.contributor.advisor | Heer, Jeffrey | |
| dc.contributor.advisor | Weld, Dan | |
| dc.contributor.author | Wu, Tongshuang | |
| dc.date.accessioned | 2022-09-23T20:44:29Z | |
| dc.date.available | 2022-09-23T20:44:29Z | |
| dc.date.issued | 2022-09-23 | |
| dc.date.submitted | 2022 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2022 | |
| dc.description.abstract | While the accuracy of Natural Language Processing (NLP) models has been improving, users have expectations beyond what is captured by standard performance metrics. For example, chatbot assistants should not provide inappropriate or unfair responses to certain types of inquiries, and translators should not sacrifice support for low-resource languages in pursuit of perfect performance in English. Unfortunately, existing models have various deficiencies (e.g., oversensitivity to trivial input perturbations), creating a gulf between “accurate models” (those that place high on leaderboards) and “successful models” (those that can support real-world use cases). In this work, we argue that this gap exists primarily because we do not consider human needs sufficiently throughout the model development cycle. We show how the human perspective is missing across model development and deployment, and address these issues by building tools that interactively help humans debug and correct models. First, the evaluations that developers conduct (e.g., holdout accuracy) do not reflect human expectations of how models should behave (e.g., that models should be right for the right reasons). So that experts can express their expectations about models, we provide them with domain-specific languages for grouping similar examples and performing counterfactual, what-if analyses, allowing them to rigorously inspect a model across a variety of concrete capabilities. Second, the data that model developers collect for building an NLP model usually contains biases and distribution gaps, and does not reflect how humans will actually use the model. To compensate for human omissions in defining, collecting, and inspecting the intended data distribution, we build automated approaches (NLP text generators, automatic pattern mining, and sampling algorithms) that augment experts in capturing how humans use models.
Third, the default interactions with deployed models do not allow end users to recover from AI errors. To make AI more usable in downstream applications, we also design interaction strategies that help end users collaborate with deployed AI systems in a transparent and controllable manner, so that they can detect and override AI errors in real time. Taken together, this thesis shows that, when given strategies and tools to interactively massage (partition, perturb, and decompose) data throughout the machine learning model development cycle, developers and end users can debug and correct AI models in a more comprehensive, less biased, transparent, and controllable way. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Wu_washington_0250E_24842.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/49314 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY-NC | |
| dc.subject | AI evaluation | |
| dc.subject | Human-AI Interaction | |
| dc.subject | Human-Computer Interaction | |
| dc.subject | Natural Language Processing | |
| dc.subject | Computer science | |
| dc.subject.other | Computer science and engineering | |
| dc.title | Interactive AI Model Debugging and Correction | |
| dc.type | Thesis |
Files
Original bundle
- Name: Wu_washington_0250E_24842.pdf
- Size: 16.41 MB
- Format: Adobe Portable Document Format
