Interactive AI Model Debugging and Correction

dc.contributor.advisor: Heer, Jeffrey
dc.contributor.advisor: Weld, Dan
dc.contributor.author: Wu, Tongshuang
dc.date.accessioned: 2022-09-23T20:44:29Z
dc.date.available: 2022-09-23T20:44:29Z
dc.date.issued: 2022-09-23
dc.date.submitted: 2022
dc.description: Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract: While the accuracy of Natural Language Processing (NLP) models has been improving, users have expectations beyond what standard performance metrics capture. For example, chatbot assistants should not give inappropriate or unfair responses to certain types of inquiries, and translators should not sacrifice support for low-resource languages in pursuit of perfect performance in English. Unfortunately, existing models have various deficiencies (e.g., oversensitivity to trivial input perturbations), creating a gulf between “accurate models” (those that place high on leaderboards) and “successful models” (those that support real-world use cases). In this work, we argue that this gap exists primarily because we do not consider human needs sufficiently throughout the model development cycle. We show how the human perspective is missing across model development and deployment, and address these issues by building tools that interactively help humans debug and correct models. First, the evaluation that developers conduct (e.g., holdout accuracy) does not reflect human expectations of how models should behave (e.g., models should be right for the right reasons). So that experts can express their expectations of models, we provide domain-specific languages for grouping similar examples and performing counterfactual, what-if analyses, allowing them to rigorously inspect a model on a variety of concrete capabilities. Second, the data that model developers collect for building an NLP model usually contains biases and distribution gaps, and does not reflect how humans will actually use the model. To compensate for human omissions in defining, collecting, and inspecting the intended data distribution, we build automated approaches (NLP text generators, automatic pattern mining, and sampling algorithms) that augment experts in capturing how humans use models.
Third, the default interactions with deployed models do not allow end users to recover from AI errors. To make AIs more usable in downstream applications, we also design interaction strategies that help end users collaborate with deployed AIs in a transparent and controllable manner, so they can detect and overwrite AI errors in real time. Taken together, this thesis shows that, when given strategies and tools to interactively massage (partition, perturb, and decompose) data throughout the machine learning model development cycle, developers and end users can debug and correct AI models in a more comprehensive, less biased, transparent, and controllable way.
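As an illustration of the counterfactual, what-if analyses described in the abstract, a minimal sketch is shown below: perturb an input in a way that should not change the label, then check that the model's prediction is invariant. The toy keyword model, the name-swapping perturbation, and all function names here are hypothetical stand-ins for illustration only, not the tools presented in the thesis.

```python
def toy_sentiment_model(text: str) -> str:
    """A trivial keyword classifier standing in for a real NLP model."""
    positive = {"great", "good", "excellent"}
    negative = {"bad", "terrible", "awful"}
    tokens = text.lower().split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return "positive" if score >= 0 else "negative"


def perturb_name(text: str) -> str:
    """A label-preserving perturbation: swap out a person's name."""
    return text.replace("Alice", "Bob")


def invariance_check(model, texts, perturb):
    """Return the inputs whose prediction changed under a trivial perturbation."""
    return [t for t in texts if model(t) != model(perturb(t))]


examples = [
    "Alice had a great day",
    "Alice thought the movie was terrible",
]
# An empty result means the toy model is invariant to this perturbation.
print(invariance_check(toy_sentiment_model, examples, perturb_name))  # → []
```

A real capability test would group many such examples and perturbation types, and a nonempty failure list would localize exactly which inputs the model mishandles.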
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Wu_washington_0250E_24842.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49314
dc.language.iso: en_US
dc.rights: CC BY-NC
dc.subject: AI evaluation
dc.subject: Human-AI Interaction
dc.subject: Human-Computer Interaction
dc.subject: Natural Language Processing
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Interactive AI Model Debugging and Correction
dc.type: Thesis

Files

Original bundle

Name: Wu_washington_0250E_24842.pdf
Size: 16.41 MB
Format: Adobe Portable Document Format