Interactive AI Model Debugging and Correction
| dc.contributor.advisor | Heer, Jeffrey | |
| dc.contributor.advisor | Weld, Dan | |
| dc.contributor.author | Wu, Tongshuang | |
| dc.date.accessioned | 2022-09-23T20:44:29Z | |
| dc.date.available | 2022-09-23T20:44:29Z | |
| dc.date.issued | 2022-09-23 | |
| dc.date.submitted | 2022 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2022 | |
| dc.description.abstract | While the accuracy of Natural Language Processing (NLP) models has been improving, users have expectations beyond what is captured by standard performance metrics. For example, chatbot assistants should not provide inappropriate or unfair responses to certain types of inquiries, and translators should not sacrifice support for low-resource languages in pursuit of perfect performance in English. Unfortunately, existing models have various deficiencies (e.g., oversensitivity to trivial input perturbations), creating a gulf between “accurate models” (those that place high on leaderboards) and “successful models” (those that can support real-world use cases). In this work, we argue that this gap exists primarily because we do not consider human needs sufficiently throughout the model development cycle. We show how the human perspective is missing across model development and deployment, and address these issues by building tools that interactively help humans debug and correct models. First, the evaluations that developers conduct (e.g., holdout accuracy) do not reflect human expectations of how models should behave (e.g., that models should be right for the right reasons). So that experts can express their expectations about models, we provide them with domain-specific languages for grouping similar examples and performing counterfactual, what-if analyses, allowing them to rigorously inspect a model across a variety of concrete capabilities. Second, the data that model developers collect for building an NLP model usually contains biases and distribution gaps, and does not reflect how humans will actually use the model. To compensate for human omissions in defining, collecting, and inspecting the intended data distribution, we build automated approaches (NLP text generators, automatic pattern mining, and sampling algorithms) that augment experts in capturing how humans use models.
Third, the default interactions with deployed models do not allow end users to recover from AI errors. To make AI more usable in downstream applications, we also design interaction strategies that help end users collaborate with deployed AI systems in a transparent and controllable manner, so that they can detect and override AI errors in real time. Taken together, this thesis shows that, when given strategies and tools to interactively massage (partition, perturb, and decompose) data throughout the machine learning model development cycle, developers and end users can debug and correct AI models in a more comprehensive, less biased, transparent, and controllable way. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Wu_washington_0250E_24842.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/49314 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY-NC | |
| dc.subject | AI evaluation | |
| dc.subject | Human-AI Interaction | |
| dc.subject | Human-Computer Interaction | |
| dc.subject | Natural Language Processing | |
| dc.subject | Computer science | |
| dc.subject.other | Computer science and engineering | |
| dc.title | Interactive AI Model Debugging and Correction | |
| dc.type | Thesis |
Files
Original bundle
- Name: Wu_washington_0250E_24842.pdf
- Size: 16.41 MB
- Format: Adobe Portable Document Format
