Statistical Methods toward Trustworthy AI: From Diagnosis to Controllability and Societal Impact

dc.contributor.advisor: Richardson, Thomas
dc.contributor.advisor: Choi, Yejin
dc.contributor.author: Fisher, Jillian
dc.date.accessioned: 2026-02-05T19:41:23Z
dc.date.available: 2026-02-05T19:41:23Z
dc.date.issued: 2026-02-05
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: This dissertation examines three core dimensions of Trustworthy AI: diagnosis, control, and societal impact, using statistical and machine learning methods. While the rapid advancement of large-scale AI has led to widespread adoption in everyday life, research into its reliability, safety, and social implications remains nascent. To address these gaps, this dissertation develops both theoretical foundations and practical methodologies for building more reliable AI systems.

Part I (Diagnosis) provides finite-sample statistical and computational guarantees for influence diagnostics. Specifically, Chapter 2 introduces finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations using efficient inverse-Hessian-vector product implementations. These bounds can then be used to better characterize and detect sources of bias in models ranging from generalized linear models to attention-based architectures.

Part II (Control) introduces novel methods for controllable generation across model scales and modalities. Chapter 3 develops an unsupervised, inference-time approach to the controllable-generation task of authorship obfuscation in small language models. Chapter 4 proposes an adaptive, interpretable framework for medium-sized models, supported by a newly created large-scale, multi-style dataset. Chapter 5 extends controllability techniques to vision-language models, presenting a lightweight self-improvement framework that enables iterative critique and revision without external supervision.

Part III (Societal Impact) investigates the downstream consequences of AI bias on users. Chapter 6 presents interactive experiments showing that partisan bias in large language models can meaningfully influence political opinions and decision-making. Chapter 7 argues that political neutrality in AI is impossible, formalizes approximations of neutrality, introduces techniques for achieving them at multiple conceptual levels, and evaluates contemporary models under this framework.

Together, these contributions advance the study of Trustworthy AI by unifying statistical rigor with practical experimentation. The work not only strengthens our ability to diagnose and control AI behavior but also exposes its societal risks and outlines concrete pathways toward mitigating them.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Fisher_washington_0250E_29137.pdf
dc.identifier.uri: https://hdl.handle.net/1773/55308
dc.language.iso: en_US
dc.rights: CC BY-NC-SA
dc.subject: Artificial Intelligence
dc.subject: Machine Learning
dc.subject: Societal Impact
dc.subject: Statistics
dc.subject: Computer science
dc.subject.other: Statistics
dc.title: Statistical Methods toward Trustworthy AI: From Diagnosis to Controllability and Societal Impact
dc.type: Thesis

Files

Original bundle

Name: Fisher_washington_0250E_29137.pdf
Size: 51.63 MB
Format: Adobe Portable Document Format
