Statistical Methods toward Trustworthy AI: From Diagnosis to Controllability and Societal Impact

dc.contributor.advisor: Richardson, Thomas
dc.contributor.advisor: Choi, Yejin
dc.contributor.author: Fisher, Jillian
dc.date.accessioned: 2026-02-05T19:41:23Z
dc.date.available: 2026-02-05T19:41:23Z
dc.date.issued: 2026-02-05
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: This dissertation examines three core dimensions of Trustworthy AI: diagnosis, control, and societal impact, using statistical and machine learning methods. While the rapid advancement of large-scale AI has led to widespread adoption in everyday life, research into its reliability, safety, and social implications remains nascent. To address these gaps, this dissertation develops both theoretical foundations and practical methodologies for building more reliable AI systems.

Part I (Diagnosis) provides finite-sample statistical and computational guarantees for influence diagnostics. Specifically, Chapter 2 introduces finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations using efficient inverse-Hessian-vector product implementations. These bounds can then be used to better characterize and detect sources of bias in models ranging from generalized linear models to attention-based architectures.

Part II (Control) introduces novel methods for controllable generation across model scales and modalities. Chapter 3 develops an unsupervised, inference-time approach to the controllable-generation task of authorship obfuscation in small language models. Chapter 4 proposes an adaptive, interpretable framework for medium-sized models, supported by a newly created large-scale, multi-style dataset. Chapter 5 extends controllability techniques to vision-language models, presenting a lightweight self-improvement framework that enables iterative critique and revision without external supervision.

Part III (Societal Impact) investigates the downstream consequences of AI bias on users. Chapter 6 presents interactive experiments showing that partisan bias in large language models can meaningfully influence political opinions and decision-making. Chapter 7 argues that political neutrality in AI is impossible, formalizes approximations of neutrality, introduces techniques for achieving them at multiple conceptual levels, and evaluates contemporary models under this framework.

Together, these contributions advance the study of Trustworthy AI by unifying statistical rigor with practical experimentation. The work not only strengthens our ability to diagnose and control AI behavior but also exposes its societal risks and outlines concrete pathways toward mitigating them.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Fisher_washington_0250E_29137.pdf
dc.identifier.uri: https://hdl.handle.net/1773/55308
dc.language.iso: en_US
dc.rights: CC BY-NC-SA
dc.subject: Artificial Intelligence
dc.subject: Machine Learning
dc.subject: Societal Impact
dc.subject: Statistics
dc.subject: Computer science
dc.subject.other: Statistics
dc.title: Statistical Methods toward Trustworthy AI: From Diagnosis to Controllability and Societal Impact
dc.type: Thesis

Files

Original bundle

Name: Fisher_washington_0250E_29137.pdf
Size: 51.63 MB
Format: Adobe Portable Document Format
