The Weighted Möbius Score: A Unified Framework for Feature Attribution

dc.contributor.advisor: Steinert-Threlkeld, Shane
dc.contributor.author: Jiang, Yifan
dc.date.accessioned: 2023-08-14T17:06:00Z
dc.date.available: 2023-08-14T17:06:00Z
dc.date.issued: 2023-08-14
dc.date.submitted: 2023
dc.description: Thesis (Master's)--University of Washington, 2023
dc.description.abstract: Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction. Recent work has extended feature attribution to interactions between multiple features. However, the lack of a unified framework has led to a proliferation of methods that are often not directly comparable. This thesis introduces a parameterized attribution framework, the Weighted Möbius Score, and (i) shows that many existing attribution methods, for both individual features and feature interactions, are special cases of it and (ii) identifies some new methods. By treating attribution methods as elements of a vector space, the framework can be analyzed with standard linear algebra tools and admits interpretations in several fields, including cooperative game theory and causal mediation analysis. We empirically demonstrate the framework's versatility and effectiveness by applying these attribution methods to feature interactions in sentiment analysis and Chain-of-Thought prompting.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Jiang_washington_0250O_25540.pdf
dc.identifier.uri: http://hdl.handle.net/1773/50475
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Feature Attribution
dc.subject: Feature Interaction
dc.subject: Interpretability and Explainability
dc.subject: Natural Language Processing
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject: Linguistics
dc.subject.other: Linguistics
dc.title: The Weighted Möbius Score: A Unified Framework for Feature Attribution
dc.type: Thesis

Files

Original bundle

Name: Jiang_washington_0250O_25540.pdf
Size: 956.19 KB
Format: Adobe Portable Document Format

Collections