Experiment Design for Hypotheses About How NLP Models Work

Abstract

In the last few years, natural language processing (NLP) models have come a long way. However, alongside the work that continues to report performance improvements, different lines of work keep identifying problems and undesirable behavior in even our most current modeling approaches. Why is this? We argue that it is a symptom of not placing enough emphasis on understanding our models: knowledge of how our models function, expressed beyond test-set performance, helps us speak to how models might generalize as well as to their limitations. To facilitate discovering this kind of knowledge, it is critically important to establish trustworthy, precise methods for testing hypotheses about how our models work. Here, we focus specifically on the design of experiments for hypotheses about, or explanations of, models' observed behavior. We discuss three projects that have examined different kinds of questions in this space. The first two demonstrate experiment design for different granularities of hypotheses about the functioning of NLP models, while the last investigates which aspects of a particular experimental design choice can skew findings. The first two projects ask, respectively, whether a model as a whole exhibits a particular trait, and whether a certain mechanism found in many NLP models can be interpreted as an instance-level explanation. Specifically, we first present the design of an experiment to investigate whether lexical correlations in the training data transfer to models finetuned on that data. Using the designed method, we find bias in the models that reflects bias in the training data, even when that training data has been rebalanced to mitigate those biases, which carries further implications about the strong ability of contemporary NLP models to leverage higher-order features. The second project, in contrast, investigates whether a particular component of many NLP models, the attention mechanism, functions as a descriptor of which information was most important in producing the model's output for a particular input; we find gaps between models' calculated attention distributions and the corresponding importance of the inputs to the attention module. The third project instead tests the impact of varying a key part of a common experimental design used when choosing (or advocating for new) methods that explain which input information models use. In particular, for experiments that test whether an explainability method can recover a known ground truth about which input information must have been used to make a downstream decision, we examine how the kind of known-ground-truth test set used affects the experiment's results. Finally, we close with a discussion of future work centered on examining the impact of other kinds of common experiment-structuring choices in this space.
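As one illustration of the kind of question the second project asks, the following is a minimal sketch, not code from the thesis, that compares a toy attention module's weights against a simple erasure-based importance measure. The module, its dimensions, and the downstream scoring layer here are all assumptions made purely for illustration.

# Illustrative sketch (assumed setup, not the thesis's actual experiments):
# do a toy attention module's weights agree with a simple measure of how much
# each input actually mattered to the downstream output?

import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Toy single-query attention module over a short sequence of hidden states.
d_model, seq_len = 16, 6
hidden_states = rng.normal(size=(seq_len, d_model))   # stand-in "encoder" states
query = rng.normal(size=d_model)                      # fixed query vector
w_out = rng.normal(size=d_model)                      # toy downstream scoring weights

def attention_output(states):
    """Return (attention weights, scalar downstream score) for the given states."""
    scores = states @ query / np.sqrt(d_model)
    attn = softmax(scores)
    context = attn @ states          # attention-weighted sum of the states
    return attn, float(w_out @ context)

attn_weights, base_score = attention_output(hidden_states)

# Erasure-based importance: zero out each position in turn and measure how much
# the downstream score changes. (One simple stand-in for "importance of the
# inputs to the attention module.")
importance = np.zeros(seq_len)
for i in range(seq_len):
    ablated = hidden_states.copy()
    ablated[i] = 0.0
    _, ablated_score = attention_output(ablated)
    importance[i] = abs(base_score - ablated_score)

# If attention were a faithful importance descriptor, the position with the
# highest attention weight would also be the one whose removal changes the
# output the most.
print("attention ranking :", np.argsort(-attn_weights))
print("importance ranking:", np.argsort(-importance))
print("top item agrees?  :", np.argmax(attn_weights) == np.argmax(importance))

Under a comparison like this, the highest-attention position need not be the one whose removal changes the output most; that potential gap is what the second project examines in a more careful, systematic way.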

Description

Thesis (Ph.D.)--University of Washington, 2024
