Interpretation Errors: Extracting Functionality from Generative Models of Language by Understanding Them Better
Authors
Holtzman, Ariel
Abstract
The rise of large language models as the workhorse of NLP, and the continuous release of better models (OpenAI, 2023; Pichai, 2023; Schulman et al., 2022, inter alia), have created a strange situation: we have models that are more powerful language generators than ever before, but because we did not design them for a specific purpose, we struggle to understand how they should be used or what their idiosyncrasies are. This dissertation describes three empirical projects that sought to characterize the underlying behavior of language models and, importantly, to make them more reliable tools for generating and selecting text where this behavior does not match the tasks we would like models to complete. Each project attempts to understand what language models and accompanying inference methods currently optimize for, to characterize the gap between that and the true objective of a potential user, and to close that gap with a new inference method. An emergent theme through these works is that models are already doing what we trained them to do quite well, and it is often the experimenters and practitioners who misunderstand precisely what we trained models to do in the first place. We conclude with a conceptual analysis of how we should study generative models going forward, as models keep improving and new, unanticipated uses and misuses become ever more available.
The first half of this dissertation concerns two works, Neural Text Degeneration and Surface Form Competition, which describe two failure modes of generative models that occur when probability is treated as equivalent to "correctness" in text generation and multiple-choice scenarios, respectively. For each of these works we describe the resultant issues and propose inference methods that largely alleviate them.
The second half of this dissertation goes deeper into the question of how generative models of language capture the communicative goals that humans optimize: first with Learning to Write, which operationalizes communicative goals as auxiliary search objectives for text decoding, and then with Generative Models as a Complex Systems Science, which presents a framework for studying generative models as NLP shifts to analyzing systems that are often infeasible to replicate. How does a model that predicts the distribution of next tokens understand, and fail to understand, the structure of an essay? This is precisely the kind of question we must face head-on in the new science of generative models.
Description
Thesis (Ph.D.)--University of Washington, 2023
