Context in Question Answering


Authors

Ferguson, James

Abstract

Question Answering (QA) has seen a massive surge in interest, and a correlated improvement in model performance, over the past half decade, even surpassing humans on some datasets. Many models for this task rely on some form of contextual information that leads from the question to the answer. This context plays a crucial role in how well models perform: using the right context when answering a question can be the difference between a model achieving superhuman performance and one performing little better than random. When constructing a dataset, using context incorrectly can trivialize the problem, resulting in models that do not generalize to other data. Conversely, seeing alternate contexts during training can improve model robustness by providing multiple views of how to answer a question. We present three works that explore the significance of context in QA. First, we introduce a new dataset, IIRC, in which questions that require multi-hop, discrete reasoning are written with access to only partial context. Answers are then collected separately, resulting in much lower lexical overlap between questions and relevant context. We show that these questions are easily answerable by humans, but state-of-the-art models struggle to achieve comparable results. Next, we present a new method for selecting training context. Many questions have multiple contexts that lead to the correct answer, but exhaustively annotating these contexts is difficult. We use the downstream QA loss to identify alternate contexts during training, and show that this approach identifies relevant context for unseen data more than 90% of the time on the IIRC dataset. Finally, we introduce QAPaC (Question-Answer Pairs as Context). In this approach, we segment documents into QA pairs and then retrieve context from the collection of QA pairs. We show that on IIRC this approach yields a 1.8 F1 point improvement over segmenting documents into either sentences or windows, and that an ensemble of QA pairs and sentences yields a further improvement of 1.3 F1 points.
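The core idea behind QAPaC, retrieving context from QA-pair segments rather than raw sentence segments, can be illustrated with a toy lexical retriever. Everything below is an assumption for illustration: the example passage, the bag-of-words cosine scoring, and the `retrieve` helper are hypothetical and are not the thesis's actual segmentation or retrieval method.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lowercase bag-of-words vector (punctuation stripped) as a Counter."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, segments, k=1):
    """Return the k segments most lexically similar to the question."""
    q = bow(question)
    return sorted(segments, key=lambda s: cosine(q, bow(s)), reverse=True)[:k]

# Two segmentations of the same (invented) passage:
sentences = [
    "The bridge was completed in 1937.",
    "It spans the Golden Gate strait.",
]
qa_pairs = [
    "When was the bridge completed? 1937",
    "What does the bridge span? The Golden Gate strait",
]

question = "What year was the bridge completed?"
print(retrieve(question, sentences))
print(retrieve(question, qa_pairs))
```

A QA-pair segment pairs a question-shaped string with its answer, so a test question tends to score highly against the QA pair it paraphrases; the thesis reports that this segmentation, and an ensemble of it with sentence segments, improves retrieval-backed QA on IIRC.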

Description

Thesis (Ph.D.)--University of Washington, 2022
