Architectures for Language Processing that Leverage Observable Structure in Language
Authors
Zayats, Vicky
Abstract
Structural components are an inherent part of both written and spoken language. While much research has studied the latent structure of language, such as the organization of morphemes into words through morphology and of words into sentences through syntax, higher-level, explicitly marked observable structure has not been used as extensively. Examples of observable structure in written text include the rows and columns of tables, web-page structure outlined through HTML tags, and the conversational history of a multi-party discussion. This thesis addresses the problem of leveraging observable structures that naturally co-occur with language. Specifically, we explore two types of approaches, multi-modal integration and architecture adaptation, to learn structure-aware representations of language in three scenarios, each with a distinct structure. In the first scenario, we focus on Reddit discussion forums, where conversation structure is part of the metadata, with the task of predicting the most influential comments in a thread. In the second, we look at spoken language, using acoustic cues as observable structure for a disfluency detection task. In the third, we work with Wikipedia articles that contain both tables and text, in a question answering task. Throughout, we compare explicit modeling of the structure with multimodal methods that use features extracted from the structure. For the popularity prediction task, we use the tree structure of the conversation history in a multi-party discussion as a graph over which information about each element is propagated to the rest of the graph. This is done by introducing a novel graph-LSTM architecture that summarizes and propagates information about the discussion happening at different branches of the tree.
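The bottom-up propagation idea can be illustrated with a minimal sketch. The `Comment` class, the averaging aggregation, and the 0.5 blending weight below are illustrative assumptions for exposition, not the dissertation's actual graph-LSTM update, which uses learned LSTM gates:

```python
# Sketch of bottom-up information propagation over a comment tree, in the
# spirit of a graph-LSTM over conversation structure. The simple averaging
# update here stands in for the learned recurrent update in the real model.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    embedding: List[float]                       # per-comment feature vector
    children: List["Comment"] = field(default_factory=list)


def propagate(node: Comment) -> List[float]:
    """Summarize a subtree by combining a node's own features with the
    aggregated states of its children (child-sum style, leaves first)."""
    if not node.children:
        return node.embedding
    child_states = [propagate(c) for c in node.children]
    dim = len(node.embedding)
    # Average the child summaries, then blend with the node's own features.
    agg = [sum(s[i] for s in child_states) / len(child_states) for i in range(dim)]
    return [0.5 * (node.embedding[i] + agg[i]) for i in range(dim)]


# A tiny two-level thread: a root comment with two replies.
root = Comment([1.0, 0.0], [Comment([0.0, 1.0]), Comment([0.0, 3.0])])
state = propagate(root)  # summary vector for the whole thread: [0.5, 1.0]
```

Because the recursion visits leaves first, each internal node's state reflects its entire subtree, so a classifier reading the root state sees a summary of the whole discussion.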
In the disfluency detection task, the structure is observed through acoustic-prosodic cues, which carry information associated with the specific word sequence in addition to signaling disfluencies. To address this, we propose a novel approach that reduces the aspects of the prosody observation that are irrelevant to structure, so that only relevant structural information is incorporated and the signal becomes complementary to that found in the textual component. In the question answering task, we examine ways to represent a table and to enhance the limited amount of unstructured text associated with some table entries. We make two novel contributions here. First, we generalize the BERT transformer architecture to capture table representations, pretraining new relations on a table corpus extracted from Wikipedia. Second, we introduce a novel approach for updating table representations based on the text of the article surrounding the table, enriching table entries with broader context. Our findings across all three tasks suggest that architectural adaptations that explicitly model observable structure can be more powerful than feature-based methods.
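To make the table-representation setting concrete, one common way to feed a table to a BERT-style sequence encoder is to linearize it with cell and row markers so the encoder can recover the grid structure. The marker scheme and function below are an assumption for illustration, not the thesis's exact input format:

```python
# Illustrative linearization of a Wikipedia-style table into a flat token
# sequence. Each cell is tagged with its column header so that a sequence
# encoder (e.g. a BERT-style transformer) can relate cells across rows.

from typing import List


def linearize_table(header: List[str], rows: List[List[str]]) -> List[str]:
    """Flatten a table to tokens, pairing every cell with its column name."""
    tokens = ["[TABLE]"]
    for row in rows:
        tokens.append("[ROW]")
        for col, cell in zip(header, row):
            tokens += ["[CELL]", col, ":", cell]
    return tokens


header = ["City", "Population"]
rows = [["Seattle", "737015"]]
tokens = linearize_table(header, rows)
```

Pairing each cell with its header keeps column identity explicit after flattening, which is what lets a generalized transformer learn relations between table entries and surrounding article text.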
Description
Thesis (Ph.D.)--University of Washington, 2021
