Building Blocks for Data-Driven Theories of Language Understanding
Authors
Michael, Julian
Abstract
I propose a paradigm for scientific progress in natural language processing, centered around the development of data-driven theories of language understanding. The central idea is to collect data in tightly scoped, carefully defined ways that allow for exhaustive annotation of a behavioral phenomenon of interest. With such data, we can use machine learning to construct explanatory theories of these phenomena, which can then serve as building blocks for intelligible AI systems. After laying some conceptual groundwork for the idea, I describe a series of investigations into the development of data and theory for representations of shallow semantic structure in natural language, in particular using Question-Answer driven Semantic Role Labeling (QA-SRL), a simple schema for annotating verbal predicate-argument structure using highly constrained question-answer pairs. While this only scratches the surface of the complex language behaviors of interest in AI, I outline principles for data collection and theoretical modeling that can inform future scientific progress.
Description
Thesis (Ph.D.)--University of Washington, 2023
