Speech to Text to Semantics: A Sequence-to-Sequence System for Spoken Language Understanding

Dodson, John Ryan

Files

Dodson_washington_0250O_21292.pdf (751.46 KB)

Date

2020-08-14

Author

Dodson, John Ryan

Abstract

Spoken language understanding entails both the automatic transcription of a speech utterance and the identification of one or more semantic concepts conveyed by that utterance. Traditionally, these systems are domain-specific and target industries such as travel, entertainment, and home automation. As a result, many approaches to spoken language understanding are limited to filling predefined semantic slots and cannot generalize to arbitrary semantic roles. This thesis addresses the broader question of how to extract predicate-argument frames from a transcribed speech utterance. I describe a sequence-to-sequence system that performs spoken language understanding through shallow semantic parsing. Built on a modified version of the OpenSeq2Seq toolkit, the system performs speech recognition and semantic parsing in a single end-to-end flow. It is extensible and easy to use, allowing fast iteration on system parameters and model architectures. The system is evaluated in two experiments. The first performs a speech-to-text-to-semantics transformation and uses n-best language model rescoring to select the best transcription. The second performs the same transformation but generates transcriptions through shallow language model fusion. Both experiments evaluate several combinations of speech recognition models and semantic parsers.
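
The two transcription strategies named in the abstract differ mainly in when the language model is consulted. The sketch below is a generic illustration of that difference, not the thesis's implementation: the toy unigram language model, the weight `lm_weight`, and the function names `rescore_nbest` and `fuse_step` are all assumptions introduced here, and the scoring formula actually used in the thesis is not given in this record. In n-best rescoring, the recognizer first produces complete candidate transcripts and the language model then reranks them; in shallow fusion, language model scores are mixed into the recognizer's token scores at each decoding step.

```python
import math

# Hypothetical toy unigram language model (log-probabilities per word).
# A real system would use an n-gram or neural language model instead.
TOY_LM = {
    "book": math.log(0.4),
    "a": math.log(0.3),
    "flight": math.log(0.2),
    "fright": math.log(0.01),
}
UNK_LOGP = math.log(1e-4)  # assumed score for out-of-vocabulary words


def lm_logprob(words):
    """Sum of unigram log-probabilities for a word sequence."""
    return sum(TOY_LM.get(w, UNK_LOGP) for w in words)


def rescore_nbest(nbest, lm_weight=0.5):
    """N-best rescoring: rerank complete hypotheses after decoding.

    nbest is a list of (transcript, acoustic_logprob) pairs.
    Returns the transcript with the highest combined score.
    """
    def combined(item):
        text, acoustic_logprob = item
        return acoustic_logprob + lm_weight * lm_logprob(text.split())

    return max(nbest, key=combined)[0]


def fuse_step(asr_logprobs, lm_logprobs, lm_weight=0.5):
    """Shallow fusion: combine scores for one decoding step.

    Both arguments map candidate tokens to log-probabilities.
    Returns the fused score for every token the recognizer proposed.
    """
    return {
        tok: asr_logprobs[tok] + lm_weight * lm_logprobs.get(tok, UNK_LOGP)
        for tok in asr_logprobs
    }


# Rescoring: the acoustically preferred hypothesis loses once the LM is applied.
nbest = [("book a fright", -4.0), ("book a flight", -4.2)]
best = rescore_nbest(nbest)

# Fusion: the same correction happens during decoding, one token at a time.
step_scores = fuse_step(
    {"flight": math.log(0.45), "fright": math.log(0.55)}, TOY_LM
)
best_token = max(step_scores, key=step_scores.get)
```

In this toy setting both strategies recover "flight", but rescoring can only choose among hypotheses that survived the first decoding pass, whereas fusion can influence which hypotheses are explored at all.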