Leveraging Prosody for Punctuation Prediction of Spontaneous Speech

Author

Cho, Yeonjin Jenny

Abstract

Clarity and precision of written text benefit from correct punctuation. Transcripts of conversational speech often lack punctuation. When a reader relies on the words alone, they may misinterpret the speaker's intention. Past efforts have predicted punctuation with a variety of language models, but these studies have not taken full advantage of prosody in a neural language model. Several studies have found simple pauses to be a useful cue for capturing some punctuation marks, but not all punctuation marks are associated with a pause. No recent studies make use of all available prosodic correlates. I therefore explore the benefit of using intonation and energy in addition to simple pauses. This thesis aims to bridge the gap between recent work and prosody by introducing a new neural model for punctuation prediction that incorporates various prosodic features, such as the pauses, duration, pitch, and energy of speech. The goal is to improve automatic punctuation prediction in transcriptions of spontaneous speech. In addition, I ask how to represent the interruption points associated with disfluencies in spontaneous speech. An interruption point is where a speaker breaks the standard grammatical flow of a sentence to repeat or correct a phrase. In various experiments on the Switchboard corpus, I find that prosodic information improves punctuation prediction fidelity for both hand transcripts and automatic speech recognition (ASR) output. Word errors in the automatic transcripts degrade punctuation prediction by an amount that roughly corresponds to their word error rate. I find that automatically transcribed scripts with word errors benefit more from using all prosodic features than hand transcripts do. Finally, I find that explicitly modeling interruption points improves performance for standard punctuation sets. Representing them as commas works better than representing them as no punctuation.
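The abstract compares two ways of representing interruption points: as commas or as no punctuation. The following is a minimal sketch of how those two labeling schemes could differ in a word-level tagging setup. It assumes that each word is tagged with the punctuation that follows it. The label names, the function `label_interruption_points`, and the example utterance are hypothetical illustrations, not code or data from the thesis.

```python
from typing import List

# Hypothetical label inventory; the thesis's exact punctuation set is an assumption here.
NONE, COMMA, PERIOD, QUESTION = "NONE", "COMMA", "PERIOD", "QUESTION"


def label_interruption_points(
    punct: List[str],
    is_ip: List[bool],
    ip_as: str = COMMA,
) -> List[str]:
    """Return per-word target labels with interruption points mapped to `ip_as`.

    punct[i] is the punctuation label (assumed scheme) that follows word i;
    is_ip[i] is True when an interruption point follows word i.
    Existing punctuation is kept; only unpunctuated IP positions are relabeled.
    """
    if len(punct) != len(is_ip):
        raise ValueError("punct and is_ip must have the same length")
    if ip_as not in (COMMA, NONE):
        raise ValueError("ip_as must be COMMA or NONE")
    return [ip_as if ip and p == NONE else p for p, ip in zip(punct, is_ip)]


# Hypothetical disfluent utterance: "i want to + i need a ticket."
words = ["i", "want", "to", "i", "need", "a", "ticket"]
punct = [NONE, NONE, NONE, NONE, NONE, NONE, PERIOD]
is_ip = [False, False, True, False, False, False, False]

as_comma = label_interruption_points(punct, is_ip, ip_as=COMMA)
as_none = label_interruption_points(punct, is_ip, ip_as=NONE)
print(list(zip(words, as_comma)))
print(list(zip(words, as_none)))
```

Under the comma scheme, the word before the interruption point ("to") receives a COMMA target. Under the other scheme, it stays NONE. The sentence-final PERIOD is unaffected in both cases.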

Description

Thesis (Master's)--University of Washington, 2022