Leveraging Prosody for Punctuation Prediction of Spontaneous Speech

dc.contributor.advisor: Ostendorf, Mari
dc.contributor.author: Cho, Yeonjin Jenny
dc.date.accessioned: 2022-07-14T22:04:28Z
dc.date.available: 2022-07-14T22:04:28Z
dc.date.issued: 2022-07-14
dc.date.submitted: 2022
dc.description: Thesis (Master's)--University of Washington, 2022
dc.description.abstract: Clarity and precision of written text benefit from correct punctuation. For scripts that lack punctuation, such as transcripts of conversational speech, the speaker's intention can be misinterpreted when it is inferred from the words alone. There have been efforts in the past to predict punctuation using a variety of language models, but such studies have not taken full advantage of prosody in a neural language model. Several studies have found simple pauses to be a useful cue for capturing some punctuation marks, but not all punctuation marks are associated with a pause. There are no recent studies that make use of all available prosodic correlates; thus, I explore the benefit of using intonation and energy in addition to simple pauses. This thesis aims to bridge the gap between recent work and prosody by introducing a new neural model for punctuation prediction that incorporates various prosodic features, such as pauses, duration, pitch, and energy of speech. The goal is to improve automatic punctuation prediction in transcriptions of spontaneous speech. In addition, I pose the question of how to represent interruption points (where a speaker breaks the standard grammatical flow of a sentence to repeat or correct a phrase) associated with disfluencies in spontaneous speech. In various experiments on the Switchboard corpus, I find that prosodic information improves punctuation prediction fidelity for both hand transcripts and automatic speech recognition output. The word errors present in the automatic transcriptions degrade punctuation prediction at a rate that roughly corresponds to the word error rate. I find that automatically transcribed scripts with word errors benefit more from using all prosodic features than hand transcripts do. I also find that explicit modeling of interruption points improves performance on standard punctuation sets, and that it is better to represent them as commas than as no punctuation.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Cho_washington_0250O_24034.pdf
dc.identifier.uri: http://hdl.handle.net/1773/48786
dc.language.iso: en_US
dc.rights: none
dc.subject: CNN
dc.subject: prosody
dc.subject: punctuation
dc.subject: RNN
dc.subject: Switchboard
dc.subject: transformer
dc.subject: Electrical engineering
dc.subject: Computer science
dc.subject.other:
dc.title: Leveraging Prosody for Punctuation Prediction of Spontaneous Speech
dc.type: Thesis

Files

Original bundle

Name: Cho_washington_0250O_24034.pdf
Size: 2.12 MB
Format: Adobe Portable Document Format