Prosody in Human Communication and Machine Understanding

Wright, Richard A; Ostendorf, Mari; Ng, Sara Blalock
2024-10-16
Ng_washington_0250E_27568.pdf
https://hdl.handle.net/1773/52547
Thesis (Ph.D.)--University of Washington, 2024

Speech technology is a ubiquitous part of the modern world, from the voice-enabled assistants in smartphones to bespoke tools used by language researchers. Technological advances and the curation of large speech datasets have enabled these systems to identify words with remarkable quality. However, the black-box nature of large commercial speech understanding systems brings into question the extent to which they can take advantage of cues from prosody. Prosody has great potential as an untapped source of linguistic information for speech understanding that is not surfaced in other aspects of language. Previous work has shown that prosodic information can be exploited computationally to resolve ambiguity for linguistic structures in computational models, and to perform tasks which are considered prosodically significant, such as sarcasm detection. However, computational systems do not benefit from the same social and conversational context that humans have in processing this kind of communication, making such tasks more challenging and further motivating the careful study of prosodic input. This work investigates the hypothesis that explicit encoding of acoustic-prosodic features is a benefit to speech understanding technology. From the domain of punctuation prediction in automatic speech recognition, I show that adding acoustic-prosodic measures can improve the performance of punctuation prediction models for speech transcripts compared to a system that uses only the word sequence. I provide a potential use case for prosodic modeling in the domain of speech entrainment. Finally, I show how computational methods can be used to understand human behavior in prosodically marked speech within the domains of speech timing and regions of presumed hyper-articulation.
This work bridges the gap between linguistic questions about prosody and computational questions about the use of, or need for, linguistically motivated acoustic features. Understanding how prosody influences the quality of speech understanding systems is vital in enhancing their utility across various domains and for diverse speakers. The synthesis of these research strands provides a bird's-eye view of the methodologies and challenges that can be involved in the computational processing of prosody.

application/pdf
en-US
CC BY-NC-ND
entrainment; hyperarticulation; prosody; punctuation; speech recognition; stance
Linguistics; Artificial intelligence; Communication
Thesis