The Acoustic Cues at Prosodic Boundaries in Mandarin

Authors

Chen, Jiani

Abstract

Prosodic boundary labeling is an important task, both for its direct application in speech synthesis and for constructing the speech corpora that synthesis depends on. However, because quantitative studies of Mandarin prosody are lacking, it is hard to provide a unified standard for prosodic boundary labeling. In this work, I study the acoustic cues at prosodic boundaries in Mandarin using a large corpus and quantitative methods. First, the acoustic cues at prosodic boundaries are examined with experimental phonetics methods. Then, the acoustic cues are used as features to study their relation to different boundary types through automatic prosodic boundary labeling and feature ablation experiments. The one-way ANOVA results indicate that the baseline is reset only after intonational phrase boundaries and declines slightly after prosodic phrase boundaries. The results of ablation experiments with a MaxEnt classifier and an SVM classifier, together with the ANOVA results, demonstrate that silence duration is an essential acoustic cue at prosodic boundaries. The ablation experiments also provide some information on pitch-related acoustic cues: 1) Long-distance f0 variation (reset/declination) might be useful for measuring the degree of f0 variation after a prosodic boundary, and might be a useful acoustic cue for distinguishing different boundary types. 2) The pitch difference between the prosodic word after the boundary and a prosodic unit before the boundary might be more helpful for distinguishing different boundary types. 3) Maximum pitch differences are not as useful as minimum pitch differences for distinguishing different boundary types. The f0 variation (reset/declination) at prosodic word boundaries and prosodic phrase boundaries might be reflected mainly in the variation of the minimum pitch, while at intonational phrase boundaries it might be reflected mainly in the variation of the mean pitch.
Furthermore, the one-way ANOVA results and the automatic prosodic boundary labeling results both indicate that prosodic word boundaries and prosodic phrase boundaries are the most difficult pair to distinguish. Employing the proposed features in an SVM achieves substantially better results at distinguishing these two types of boundaries.
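
For readers unfamiliar with the one-way ANOVA used above, the following minimal sketch shows how an F statistic could be computed for a single acoustic cue (here, silence duration) grouped by boundary type. It is not the thesis's code: the function name `one_way_anova_f`, the dictionary `silence_by_boundary`, and every duration value are invented for illustration and are not results from the corpus.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    F is the ratio of between-group variance to within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical silence durations in seconds, one list per boundary type.
# These numbers are made up; they are NOT measurements from the thesis.
silence_by_boundary = {
    "prosodic_word": [0.00, 0.01, 0.02, 0.01],
    "prosodic_phrase": [0.05, 0.08, 0.06, 0.07],
    "intonational_phrase": [0.30, 0.42, 0.35, 0.38],
}

f_stat, df_b, df_w = one_way_anova_f(list(silence_by_boundary.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value means the cue's mean differs across boundary types by much more than it varies within each type. A significance test would additionally compare F against the F distribution with the reported degrees of freedom, which this sketch leaves out.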

Description

Thesis (Master's)--University of Washington, 2020
