Transfer Learning Using L2 Speech to Improve Automatic Speech Recognition of Dysarthric Speech

Authors

Steinmetz, Hillel Aryeh

Abstract

Dysarthria is a class of speech disorders associated with impairments to a person’s motor system. Dysarthric speech is diverse but is broadly characterized by reduced prosodic, phonatory, and articulatory precision (Rowe et al., 2022). Non-native English speech, or L2 English speech, shares acoustic and phonetic features with the speech of several dysarthria subtypes, such as a slower and more variable speech rate than native, non-dysarthric English speech (Baese-Berk and Bradlow, 2021; Hertrich et al., 2021). L2 English speech also has different phonetic correlates than native English speech, with phonetic variation more closely resembling that of a speaker’s first language (Flege, 1981). Because L2 speech both shares acoustic features with dysarthric speech and has more diverse phonetic correlates of phonological segments, it should facilitate knowledge transfer when training an ASR model on dysarthric speech recognition tasks. This study finetunes Wav2vec2 models on two English dysarthric speech datasets, UA-Speech and TORGO, and one English L2 speech dataset, L2-Arctic, using standard finetuning and multitask learning paradigms. It examines whether including L2 speech in the training data improves dysarthric speech recognition in speaker-dependent, speaker-independent, and zero-shot settings. The results suggest that including L2 speech in the training data improves dysarthric speech recognition in the speaker-dependent and speaker-independent settings, and that models trained using multitask learning perform better than those trained using standard finetuning.

Description

Thesis (Master's)--University of Washington, 2023