Transfer Learning Using L2 Speech to Improve Automatic Speech Recognition of Dysarthric Speech

Steinmetz, Hillel Aryeh

dc.contributor.advisor	Levow, Gina-Anne
dc.contributor.author	Steinmetz, Hillel Aryeh
dc.date.accessioned	2023-08-14T17:06:00Z
dc.date.issued	2023-08-14
dc.date.submitted	2023
dc.description	Thesis (Master's)--University of Washington, 2023
dc.description.abstract	Dysarthria is a class of speech disorders associated with impairments to a person’s motor system. Dysarthric speech is diverse but is broadly characterized by reduced precision in prosody, phonation, and articulation (Rowe et al., 2022). Non-native English speech, or L2 English speech, shares acoustic and phonetic features with the speech of several dysarthria subtypes, such as a slower and more variable speech rate than that of native, non-dysarthric English speech (Baese-Berk and Bradlow, 2021; Hertrich et al., 2021). L2 English speech also has different phonetic correlates than native English speech, with phonetic variation that more closely resembles the speaker’s first language (Flege, 1981). Because L2 speech both shares acoustic features with dysarthric speech and has more diverse phonetic correlates of phonological segments, it should facilitate knowledge transfer when training an ASR model on dysarthric speech recognition tasks. This study finetunes Wav2vec2 models on two English dysarthric speech datasets, UA-Speech and TORGO, and one English L2 speech dataset, L2-Arctic, using standard finetuning and multitask learning paradigms. It examines whether including L2 speech in the training data improves dysarthric speech recognition in speaker-dependent, speaker-independent, and zero-shot settings. Our results suggest that including L2 speech in the training data improves dysarthric speech recognition in speaker-dependent and speaker-independent settings, with models trained using multitask learning outperforming those trained using standard finetuning.
dc.embargo.lift	2024-08-13T17:06:00Z
dc.embargo.terms	Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Steinmetz_washington_0250O_25460.pdf
dc.identifier.uri	http://hdl.handle.net/1773/50477
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	ASR
dc.subject	dysarthria
dc.subject	dysarthric speech recognition
dc.subject	L2 speech
dc.subject	multitask learning
dc.subject	transfer learning
dc.subject	Linguistics
dc.subject.other	Linguistics
dc.title	Transfer Learning Using L2 Speech to Improve Automatic Speech Recognition of Dysarthric Speech
dc.type	Thesis

Files

Original bundle

Name:: Steinmetz_washington_0250O_25460.pdf
Size:: 595.7 KB
Format:: Adobe Portable Document Format

Collections

Linguistics