A Comparative Analysis of Transcription Errors from Major Commercial Automatic Speech Recognition Systems on Speakers of Four Ethnic Backgrounds in the Pacific Northwest
Authors
Scott, Michael Kelly
Abstract
Major commercial automatic speech recognition (ASR) systems have demonstrated higher transcription error rates for non-white American English speakers, particularly African American speakers, and there is evidence that sociophonetic features are strongly associated with these errors (Koenecke et al., 2020; Wassink et al., 2022). In this thesis, I analyze the transcription results of four major commercial ASR systems (Apple Speech, Amazon Transcribe, Google Speech-to-text, and IBM Watson Speech-to-text) on recordings from the Pacific Northwest English (PNWE) corpus originally collected for Wassink (2015), and I attempt to answer two research questions: 1. Do sociophonetic markers typical of African American Language (AAL) correlate with higher error rates in major commercial ASR systems for African American speakers than for speakers of other ethnic backgrounds? 2. Are there any phonological features representative of AAL that appear more frequently in incorrectly transcribed speech for African American speakers than for other co-regional speakers? To do this, I ran automatic transcription with all four ASR systems on recordings of 16 speakers from four ethnic backgrounds: African American, Caucasian American, ChicanX, and Yakama. I identified ten target linguistic variables that represent common sociophonetic markers of AAL, then identified co-occurrences of these markers with transcription errors for each ASR system in order to perform both a quantitative and a heuristically informed qualitative analysis. From this, I determined that resistance to the low-back merger and the pre-nasal front merger (pen-pin merger) are both more strongly associated with errors for the African American speakers than for any other ethnic group, and that consonant cluster reduction is more strongly associated with errors for the Yakama and Caucasian American speakers than for the African American speakers.
Description
Thesis (Master's)--University of Washington, 2023
