Submodular data selection in ASR language modeling

dc.contributor.advisor: Kirchhoff, Katrin
dc.contributor.author: Aly, Ahmed
dc.date.accessioned: 2017-02-14T22:40:49Z
dc.date.available: 2017-02-14T22:40:49Z
dc.date.issued: 2017-02-14
dc.date.submitted: 2016-12
dc.description: Thesis (Master's)--University of Washington, 2016-12
dc.description.abstract: Given the vast amount of textual data available today, it is very beneficial to have an efficient methodology for filtering and selecting the important and relevant portions of this data to improve current natural language and speech processing systems. Although very large language models have been the industry norm in production automatic speech recognition (ASR) systems, the focus is now shifting towards efficient ways to generate and use personalized and adapted language models, as they have proven to improve the end-user experience. Submodular methods have achieved great success in different domains: acoustic modeling, text summarization, and machine translation. They provide a natural way to select high-quality, relevant data from an out-of-domain data source for use in domain adaptation and personalization. In this work, we model the problem of language modeling data selection as submodular function optimization. Our results show that, by using submodular data selection methods, we were able to train better language models with less data. We were also able to reduce the end-to-end word error rate of the ASR system by 7% by selecting data from a completely different domain.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Aly_washington_0250O_16645.pdf
dc.identifier.uri: http://hdl.handle.net/1773/38167
dc.language.iso: en_US
dc.rights: none
dc.subject: data selection
dc.subject: domain adaptation
dc.subject: language modeling
dc.subject: speech recognition
dc.subject: submodular functions
dc.subject.other: Computer science
dc.subject.other: Linguistics
dc.subject.other: linguistics
dc.title: Submodular data selection in ASR language modeling
dc.type: Thesis

Files

Original bundle

Name: Aly_washington_0250O_16645.pdf
Size: 287 KB
Format: Adobe Portable Document Format
