Histogram Matching to Reduce Acoustic Mismatch in Automatic Speech Recognition


Authors

Fey, Cuinn Rios

Abstract

Motivated by histogram matching in image processing, where pixel probabilities are redistributed within each color channel of an image, this work applies an established technique in a new setting to reduce acoustic mismatch between audio signals. Mel-frequency-dependent histogram matching with a silence threshold is applied in the log Mel-spectrogram domain before the decoding step of an automatic speech recognition system. The technique is shown to be effective in a system built to recognize low-resource, noisy, compressed, and distorted air traffic control communications. The algorithm is robust to high acoustic variance and reduces acoustic mismatch between training, validation, and test data, yielding a statistically significant reduction in word error rate. After tuning the algorithm's silence threshold on the validation dataset, the word error rate on the test dataset dropped from 50.4% to 46.8%, with 99.9% confidence that the improvement is genuine.
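The core idea can be sketched in a few lines of NumPy. The following is a minimal illustration, not the thesis's actual implementation: it performs quantile-based histogram matching independently per Mel band, skipping frames below a silence threshold. The function names, the dB threshold value, and the array layout (`[n_mels, n_frames]`) are all assumptions for the sketch.

```python
import numpy as np

def match_band(source, reference):
    """Map values in `source` onto the empirical distribution of
    `reference` via quantile matching (the 1-D analogue of image
    histogram matching)."""
    # Rank each source value, convert ranks to quantiles, then look up
    # the value at that quantile in the sorted reference distribution.
    ranks = np.argsort(np.argsort(source))
    quantiles = ranks / max(len(source) - 1, 1)
    ref_sorted = np.sort(reference)
    ref_quantiles = np.linspace(0.0, 1.0, len(ref_sorted))
    return np.interp(quantiles, ref_quantiles, ref_sorted)

def mel_histogram_match(log_mel, ref_log_mel, silence_threshold=-60.0):
    """Frequency-dependent histogram matching on a log Mel-spectrogram
    of shape [n_mels, n_frames]. Per-band values at or below
    `silence_threshold` (assumed to be in dB) are left untouched, so
    silence does not distort the matched distribution."""
    out = log_mel.copy()
    for band in range(log_mel.shape[0]):
        src = log_mel[band]
        voiced = src > silence_threshold            # frames to remap
        ref_voiced = ref_log_mel[band][ref_log_mel[band] > silence_threshold]
        if voiced.sum() > 1 and len(ref_voiced) > 1:
            out[band, voiced] = match_band(src[voiced], ref_voiced)
    return out
```

In practice the reference statistics would come from the training data's log Mel-spectrograms, so that validation and test features are remapped toward the distribution the recognizer was trained on.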

Description

Thesis (Master's)--University of Washington, 2020
