Histogram Matching to Reduce Acoustic Mismatch in Automatic Speech Recognition
Authors
Fey, Cuinn Rios
Abstract
Motivated by histogram matching in image processing, where it is used to redistribute pixel probabilities in each color channel of an image, an old technique is applied in a new way to reduce acoustic mismatch between audio signals. Mel-frequency-dependent histogram matching with a silence threshold, applied in the log Mel-spectrogram domain, is performed before the decoding step of an automatic speech recognition system. The technique is shown to be effective within a system built to recognize low-resource, noisy, compressed, and distorted air traffic control communications. The algorithm is robust to high acoustic variance and capable of reducing acoustic mismatch between training, validation, and test data. Additionally, it can decrease the word error rate with statistically significant confidence. After tuning the algorithm's silence threshold on the validation dataset, decoding on the test dataset lowered the word error rate from 50.4% to 46.8%, with 99.9% confidence of improvement.
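The core idea described above can be illustrated with a minimal sketch of per-frequency histogram matching via empirical quantile mapping. This is an assumption-laden illustration, not the thesis's actual implementation: the function name, parameter names, and the choice of a simple per-bin silence threshold in dB are all hypothetical, and the quantile-mapping formulation is one common way to realize histogram matching.

```python
import numpy as np

def match_histograms_per_bin(source, reference, silence_threshold=-60.0):
    """Frequency-dependent histogram matching in the log Mel-spectrogram
    domain (hypothetical sketch; names and defaults are assumptions).

    source, reference: arrays of shape (n_frames, n_mel_bins) holding
    log Mel energies. For each Mel bin, source values are mapped to the
    reference value at the same empirical quantile. Values below
    silence_threshold are treated as silence: they are left unchanged
    and excluded from the matching statistics.
    """
    matched = source.copy()
    for b in range(source.shape[1]):
        src_col = source[:, b]
        voiced = src_col >= silence_threshold          # frames to remap
        ref_voiced = reference[:, b][reference[:, b] >= silence_threshold]
        if not voiced.any() or ref_voiced.size == 0:
            continue
        # Empirical-quantile mapping: find each source value's quantile
        # within its own bin, then read off the reference value at that
        # same quantile.
        src_sorted = np.sort(src_col[voiced])
        quantiles = np.searchsorted(src_sorted, src_col[voiced]) / src_sorted.size
        matched[voiced, b] = np.quantile(ref_voiced, np.clip(quantiles, 0.0, 1.0))
    return matched
```

In an ASR pipeline of the kind the abstract describes, the reference statistics would come from the training data and the mapping would be applied to validation and test features before decoding, with the silence threshold tuned on the validation set.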
Description
Thesis (Master's)--University of Washington, 2020
