Histogram Matching to Reduce Acoustic Mismatch in Automatic Speech Recognition


Authors

Fey, Cuinn Rios

Abstract

Motivated by histogram matching in image processing, where pixel probabilities are redistributed within each color channel of an image, this work applies an established technique in a new setting to reduce acoustic mismatch between audio signals. Mel-frequency-dependent histogram matching with a silence threshold is applied in the log Mel-spectrogram domain before the decoding step of an automatic speech recognition system. The technique is shown to be effective in a system built to recognize low-resource, noisy, compressed, and distorted air traffic control communications. The algorithm is robust to high acoustic variance and reduces acoustic mismatch between training, validation, and test data, yielding a statistically significant reduction in word error rate. After tuning the algorithm's silence threshold on the validation dataset, the word error rate on the test dataset dropped from 50.4% to 46.8%, with 99.9% confidence that the improvement is genuine.
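The core idea can be sketched in a few lines of NumPy. The following is a minimal illustration, not the thesis's actual implementation: it performs quantile-based histogram matching independently per Mel band, skipping frames below a silence threshold. The function names, the dB threshold value, and the array layout (`[n_mels, n_frames]`) are all assumptions for the sketch.

```python
import numpy as np

def match_band(source, reference):
    """Map values in `source` onto the empirical distribution of
    `reference` via quantile matching (the 1-D analogue of image
    histogram matching)."""
    # Rank each source value, convert ranks to quantiles, then look up
    # the value at that quantile in the sorted reference distribution.
    ranks = np.argsort(np.argsort(source))
    quantiles = ranks / max(len(source) - 1, 1)
    ref_sorted = np.sort(reference)
    ref_quantiles = np.linspace(0.0, 1.0, len(ref_sorted))
    return np.interp(quantiles, ref_quantiles, ref_sorted)

def mel_histogram_match(log_mel, ref_log_mel, silence_threshold=-60.0):
    """Frequency-dependent histogram matching on a log Mel-spectrogram
    of shape [n_mels, n_frames]. Per-band values at or below
    `silence_threshold` (assumed to be in dB) are left untouched, so
    silence does not distort the matched distribution."""
    out = log_mel.copy()
    for band in range(log_mel.shape[0]):
        src = log_mel[band]
        voiced = src > silence_threshold            # frames to remap
        ref_voiced = ref_log_mel[band][ref_log_mel[band] > silence_threshold]
        if voiced.sum() > 1 and len(ref_voiced) > 1:
            out[band, voiced] = match_band(src[voiced], ref_voiced)
    return out
```

In practice the reference statistics would come from the training data's log Mel-spectrograms, so that validation and test features are remapped toward the distribution the recognizer was trained on.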

Description

Thesis (Master's)--University of Washington, 2020
