Publication
ICASSP 2017
Conference paper
Harmonic feature fusion for robust neural network-based acoustic modeling
Abstract
Acoustic modeling with deep learning has drastically improved the performance of automatic speech recognition (ASR), where the mainstream acoustic feature is still the log-Mel filterbank output. Although log-Mel filtering discards harmonic-structure information, that information is still useful for ASR, and several attempts have been made to integrate higher-resolution information into the network. To improve ASR accuracy in noisy conditions, we propose new features, integrated into acoustic modeling, that indicate which parts of the time-frequency domain have a distinct harmonic structure, since the harmonic structure is only partially observed in noisy environments. The new features are combined with the standard acoustic features, and the network is trained on this combination using a variety of noisy data. Through this training, the network learns acoustic features accompanied by a kind of quality tag describing which parts are clean and which are degraded. Our model reduced the word error rate on the Aurora-4 task by 10.3% with a DNN compared with a strong baseline, while retaining high accuracy on clean test cases.
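As a rough illustration of the fusion idea described above, the sketch below computes a log-filterbank-like feature per frame, a simple harmonicity score (the normalized autocorrelation peak in a plausible pitch-lag range), and concatenates the two into one input vector. This is a minimal NumPy sketch under stated assumptions: the abstract does not specify the paper's actual harmonic feature, filterbank, or fusion scheme, and all function names here (`log_mel_like`, `harmonicity`, `fuse`) are hypothetical.

```python
import numpy as np

def log_mel_like(frames, n_bands=40):
    # Placeholder log filterbank: average magnitude-spectrum bins into
    # n_bands groups (a real system would use Mel-spaced triangular filters).
    spec = np.abs(np.fft.rfft(frames, axis=-1))
    bands = np.array_split(spec, n_bands, axis=-1)
    return np.log(np.stack([b.mean(axis=-1) for b in bands], axis=-1) + 1e-8)

def harmonicity(frames):
    # Crude harmonic-structure indicator per frame: peak of the normalized
    # autocorrelation within a pitch-like lag range (assumed ~20-160 samples,
    # roughly 100-800 Hz at 16 kHz). Periodic (voiced) frames score high,
    # noise-dominated frames score low.
    scores = []
    for f in frames:
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        ac = ac / (ac[0] + 1e-8)
        scores.append(ac[20:160].max())
    return np.array(scores)

def fuse(frames, n_bands=40):
    # Concatenate the standard features with the harmonic "quality tag",
    # yielding one (n_bands + 1)-dimensional vector per frame for the network.
    lm = log_mel_like(frames, n_bands)
    h = harmonicity(frames)[:, None]
    return np.concatenate([lm, h], axis=-1)
```

For example, a frame containing a pure tone yields a harmonicity score near 1, while a white-noise frame yields a much lower score, so the network can learn to weight the spectral features by this per-frame reliability cue.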