Michael Picheny, Zoltan Tuske, et al.
INTERSPEECH 2019
Convolutional neural networks (CNNs) extend deep neural networks (DNNs) and serve as alternative acoustic models with state-of-the-art performance for speech recognition. In this paper, CNNs are used as acoustic models for speech activity detection (SAD) on data collected over noisy radio communication channels. When these SAD models are tested on audio recorded from radio channels not seen during training, performance degrades severely. We attribute this degradation to mismatches between the two-dimensional filters learned in the initial CNN layers and the novel channel data. Using a small amount of supervised data from the novel channels, the filters can be adapted to provide significant improvements in SAD performance. In mismatched acoustic conditions, the adapted models provide significant relative improvements (about 10-25%) over conventional DNN-based SAD systems. These results illustrate that CNNs have a considerable advantage in fast adaptation for acoustic modeling in these settings.
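The adaptation recipe described above, retraining only the two-dimensional filters of the initial CNN layer on a small amount of labelled audio from the unseen channel while keeping the rest of the network frozen, can be sketched roughly as below. This is a minimal illustrative sketch assuming a PyTorch-style model, not the authors' implementation; the class name SADNet, the layer sizes, and the hyperparameters are assumptions.

```python
# Hypothetical sketch: adapt only the initial conv filters of a CNN-based
# speech activity detector to a novel radio channel using a small
# supervised adaptation set. Architecture and sizes are illustrative.
import torch
import torch.nn as nn

class SADNet(nn.Module):
    """Small CNN acoustic model for frame-level speech/non-speech decisions."""
    def __init__(self, n_mels=40, context=11):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(5, 5), padding=2), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(32, 64, kernel_size=(3, 3), padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (n_mels // 2) * (context // 2), 256), nn.ReLU(),
            nn.Linear(256, 2),          # speech vs. non-speech
        )

    def forward(self, x):               # x: (batch, 1, n_mels, context)
        return self.classifier(self.conv(x))

def adapt_initial_filters(model, adapt_loader, epochs=5, lr=1e-4):
    """Fine-tune only the first conv layer on a few labelled minutes
    from the novel channel; all other parameters stay frozen."""
    for p in model.parameters():
        p.requires_grad = False
    first_conv = model.conv[0]          # the 2-D filters to adapt
    for p in first_conv.parameters():
        p.requires_grad = True

    optimizer = torch.optim.Adam(first_conv.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for feats, labels in adapt_loader:   # small supervised adaptation set
            optimizer.zero_grad()
            loss = loss_fn(model(feats), labels)
            loss.backward()
            optimizer.step()
    return model
```

Freezing everything except the first convolutional layer keeps the number of adapted parameters small, which is what makes adaptation feasible from only a small amount of supervised data on the new channel.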
George Saon, Tom Sercu, et al.
INTERSPEECH 2016
Po-Sen Huang, Haim Avron, et al.
ICASSP 2014
George Saon
SLT 2014