Conference paper
Improved voice activity detection using static harmonic features
Abstract
Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. We previously proposed a statistical-model-based VAD that uses long-term temporal information in speech and shows good robustness against noise in an automobile environment. For further improvement, this paper describes a new method that exploits harmonic structure information with statistical models. In our approach, local peaks considered to be harmonic structures are extracted without explicit pitch detection or voiced-unvoiced classification. The proposed method, which combines long-term temporal and static harmonic features, led to considerable improvements under low-SNR conditions in our VAD testing. In addition, the word error rate was reduced by 29.1% in a test that included a full ASR system. ©2010 IEEE.
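The abstract does not say how the local peaks are extracted, so the following is only a rough illustration of the general idea: marking spectral bins that look like harmonic peaks without estimating pitch. The function names (`power_spectrum`, `local_peak_mask`), the strict-local-maximum rule, and the `rel_floor` threshold are all assumptions made for this sketch, not the paper's method.

```python
import math

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(re * re + im * im)
    return spectrum

def local_peak_mask(spectrum, rel_floor=0.01):
    """Return a 0/1 mask of bins that are strict local maxima.

    rel_floor is a hypothetical threshold: peaks weaker than
    rel_floor * max(spectrum) are ignored as noise.
    """
    floor = rel_floor * max(spectrum)
    mask = [0] * len(spectrum)
    for k in range(1, len(spectrum) - 1):
        is_peak = spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]
        if is_peak and spectrum[k] > floor:
            mask[k] = 1
    return mask

# Synthetic "voiced" frame: three harmonics of DFT bin 4 with decaying amplitude.
N = 64
frame = [
    sum(math.cos(2 * math.pi * 4 * h * t / N) / h for h in (1, 2, 3))
    for t in range(N)
]

mask = local_peak_mask(power_spectrum(frame))
peak_bins = [k for k, m in enumerate(mask) if m]
print(peak_bins)
```

For this synthetic frame the marked bins are 4, 8, and 12, the fundamental and its two harmonics, and no pitch value is ever computed. In a statistical-model-based detector, a mask like this could serve as one per-frame feature alongside others, though how the paper actually combines its features is not described in the abstract.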