Conference paper
N-best based stochastic mapping on stereo HMM for noise robust speech recognition
Abstract
In this paper we present an extension of our previously proposed feature space stereo-based stochastic mapping (SSM). Unlike our previous work, which used an auxiliary stereo Gaussian mixture model in the front-end, this work uses a stereo HMM in the back-end. The basic idea, as in feature space SSM, is to form a joint space of the clean and noisy features, but to train a Gaussian mixture HMM in the new space. The MMSE estimate, which is the conditional expectation of the clean speech given the sequence of noisy observations, leads to clean speech predictors at the granularity of the Gaussian distributions in the HMM. Because the Gaussians are not known during decoding, N-best hypotheses are employed. This results in a clean speech predictor that is a posterior-weighted sum of the estimates from the different Gaussian distributions. In an experimental evaluation on the Aurora 2 database, the proposed method outperforms the MST model, with about 10%-20% relative improvement under unseen noise conditions in particular. Copyright © 2008 ISCA.
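The posterior-weighted predictor described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single frame, full-covariance Gaussians over the stacked vector [clean; noisy], and posteriors computed from the noisy-feature marginal of each Gaussian rather than from N-best HMM hypotheses. The names `mmse_clean_estimate`, `gaussian_log_pdf`, and `dim_x` are hypothetical.

```python
import numpy as np

def gaussian_log_pdf(y, mean, cov):
    """Log density of a multivariate Gaussian evaluated at y."""
    d = y.shape[0]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def mmse_clean_estimate(y, weights, means, covs, dim_x):
    """Posterior-weighted sum of per-Gaussian clean-speech predictors.

    Assumption: each Gaussian is defined over the joint vector [x; y],
    where x (clean, first dim_x entries) and y (noisy) are stereo features.
    Assumption: posteriors come from the noisy marginal of each Gaussian;
    the paper derives them from N-best hypotheses instead.
    """
    log_post, cond_means = [], []
    for w, mu, sigma in zip(weights, means, covs):
        mu_x, mu_y = mu[:dim_x], mu[dim_x:]
        s_xy = sigma[:dim_x, dim_x:]   # clean/noisy cross-covariance
        s_yy = sigma[dim_x:, dim_x:]   # noisy covariance
        # Conditional expectation E[x | y, k] for a joint Gaussian
        cond_means.append(mu_x + s_xy @ np.linalg.solve(s_yy, y - mu_y))
        log_post.append(np.log(w) + gaussian_log_pdf(y, mu_y, s_yy))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # stabilised normalisation
    post /= post.sum()
    return post @ np.array(cond_means), post
```

With a single Gaussian the estimate reduces to the ordinary linear conditional mean; with several, each Gaussian contributes its own predictor in proportion to its posterior.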