Factorial hidden restricted boltzmann machines for noise robust speech recognition
Abstract
We present the Factorial Hidden Restricted Boltzmann Machine (FHRBM) for robust speech recognition. Speech and noise are modeled as independent RBMs, and the interaction between them is explicitly modeled to capture how speech and noise combine to generate observed noisy speech features. In contrast with RBMs, where the bottom layer of random variables is observed, inference in the FHRBM is intractable, scaling exponentially with the number of hidden units. We introduce variational algorithms for efficient approximate inference that scale linearly with the number of hidden units. Compared to traditional factorial models of noisy speech, which are based on GMMs, the FHRBM has the advantage that the representations of both speech and noise are highly distributed, allowing the model to learn a parts-based representation of noisy speech data that can generalize better to previously unseen noise compositions. Preliminary results suggest that the approach is promising. © 2012 IEEE.