Variational loopy belief propagation for multi-talker speech recognition
Abstract
We address single-channel speech separation and recognition by combining loopy belief propagation and variational inference methods. Inference is performed in a graphical model consisting of an HMM for each speaker, combined with the max interaction model of source combination. We present a new variational inference algorithm that exploits the structure of the max model to compute an arbitrarily tight bound on the probability of the mixed data. The variational parameters are chosen so that the algorithm scales linearly in the size of the language and acoustic models, and quadratically in the number of sources. The algorithm scores 30.7% on the SSC task [1], which is to date the best published result by a method that scales linearly with speaker model complexity. Using a single audio channel, the algorithm achieves average recognition error rates of 27%, 35%, and 51% on small datasets of SSC-derived speech mixtures containing two, three, and four sources, respectively.
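As a minimal illustrative sketch (not reproduced from the paper's derivation): the max interaction model, in its standard form, approximates the mixed log-power-spectral observation $y$ as the elementwise maximum of the $K$ source log-spectra $x_1, \dots, x_K$, and a variational bound on the mixed-data likelihood of the general kind described can be obtained via Jensen's inequality with an auxiliary distribution $q$:

\[
p(y \mid x_1, \dots, x_K) = \delta\!\left(y - \max_k x_k\right),
\qquad
\log p(y) \;\ge\; \mathbb{E}_{q(x_{1:K})}\!\left[\log \frac{p(y, x_{1:K})}{q(x_{1:K})}\right].
\]

The $\delta$ form of the max model and this generic evidence lower bound are standard; the paper's particular factorization of $q$ and the parameters that make the bound arbitrarily tight are not shown here.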