Maximum likelihood training of subspaces for inverse covariance modeling
Abstract
Speech recognition systems typically use mixtures of diagonal Gaussians to model the acoustics. Gaussians with a more general covariance structure can give improved performance; EMLLT [1] and SPAM [2] models achieve this by restricting the inverse covariance to a linear/affine subspace spanned by rank-one and full-rank matrices, respectively. In this paper we consider training these subspaces to maximize likelihood. For EMLLT, ML training of the subspace results in significant gains over the scheme proposed in [1]. For SPAM, ML training of the subspace slightly improves performance over the method reported in [2]. For the same subspace size, an EMLLT model is computationally more efficient than a SPAM model, while the SPAM model is more accurate. This paper proposes a hybrid method of structuring the inverse covariances that is both accurate and computationally efficient.
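To make the two subspace structures concrete, the following is a minimal sketch, not the implementation used in [1] or [2]. It assumes the purely linear case (an affine subspace would add a fixed offset matrix), and all function names and toy values are hypothetical. It builds an inverse covariance as a weighted sum of rank-one matrices (EMLLT-style) or of general symmetric basis matrices (SPAM-style), and shows why the rank-one form makes the quadratic term cheap to evaluate.

```python
# Illustrative sketch only: function names and toy values are assumptions,
# not taken from the paper.

def emllt_precision(lams, dirs):
    """P = sum_k lams[k] * a_k a_k^T, i.e. the basis matrices are rank one."""
    d = len(dirs[0])
    P = [[0.0] * d for _ in range(d)]
    for lam, a in zip(lams, dirs):
        for i in range(d):
            for j in range(d):
                P[i][j] += lam * a[i] * a[j]
    return P

def spam_precision(lams, basis):
    """P = sum_k lams[k] * S_k, where each S_k is a symmetric matrix of any rank."""
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for lam, S in zip(lams, basis):
        for i in range(d):
            for j in range(d):
                P[i][j] += lam * S[i][j]
    return P

def quad_form(P, x):
    """Direct evaluation of x^T P x, quadratic in the dimension."""
    n = len(x)
    return sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))

def emllt_quad_form(lams, dirs, x):
    """x^T P x for the rank-one case: sum_k lams[k] * (a_k . x)^2.

    Each basis element costs one dot product, linear in the dimension.
    """
    return sum(
        lam * sum(ai * xi for ai, xi in zip(a, x)) ** 2
        for lam, a in zip(lams, dirs)
    )

# Toy 2-dimensional example with three basis directions.
lams = [2.0, 1.0, 0.5]
dirs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 2.0]

P = emllt_precision(lams, dirs)
direct = quad_form(P, x)
projected = emllt_quad_form(lams, dirs, x)
```

In this toy case both evaluations of the quadratic term agree. The projection form needs only one dot product per basis element, which is the kind of saving that the rank-one structure allows; a SPAM-style basis of full-rank matrices gives no such shortcut in general.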