A factor analysis model of sequences for language recognition
Abstract
The application of joint factor analysis [1] to speaker and language recognition has advanced the performance of automatic systems in these areas. A special case of the early work in [1], namely the i-vector representation [2], has been applied successfully in many areas, including speaker [2], language [3], and speech recognition [4]. This work presents a novel model which represents a long sequence of observations using a factor analysis model of shorter overlapping subsequences, taking into account the dependency between adjacent latent vectors. We show that this model outperforms the current joint factor analysis approach, which assumes independent and identically distributed (iid) observations given one global latent vector. In addition, we replace the language-independent prior on the latent vector in the i-vector model with a language-dependent prior, and we modify the objective function used to estimate the factor analysis projection matrix and the prior model so that it corresponds to the cross-entropy objective computed under this new model. We also derive the update equations for the projection matrix and the prior model parameters that maximize the cross-entropy objective. We evaluate our approach on the language recognition task of the robust automatic transcription of speech (RATS) project. Our experiments show relative improvements of up to 11% in equal error rate over the standard approach based on the i-vector representation [2].
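For context, the i-vector model referenced above can be summarized as follows; this is the standard formulation from [2], with the conventional symbols $m$, $T$, and $w$ supplied here for illustration rather than taken from the abstract. A session's Gaussian mean supervector $M$ is modeled as a low-rank offset from a universal background model supervector $m$,
\[
  M = m + Tw, \qquad w \sim \mathcal{N}(0, I),
\]
where $T$ is the total-variability projection matrix and the latent vector $w$ (the i-vector) has a language-independent standard-normal prior. The modification proposed in this work replaces that prior with a language-dependent one, $w \sim \mathcal{N}(\mu_\ell, \Sigma_\ell)$ for each language $\ell$, and estimates $T$ together with $(\mu_\ell, \Sigma_\ell)$ by maximizing a cross-entropy objective over the language labels.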