Speaker identification using online, frame-dependent, and diffusive variance adaptation
Abstract
In this paper we perform maximum-likelihood adaptation of the variances of a Gaussian mixture model (GMM) using a single acoustic data frame. We show that when the variances are scaled by a prototype- and frame-dependent factor, the adaptation amounts to a simple non-linear warping of the exponent of the Gaussian. We also introduce algorithms for "diffusive" variance adaptation, in which a positive constant is added to the model variances. When this constant is prototype-independent (though possibly dependent on the frame and on the coordinate dimension), the modified GMM is equivalent to evolving the original model under the diffusion equation of physics, a process guaranteed to increase entropy. On the task of text-independent speaker identification on the LLHDB database, we report relative reductions of up to 28% in speaker identification error rate compared with the unadapted model.
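The Python/NumPy sketch below illustrates the two adaptation modes under several assumptions that the abstract does not confirm: diagonal covariances, a single scalar scale α per prototype, and a single scalar offset τ shared by all prototypes and coordinates. The function names are hypothetical, and τ is picked by a plain grid search rather than by the paper's own algorithms. Under these assumptions, maximizing log N(x; μ, αΣ) over α > 0 gives α = q/d, where q = (x − μ)ᵀΣ⁻¹(x − μ) and d is the feature dimension. Substituting back replaces the exponent −q/2 with −(d/2)(1 + log(q/d)), which is one concrete form of the non-linear warping.

```python
import numpy as np


def log_gauss_diag(x, mu, var):
    """Log density of a diagonal-covariance Gaussian at a single frame x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)


def scaled_loglik(x, mu, var):
    """Log density after ML adaptation of a scalar variance scale alpha.

    Assumes one scalar alpha per prototype. The optimum is alpha = q / d,
    which turns the exponent -q/2 into -(d/2) * (1 + log(q/d)).
    """
    d = x.size
    q = np.sum((x - mu) ** 2 / var)  # Mahalanobis distance of the frame
    q = max(q, 1e-12)                # guard: the optimal alpha -> 0 when x == mu
    return -0.5 * np.sum(np.log(2.0 * np.pi * var)) - 0.5 * d * (1.0 + np.log(q / d))


def gmm_loglik(x, weights, means, variances, tau=0.0):
    """GMM log-likelihood with a prototype-independent variance offset tau."""
    comps = [np.log(w) + log_gauss_diag(x, m, v + tau)
             for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comps)


def diffusive_tau(x, weights, means, variances, taus):
    """Hypothetical helper: return the tau on a grid that maximizes this frame's likelihood."""
    scores = [gmm_loglik(x, weights, means, variances, t) for t in taus]
    return taus[int(np.argmax(scores))]
```

Because τ is the same for every prototype, adding it to all variances is the same as convolving the whole mixture with a zero-mean Gaussian of variance τ. That convolution is what the heat (diffusion) equation produces, which is why the prototype-independent case corresponds to diffusion.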