Exploiting supervector structure for speaker recognition trained on a small development set
Abstract
Nowadays, state-of-the-art speaker recognition systems obtain quite satisfactory results for both text-independent and text-dependent tasks, provided they are trained on a fair amount of development data from the target domain (assuming clean speech). In this work, we investigate the ability to build accurate speaker recognition systems using only small amounts of data from the target domain, without using any out-of-domain data. Our method is based on exploiting the structure of GMM supervectors. Knowledge of how GMM supervectors are created (namely, as a concatenation of statistics obtained for a set of Gaussians over the feature space) is used to guide modeling in the high-dimensional supervector space. We report experiments on both text-dependent and text-independent tasks that validate our method and show large error reductions.
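The structure referred to above can be made concrete with a small example. The following is a minimal sketch, not the method of this work. It assumes a diagonal-covariance GMM (UBM) and the commonly used relevance-MAP mean adaptation. The relevance factor of 16 is an assumed, typical value. The function names (`map_mean_supervector`, `gaussian_block`) are hypothetical. The sketch accumulates per-Gaussian zeroth- and first-order statistics over the frames and concatenates the adapted means Gaussian by Gaussian. As a result, the supervector of length C·F splits into C contiguous blocks of F dimensions, one block per Gaussian.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    # Log-density of a diagonal-covariance Gaussian N(x; mean, diag(var)).
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances) -> Vector:
    # Frame-level Gaussian occupation probabilities, using log-sum-exp for stability.
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)
    exps = [math.exp(l - peak) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_mean_supervector(frames, weights, means, variances, relevance: float = 16.0) -> Vector:
    # Hypothetical helper: relevance-MAP adapted means, concatenated Gaussian by Gaussian.
    num_gauss, dim = len(means), len(means[0])
    n = [0.0] * num_gauss                           # zeroth-order stats N_c
    f = [[0.0] * dim for _ in range(num_gauss)]     # first-order stats F_c
    for x in frames:
        gamma = posteriors(x, weights, means, variances)
        for c in range(num_gauss):
            n[c] += gamma[c]
            for d in range(dim):
                f[c][d] += gamma[c] * x[d]
    supervector: Vector = []
    for c in range(num_gauss):                      # block c holds Gaussian c's mean
        for d in range(dim):
            supervector.append((f[c][d] + relevance * means[c][d]) / (n[c] + relevance))
    return supervector


def gaussian_block(supervector: Sequence[float], c: int, dim: int) -> Vector:
    # The F-dimensional slice of the supervector that belongs to Gaussian c.
    return list(supervector[c * dim:(c + 1) * dim])


if __name__ == "__main__":
    ubm_w = [0.5, 0.5]
    ubm_m = [[0.0, 0.0], [10.0, 10.0]]
    ubm_v = [[1.0, 1.0], [1.0, 1.0]]
    sv = map_mean_supervector([[1.0, 1.0]] * 32, ubm_w, ubm_m, ubm_v)
    # Block 0 moves toward the data; block 1 (no occupancy) stays at the UBM mean.
    print(gaussian_block(sv, 0, 2), gaussian_block(sv, 1, 2))
```

Because each block is tied to one Gaussian, block-wise operations such as per-Gaussian normalization or per-block covariance modeling become possible. This is the kind of structural knowledge that is unavailable when the supervector is treated as a generic high-dimensional vector.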