New Distance Measures for Text-Independent Speaker Identification
Abstract
Distance measures [1][2][3] based on the covariance matrix of feature vectors have been applied to text-independent speaker verification and identification. However, some of them do not satisfy the symmetry property, which is fundamental to a distance measure. In this paper, we propose several symmetric distance measures based on the covariance matrix of feature vectors, and then construct more advanced measures using the data fusion method [4]. These new distance measures have good mathematical properties and impose little computational overhead. We apply them to text-independent speaker identification and handset detection. We also develop a new robust technique for cross-handset speaker identification, and find that compensating the second-order statistics is important when dealing with the mismatch caused by different handsets. The experiments use the cb1 and cb2 data in the LLHDB corpus [5] for same-handset and cross-handset speaker identification tests. We find that the use of delta cepstra decreases the speaker identification error rate by as much as 38%. The data fusion technique can further decrease the error rate by 11%. When these distance measures are applied to the two-handset detection problem, the error rate is 12%. With our new robust technique, the cross-handset speaker identification error rate can be decreased by around 17%.
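To make the idea of a symmetric covariance-based distance concrete, the following is a minimal sketch and not necessarily one of the measures proposed in this paper. It assumes an illustrative measure d(A, B) = log[tr(A B^-1) tr(B A^-1)] - 2 log D, where A and B are the D x D covariance matrices of two sets of feature vectors; swapping A and B leaves the value unchanged. The function names (`covariance`, `symmetric_cov_distance`, `identify`) and the nearest-model decision rule are hypothetical and chosen only for illustration.

```python
import numpy as np


def covariance(features):
    """Sample covariance of feature vectors (one vector per row)."""
    return np.cov(features, rowvar=False)


def symmetric_cov_distance(cov_a, cov_b):
    """Illustrative symmetric measure (an assumption, not the paper's definition):

        d(A, B) = log( tr(A B^-1) * tr(B A^-1) ) - 2 log D

    The product of the two traces is unchanged when A and B are swapped,
    so d(A, B) == d(B, A). The value is >= 0 and equals 0 when A = c * B.
    """
    dim = cov_a.shape[0]
    # tr(B^-1 A) equals tr(A B^-1); solve() avoids forming an explicit inverse.
    trace_ab = np.trace(np.linalg.solve(cov_b, cov_a))
    trace_ba = np.trace(np.linalg.solve(cov_a, cov_b))
    return float(np.log(trace_ab * trace_ba) - 2.0 * np.log(dim))


def identify(test_features, enrolled):
    """Return the enrolled speaker whose covariance is closest to the test data.

    `enrolled` maps a speaker name to that speaker's covariance matrix.
    """
    test_cov = covariance(test_features)
    return min(
        enrolled,
        key=lambda name: symmetric_cov_distance(test_cov, enrolled[name]),
    )
```

Because the measure depends only on one covariance matrix per speaker and per test utterance, the cost of scoring is a pair of D x D linear solves, which is consistent with the low computational overhead noted above.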