Forensically inspired approaches to automatic speaker recognition
Abstract
This paper presents ongoing research leveraging forensic methods for automatic speaker recognition. Methods employed by forensic scientists include identifying speaker-distinctive audio segments and comparing these segments using features such as pitch, formants, and other information. Other approaches involve phonetic analysis to recognize idiolectal attributes and an implicit analysis of speaker demographics. Inspired by these forensic phonetic approaches, we target three threads of work: hot-spot analysis, speaker style and pronunciation modelling, and demographics analysis. We show that a phonetic analysis conditioned on select speech events (or hot-spots) can outperform a phonetic analysis performed over all speech without conditioning. In the area of pronunciation modelling, one set of results demonstrates significantly improved robustness by exploiting phonetic structure in an automatic speech recognition system. For demographics analysis, we present state-of-the-art results for systems capable of detecting dialect, non-nativeness, and native language. © 2011 IEEE.