Online speaker diarization using adapted i-vector transforms
Weizhong Zhu, Jason Pelecanos
ICASSP 2016
Automatic speech recognition (ASR) systems rely on large quantities of transcribed acoustic data. The collection of audio data is relatively cheap, whereas the transcription of that data is relatively expensive. There is therefore an interest in the ASR community in active learning, in which only a small subset of highly representative data chosen from a large pool of untranscribed audio need be transcribed in order to approach the performance of a system trained with much larger amounts of transcribed audio. In this paper, we compare two basic approaches to active learning: a supervised approach in which we build a speech recognition system from a small amount of seed data in order to guide the selection of a limited amount of additional audio for transcription, and an unsupervised approach in which no intermediate recognition system built from seed data is necessary. Our best unsupervised approach performs nearly as well as our supervised approach, and both outperform a random selection scheme.
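The supervised approach described above uses a recognizer trained on seed data to decide which untranscribed utterances are worth spending transcription budget on. The sketch below is a rough illustration only: it assumes a confidence-based selection rule, and the `select_for_transcription` helper, its parameters, and the toy scores are hypothetical rather than taken from the paper.

```python
# Hypothetical sketch of a supervised active-learning selection loop.
# `confidence` stands in for a recognizer trained on seed data; the
# confidence-based criterion is an assumption, not the paper's method.

from typing import Callable, List, Tuple


def select_for_transcription(
    pool: List[str],                      # IDs of untranscribed utterances
    confidence: Callable[[str], float],   # seed-system confidence per utterance
    budget: int,                          # number of utterances we can afford to transcribe
) -> List[str]:
    """Pick the `budget` utterances the seed system is least confident about."""
    scored: List[Tuple[float, str]] = sorted((confidence(u), u) for u in pool)
    return [utt for _, utt in scored[:budget]]


if __name__ == "__main__":
    # Toy example with made-up confidence scores.
    fake_conf = {"utt1": 0.91, "utt2": 0.40, "utt3": 0.75, "utt4": 0.22}
    picked = select_for_transcription(list(fake_conf), fake_conf.get, budget=2)
    print(picked)  # the two lowest-confidence utterances: ['utt4', 'utt2']
```

An unsupervised scheme, by contrast, would rank the pool without any seed-trained recognizer, for example using acoustic statistics of the audio alone.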