Hagai Aronowitz, Oren Barkan
INTERSPEECH 2011
Trainable speech/non-speech segmentation and music detection algorithms usually consist of a frame-based scoring phase combined with a smoothing phase. This paper suggests a framework in which both phases are explicitly unified in a segment-based classifier. We suggest a novel segment-based generative model in which audio segments are modeled as supervectors and each class (speech, silence, music) is modeled by a distribution over the supervector space. Segmental speech classes can then be modeled by generative models such as GMMs or can be classified by SVMs. Our suggested framework leads to a significant reduction in error rate. © 2007 IEEE.
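The abstract does not say how a segment is turned into a supervector. As a minimal sketch, assuming the common construction in which the means of a background GMM are MAP-adapted to the frames of one segment and then concatenated, the mapping from a variable-length segment to a fixed-length vector might look as follows. The function names, the diagonal-covariance background model, and the relevance factor of 16 are illustrative assumptions, not details taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def segment_supervector(frames, weights, means, variances, relevance=16.0):
    """Map a segment (list of feature frames) to a fixed-length supervector.

    Assumed construction: MAP-adapt the means of a background GMM to the
    segment, then stack the adapted means into one vector.
    """
    n_comp, dim = len(means), len(means[0])
    occ = [0.0] * n_comp                          # soft frame count per component
    first = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted frame sums

    for x in frames:
        # Component posteriors for this frame, computed stably in the log domain.
        logp = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        norm = sum(post)
        for c in range(n_comp):
            gamma = post[c] / norm
            occ[c] += gamma
            for d in range(dim):
                first[c][d] += gamma * x[d]

    supervector = []
    for c in range(n_comp):
        # Interpolate between the segment statistics and the background mean.
        alpha = occ[c] / (occ[c] + relevance)
        for d in range(dim):
            seg_mean = first[c][d] / occ[c] if occ[c] > 0 else means[c][d]
            supervector.append(alpha * seg_mean + (1.0 - alpha) * means[c][d])
    return supervector
```

Under this assumption every segment, whatever its duration, yields a vector of length (number of components) × (feature dimension), so a per-class GMM or an SVM can score the whole segment at once instead of scoring frames and smoothing afterwards.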
Yosef A. Solewicz, Hagai Aronowitz, et al.
Odyssey 2016
Hagai Aronowitz, Weizhong Zhu, et al.
INTERSPEECH 2020
Tara N. Sainath, Dimitri Kanevsky, et al.
ICASSP 2007