Segmental modeling for audio segmentation

Hagai Aronowitz

doi:10.1109/ICASSP.2007.366932

ICASSP 2007

Conference paper

06 Aug 2007

Abstract

Trainable speech/non-speech segmentation and music detection algorithms usually consist of a frame-based scoring phase followed by a smoothing phase. This paper proposes a framework in which both phases are explicitly unified in a segment-based classifier. We propose a novel segment-based generative model in which audio segments are represented as supervectors and each class (speech, silence, music) is modeled by a distribution over the supervector space. Segment classes can then be modeled by generative models such as GMMs, or segments can be classified by SVMs. The proposed framework leads to a significant reduction in error rate. © 2007 IEEE.
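
The idea of mapping each segment to a supervector and modeling each class as a distribution over that space can be illustrated with a short sketch. The abstract does not say how supervectors are built, so the following is only one plausible reading, not the paper's method. It assumes a supervector is formed by relevance-MAP adaptation of the means of a background mixture to the segment's frames, with unit-variance, equal-weight components, and that each class is a single diagonal Gaussian (a one-component stand-in for a GMM). All function names, the `relevance` parameter, and the variance floor are hypothetical.

```python
import numpy as np

def segment_supervector(frames, ubm_means, relevance=16.0):
    """Map a segment (T x D frames) to a supervector of length K*D.

    Assumption: background components have unit variance and equal
    weights; only their means are adapted to the segment.
    """
    # Squared distance of every frame to every background mean: (T, K)
    d2 = ((frames[:, None, :] - ubm_means[None, :, :]) ** 2).sum(axis=2)
    logp = -0.5 * d2
    logp -= logp.max(axis=1, keepdims=True)       # numerical stability
    post = np.exp(logp)
    post /= post.sum(axis=1, keepdims=True)       # frame posteriors
    n = post.sum(axis=0)                          # soft counts, (K,)
    ml_means = (post.T @ frames) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]        # adaptation weight
    adapted = alpha * ml_means + (1.0 - alpha) * ubm_means
    return adapted.ravel()                        # concatenate the K means

def fit_class_models(supervectors, labels, var_floor=1e-3):
    """Fit one diagonal Gaussian per class over the supervector space."""
    supervectors = np.asarray(supervectors)
    labels = np.asarray(labels)
    models = {}
    for c in np.unique(labels):
        x = supervectors[labels == c]
        models[c.item()] = (x.mean(axis=0), x.var(axis=0) + var_floor)
    return models

def classify_segment(supervector, models):
    """Return the class whose Gaussian gives the highest log-likelihood."""
    def loglik(mean, var):
        return -0.5 * np.sum(np.log(2 * np.pi * var)
                             + (supervector - mean) ** 2 / var)
    return max(models, key=lambda c: loglik(*models[c]))
```

Because the whole segment is scored at once, no separate smoothing pass over frame decisions is needed in this sketch. The per-class Gaussian could be swapped for a multi-component GMM or an SVM trained on the same supervectors, as the abstract mentions.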
