An empirical study of confusion modeling in keyword search for low resource languages

Murat Saraclar; Abhinav Sethy; Bhuvana Ramabhadran; Lidia Mangu; Jia Cui; Xiaodong Cui; Brian Kingsbury; Jonathan Mamou

doi:10.1109/ASRU.2013.6707774

ASRU 2013

Conference paper

01 Dec 2013

Abstract

Keyword search, in the context of low resource languages, has emerged as a key area of research. The dominant approach in keyword search is to use Automatic Speech Recognition (ASR) as a front end to produce a representation of audio that can be indexed. The biggest drawback of this approach lies in its inability to deal with out-of-vocabulary words and query terms that are not in the ASR system output. In this paper, we present an empirical study evaluating various approaches that use confusion models as query expansion techniques to address this problem. We present results across four languages using a range of confusion models, which lead to significant improvements in keyword search performance as measured by the Maximum Term Weighted Value (MTWV) metric. © 2013 IEEE.
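
For readers unfamiliar with the metric, the following is a minimal sketch of how a Maximum Term Weighted Value could be computed. It is not taken from the paper. It assumes the commonly used NIST-style definition, in which the Term Weighted Value at a threshold is one minus the per-term average of the miss probability plus a weighted false-alarm probability, and MTWV is the best value over all thresholds. The weight `BETA = 999.9`, the function names `twv` and `mtwv`, and the input layout (a list of scored hits, each flagged as correct or not) are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed false-alarm weight from the NIST-style definition; not stated in the abstract.
BETA = 999.9


def twv(hits, n_true, speech_seconds, threshold, beta=BETA):
    """Term Weighted Value at one decision threshold.

    hits: list of (term, score, is_correct) tuples (hypothetical layout).
    n_true: dict mapping term -> number of reference occurrences.
    speech_seconds: total duration of the searched audio, in seconds.
    """
    n_correct = defaultdict(int)
    n_spurious = defaultdict(int)
    for term, score, is_correct in hits:
        if score >= threshold:
            if is_correct:
                n_correct[term] += 1
            else:
                n_spurious[term] += 1

    # Only terms that occur in the reference contribute to the average.
    terms = [t for t, n in n_true.items() if n > 0]
    total = 0.0
    for term in terms:
        p_miss = 1.0 - n_correct[term] / n_true[term]
        # Non-target trials are approximated as one per second of speech.
        p_fa = n_spurious[term] / (speech_seconds - n_true[term])
        total += p_miss + beta * p_fa
    return 1.0 - total / len(terms)


def mtwv(hits, n_true, speech_seconds, beta=BETA):
    """Maximum TWV over all candidate thresholds."""
    # Every observed score is a candidate; infinity rejects all hits (TWV = 0).
    thresholds = sorted({score for _, score, _ in hits}) + [float("inf")]
    return max(twv(hits, n_true, speech_seconds, th, beta) for th in thresholds)


example_hits = [
    ("a", 0.9, True),
    ("a", 0.4, False),
    ("b", 0.8, True),
    ("b", 0.3, True),
]
example_truth = {"a": 1, "b": 2}
best = mtwv(example_hits, example_truth, speech_seconds=1000.0)
```

Because false alarms carry a large weight in this definition, a single spurious hit can cost about as much as missing every occurrence of a term, which is why the choice of threshold matters so much when expanded queries return additional low-scoring hits.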
