A. Skumanich
SPIE OE/LASE 1992
Over the past decade or so, several advances have been made in the design of modern large-vocabulary continuous speech recognition (LVCSR) systems, to the point where their application has broadened from early speaker-dependent dictation systems to speaker-independent automatic broadcast news transcription and indexing, lecture and meeting transcription, conversational telephone speech transcription, open-domain voice search, medical and legal speech recognition, and call center applications, to name a few. The commercial success of these systems is an impressive testimony to how far research in LVCSR has come, and the aim of this article is to describe some of the technological underpinnings of modern systems. It must be said, however, that despite the commercial success and widespread adoption, the problem of large-vocabulary speech recognition is far from solved: background noise, channel distortions, foreign accents, casual and disfluent speech, or unexpected topic changes can cause automated systems to make egregious recognition errors. This is because current LVCSR systems are not robust to mismatched training and test conditions and cannot handle context as well as human listeners, despite being trained on thousands of hours of speech and billions of words of text. © 2012 IEEE.
Michael E. Henderson
International Journal of Bifurcation and Chaos in Applied Sciences and Engineering
Hannaneh Hajishirzi, Julia Hockenmaier, et al.
UAI 2011
John R. Kender, Rick Kjeldsen
IEEE Transactions on Pattern Analysis and Machine Intelligence