Making deep belief networks effective for large vocabulary continuous speech recognition

Tara N. Sainath; Brian Kingsbury; Bhuvana Ramabhadran; Petr Fousek; Petr Novak; Abdel-Rahman Mohamed

doi:10.1109/ASRU.2011.6163900

ASRU 2011

Conference paper

01 Dec 2011

Abstract

To date, there has been limited work on applying Deep Belief Networks (DBNs) to acoustic modeling in large vocabulary continuous speech recognition (LVCSR) tasks, with past work using standard speech features. However, a typical LVCSR system makes use of both feature-space and model-space speaker adaptation and discriminative training. This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task. In addition, we provide a recipe for data parallelization of DBN training, showing that data parallelization can provide a speed-up that is linear in the number of machines without impacting word error rate (WER). © 2011 IEEE.
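
The abstract does not spell out the data-parallelization recipe, so the following is only a generic illustration of the idea, not the authors' method. It assumes synchronous gradient averaging over equal-size data shards and uses a small linear model as a stand-in for a DBN; the names `shard_gradient` and `data_parallel_gradient` are hypothetical. The sketch shows why splitting a batch across workers need not change the result: the average of per-shard mean gradients equals the full-batch mean gradient.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


def shard_gradient(weights: Sequence[float], shard: Sequence[Example]) -> List[float]:
    """Mean squared-error gradient of a linear model over one data shard."""
    grad = [0.0] * len(weights)
    for features, target in shard:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / len(shard)
    return grad


def data_parallel_gradient(
    weights: Sequence[float], data: Sequence[Example], num_workers: int
) -> List[float]:
    """Split data into equal shards, compute per-shard gradients, average them."""
    if len(data) % num_workers != 0:
        raise ValueError("this sketch assumes equal-size shards")
    size = len(data) // num_workers
    shards = [data[k * size:(k + 1) * size] for k in range(num_workers)]
    # Assumption: in a real system each call below would run on its own machine.
    grads = [shard_gradient(weights, shard) for shard in shards]
    return [sum(g[i] for g in grads) / num_workers for i in range(len(weights))]


# Toy data, invented for illustration only.
data = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0), ([2.0, 0.0], 3.0), ([-1.0, 1.5], -2.0)]
weights = [0.1, -0.2]

serial = shard_gradient(weights, data)
parallel = data_parallel_gradient(weights, data, num_workers=2)
```

Here `serial` and `parallel` agree up to floating-point rounding. Whether a real training setup sees a linear speed-up also depends on communication cost between machines, which this sketch does not model.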
