Publication
ICASSP 2020
Conference paper
Fast Training of Deep Neural Networks for Speech Recognition
Abstract
Training large, deep neural network acoustic models for speech recognition on large datasets takes a long time on a single GPU, motivating research on parallel training algorithms. We present an approach for training a bidirectional LSTM acoustic model on the 2000-hour Switchboard corpus. The model we train achieves state-of-the-art word error rates of 7.5% on the Hub5-2000 Switchboard test set and 13.1% on the Callhome test set, and training scales to an unprecedented 96 learners while employing only 12 global reductions per epoch. Because our implementation incurs far fewer reductions than prior work, it does not require aggressively optimized communication primitives to reach state-of-the-art performance in a short amount of time. With 48 NVIDIA V100 GPUs, training takes 5 hours; with 96 GPUs, it takes around 3 hours.
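The scaling implied by the two reported training times can be checked with a little arithmetic. The sketch below is illustrative only: the function name `scaling_summary` is hypothetical, and the 96-GPU time is taken as exactly 3 hours although the abstract says "around 3 hours", so the resulting figures are approximate.

```python
def scaling_summary(gpus_a, hours_a, gpus_b, hours_b):
    """Compare two training runs that differ only in GPU count.

    Returns (speedup, efficiency), where speedup is the wall-clock ratio
    and efficiency is speedup divided by the ideal (linear) speedup.
    """
    speedup = hours_a / hours_b      # how much faster the larger run is
    ideal = gpus_b / gpus_a          # linear scaling would give this factor
    efficiency = speedup / ideal     # fraction of linear scaling achieved
    return speedup, efficiency


# Figures from the abstract; 3.0 hours is an assumed stand-in for "around 3 hours".
speedup, efficiency = scaling_summary(48, 5.0, 96, 3.0)
gpu_hours_48 = 48 * 5.0
gpu_hours_96 = 96 * 3.0

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
print(f"GPU-hours: {gpu_hours_48:.0f} (48 GPUs) vs {gpu_hours_96:.0f} (96 GPUs)")
```

Under these assumptions, doubling the number of GPUs shortens training by a factor of roughly 1.67, about 83% of linear scaling, at a cost of roughly 288 GPU-hours instead of 240.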