Annealed dropout trained maxout networks for improved LVCSR
Abstract
A significant barrier to progress in automatic speech recognition (ASR) capability is the empirical reality that techniques rarely 'scale': the yield of many apparently fruitful techniques rapidly diminishes to zero as the training criterion or decoder is strengthened, or as the size of the training set is increased. Recently we showed that annealed dropout, a regularization procedure which gradually reduces the percentage of neurons that are randomly zeroed out during DNN training, leads to substantial word error rate reductions for small to moderate amounts of training data and acoustic models trained with the cross-entropy (CE) criterion [1]. In this paper we show that deep Maxout networks trained using annealed dropout can substantially improve the quality of commercial-grade LVCSR systems even when the acoustic model is trained with a sequence-level training criterion, and on large amounts of data.
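The abstract describes annealed dropout only at a high level: the dropout rate is gradually reduced over the course of training. As a rough illustration of that idea, the Python sketch below assumes a simple linear decay of the dropout rate across epochs; the function names, the initial rate of 0.5, and the epoch count are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def annealed_dropout_rate(epoch, total_epochs, initial_rate=0.5, final_rate=0.0):
    """Linearly anneal the dropout rate from initial_rate down to final_rate
    over training (one possible annealing schedule; assumed, not from the paper)."""
    frac = min(epoch / float(total_epochs), 1.0)
    return initial_rate + frac * (final_rate - initial_rate)

def apply_dropout(activations, rate, rng=np.random):
    """Randomly zero out a fraction `rate` of units and rescale the survivors
    (inverted dropout), applied during training only."""
    if rate <= 0.0:
        return activations
    mask = rng.binomial(1, 1.0 - rate, size=activations.shape)
    return activations * mask / (1.0 - rate)

# Example: the dropout rate decays from 0.5 to 0.0 over 20 epochs,
# so training ends with (effectively) no dropout.
for epoch in range(20):
    p = annealed_dropout_rate(epoch, total_epochs=20)
    hidden = np.random.randn(4, 8)          # stand-in for a hidden-layer activation
    hidden = apply_dropout(hidden, p)       # would be applied to each hidden layer
```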