A nonmonotone learning rate strategy for SGD training of deep neural networks
Abstract
The algorithm of choice for cross-entropy training of deep neural network (DNN) acoustic models is mini-batch stochastic gradient descent (SGD). An important design decision for this algorithm is the learning rate strategy (also called step-size selection). We investigate several existing schemes and propose a new learning rate strategy inspired by nonmonotone line search techniques from nonlinear optimization and by the NewBob algorithm. The proposed strategy is relatively insensitive to poorly tuned parameters and yields lower word error rates than NewBob on two different LVCSR tasks (50 hours of English broadcast news transcription and 300 hours of Switchboard telephone conversations). Finally, we motivate the method by briefly linking it to results in optimization theory.
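To make the contrast concrete, the sketch below compares a NewBob-style learning rate update, which reacts to the held-out improvement over the previous epoch only, with a nonmonotone variant that judges the current epoch against the worst held-out loss in a recent window. This is a minimal illustration under assumed thresholds and function names, not the exact recipe proposed in the paper.

```python
# Illustrative sketch: monotone (NewBob-style) vs. nonmonotone learning rate
# control based on held-out loss. Thresholds, window size, and names are
# assumptions for exposition, not the paper's exact parameters.

def newbob_step(lr, prev_loss, curr_loss, threshold=0.01, factor=0.5):
    """Halve the learning rate when the relative held-out improvement
    over the previous epoch falls below `threshold`."""
    improvement = (prev_loss - curr_loss) / prev_loss
    return lr * factor if improvement < threshold else lr

def nonmonotone_step(lr, loss_history, curr_loss, window=4,
                     threshold=0.01, factor=0.5):
    """Keep the learning rate as long as the current held-out loss
    sufficiently improves on the worst loss of the last `window` epochs;
    otherwise reduce it. Allows occasional increases in loss."""
    reference = max(loss_history[-window:])
    improvement = (reference - curr_loss) / reference
    return lr * factor if improvement < threshold else lr
```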