Publication
ICASSP 2020
Conference paper
Alignment-Length Synchronous Decoding for RNN Transducer
Abstract
We present a beam decoding strategy for recurrent neural network transducers in which all competing hypotheses within the beam have the same alignment length (number of output symbols plus BLANK symbols). We contrast the proposed technique with time-synchronous decoding, where the competing hypotheses within the beam correspond to the same input frames (but can have output sequences of different lengths). Experiments on the 2000-hour Switchboard corpus show that alignment-length synchronous decoding (ALSD) is 25% faster than time-synchronous decoding (TSD) for the same accuracy because ALSD performs 42% fewer joint network evaluations and hypothesis expansions during the search. Additionally, we discuss the benefit of caching and batching the prediction and joint network evaluations, of using prefix trees instead of full output vocabulary expansions, and of performing hypothesis recombination after pruning. With open beam decoding, we reach a 6.2% / 10.9% word error rate on the Switchboard and CallHome Hub5 2000 evaluation test sets, which compares favorably to other published single-model results on this corpus.
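To make the synchronization idea concrete, the following is a minimal Python sketch of an alignment-length synchronous beam search, written only from the description in the abstract. It is not the authors' implementation. The names `alsd_decode`, `toy_log_probs`, `u_max`, and `beam` are hypothetical, and the toy scoring function stands in for the prediction and joint networks. For simplicity the sketch merges hypotheses with identical label sequences during expansion (before pruning), whereas the abstract refers to recombination after pruning, and it omits caching, batching, and prefix trees.

```python
import math
from typing import Callable, Dict, List, Tuple

BLANK = 0  # assumption: index 0 of every score vector is the BLANK symbol
Hyp = Tuple[int, ...]


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def alsd_decode(
    log_probs: Callable[[int, Hyp], List[float]],
    T: int,
    u_max: int,
    beam: int,
) -> Tuple[Hyp, float]:
    """Toy alignment-length synchronous search over T frames.

    At step i every hypothesis in the beam has alignment length i, so a
    hypothesis with u labels is positioned at frame t = i - u.
    """
    B: Dict[Hyp, float] = {(): 0.0}   # current beam: labels -> log score
    final: Dict[Hyp, float] = {}      # hypotheses that consumed all frames
    for i in range(T + u_max):
        A: Dict[Hyp, float] = {}
        for y, score in B.items():
            u = len(y)
            t = i - u
            if t > T - 1:
                continue
            lp = log_probs(t, y)  # stand-in for one joint network evaluation
            blank_score = score + lp[BLANK]
            if t == T - 1:
                # BLANK on the last frame ends the alignment.
                final[y] = blank_score
            else:
                # BLANK moves to the next frame, labels unchanged.
                A[y] = logaddexp(A.get(y, -math.inf), blank_score)
            if u < u_max:
                for k in range(1, len(lp)):
                    y_new = y + (k,)
                    A[y_new] = logaddexp(A.get(y_new, -math.inf), score + lp[k])
        # Prune to the `beam` best hypotheses of alignment length i + 1.
        best = sorted(A.items(), key=lambda kv: kv[1], reverse=True)[:beam]
        B = dict(best)
        if not B:
            break
    if not final:
        return (), -math.inf
    return max(final.items(), key=lambda kv: kv[1])


def toy_log_probs(t: int, y: Hyp) -> List[float]:
    """Hypothetical model: prefers exactly one label per frame."""
    probs = [0.1, 0.1, 0.1]
    if len(y) == t:
        probs[(t % 2) + 1] = 0.8  # favor a label that depends on the frame
    else:
        probs[BLANK] = 0.8        # otherwise favor advancing to the next frame
    return [math.log(p) for p in probs]


labels, score = alsd_decode(toy_log_probs, T=3, u_max=4, beam=4)
print(labels, score)
```

In this sketch the outer loop runs over alignment lengths rather than input frames, which is the property the abstract contrasts with time-synchronous decoding: hypotheses in the same beam may sit at different frames but have always consumed the same total number of labels plus BLANK symbols.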