Decoding with shrinkage-based language models
Abstract
In this paper, we investigate the use of a class-based exponential language model directly integrated into speech recognition and machine translation decoders. Recently, a novel class-based language model, Model M, was introduced and shown to outperform regular n-gram models on moderate amounts of Wall Street Journal data. This model was motivated by the observation that shrinking the sum of the parameter magnitudes in an exponential language model leads to better performance on unseen data. We directly integrate this shrinkage-based language model into two different state-of-the-art machine translation engines as well as a large-scale dynamic speech recognition decoder. Experiments on standard GALE and NIST development and evaluation sets show considerable and consistent improvements in both machine translation quality and speech recognition word error rate.
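As a minimal sketch of the shrinkage idea described above (the abstract does not give Model M's exact training criterion, so the penalty form shown here is an assumption for illustration), "shrinking the sum of the parameter magnitudes" corresponds to training the exponential model's parameters $\Lambda = \{\lambda_f\}$ by maximizing an $\ell_1$-penalized log-likelihood over the training events $(w_i, h_i)$:
\[
  \hat{\Lambda} \;=\; \arg\max_{\Lambda}\;
    \sum_{i} \log p_{\Lambda}(w_i \mid h_i)
    \;-\; \alpha \sum_{f=1}^{F} \lvert \lambda_f \rvert ,
\]
where $\alpha > 0$ controls how strongly the sum of parameter magnitudes $\sum_f \lvert \lambda_f \rvert$ is shrunk toward zero, trading training-set fit for better generalization to unseen data.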