Publication
ICASSP 2008
Conference paper

Hierarchical linear discounting class n-gram language models: A multilevel class hierarchy approach

Abstract

We introduce in this paper a hierarchical linear discounting class n-gram language modeling technique that combines several language models trained at different nodes of a class hierarchy. The approach hierarchically clusters the word vocabulary into a word tree; the closer a tree node is to the leaves, the more specific the corresponding word class. The tree is used to balance generalization ability against word specificity when estimating the likelihood of an n-gram event. Experiments are conducted on the Wall Street Journal corpus with a 20,000-word vocabulary. Results show a 10% reduction in test perplexity over standard n-gram approaches, along with a considerable improvement in speech recognition accuracy. ©2008 IEEE.
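The core idea, a word probability estimated as a linear combination of class n-gram estimates taken at several levels of a class hierarchy, can be sketched as follows. This is a minimal illustrative sketch only, not the paper's method: the two-level hierarchy, the class labels, the toy corpus, and the interpolation weights are all assumptions for demonstration, and the paper's actual clustering and discounting scheme is not reproduced here.

```python
from collections import defaultdict

class HierarchicalClassBigram:
    """Toy two-level class bigram model: P(w|v) is a weighted sum over
    hierarchy levels of P(class(w)|class(v)) * P(w|class(w)), so coarse
    classes supply generalization and fine classes supply specificity."""

    def __init__(self, word_to_classes, weights):
        # word_to_classes: word -> tuple of class labels, one per level
        # weights: one interpolation weight per level (should sum to 1)
        self.w2c = word_to_classes
        self.weights = weights
        self.levels = len(weights)
        self.bigram = [defaultdict(int) for _ in range(self.levels)]
        self.unigram = [defaultdict(int) for _ in range(self.levels)]
        self.word_in_class = [defaultdict(int) for _ in range(self.levels)]

    def train(self, corpus):
        for sent in corpus:
            for prev, cur in zip(sent, sent[1:]):
                for l in range(self.levels):
                    cp, cc = self.w2c[prev][l], self.w2c[cur][l]
                    self.bigram[l][(cp, cc)] += 1
                    self.unigram[l][cp] += 1
                    self.word_in_class[l][(cc, cur)] += 1

    def prob(self, prev, cur):
        # P(cur|prev) = sum_l lambda_l * P(c_l(cur)|c_l(prev)) * P(cur|c_l(cur))
        total = 0.0
        for l, lam in enumerate(self.weights):
            cp, cc = self.w2c[prev][l], self.w2c[cur][l]
            if self.unigram[l][cp] == 0:
                continue
            p_class = self.bigram[l][(cp, cc)] / self.unigram[l][cp]
            in_class = sum(n for (c, _), n in self.word_in_class[l].items()
                           if c == cc)
            p_emit = (self.word_in_class[l][(cc, cur)] / in_class
                      if in_class else 0.0)
            total += lam * p_class * p_emit
        return total

# Hypothetical hierarchy: level 0 is coarse (FUNC/CONT), level 1 is finer.
w2c = {
    "the": ("FUNC", "DET"),
    "cat": ("CONT", "NOUN"),
    "dog": ("CONT", "NOUN"),
    "sat": ("CONT", "VERB"),
}
model = HierarchicalClassBigram(w2c, weights=[0.3, 0.7])
model.train([["the", "cat", "sat"], ["the", "dog", "sat"]])
```

Because the classes at each level partition the vocabulary, the interpolated scores still form a proper distribution over next words, and the finer level sharpens the distinction between words whose coarse classes coincide.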
