Publication
IWSLT 2006
Conference paper

IBM Arabic-to-English Translation for IWSLT 2006

Abstract

We present techniques for improving domain-specific translation quality with a relatively high OOV ratio on test data sets. The key idea is to maximize the vocabulary coverage without degrading the translation quality. We maximize vocabulary coverage by segmenting a word into a sequence of morphemes, prefix*-stem-suffix* and by adding a large amount of out-of-domain training corpora. To preserve the domain-specific meaning of vocabularies occurring in both domain-specific and out-of-domain training corpora, we assign a higher weight to the domain-specific corpus than to the out-of-domain corpora. IBM Arabic-to-English spoken language translation systems using these techniques have demonstrated the best performances in the Open Data Track of the IWSLT2006 Evaluation Campaign.

Date

Publication

IWSLT 2006

Authors

Topics

Share