Concept-based speech-to-speech translation using maximum entropy models for statistical natural concept generation
Abstract
The IBM Multilingual Automatic Speech-To-Speech TranslatOR (MASTOR) system is a research prototype developed for the Defense Advanced Research Projects Agency (DARPA) Babylon/CAST speech-to-speech machine translation program. The system consists of cascaded components for large-vocabulary conversational spontaneous speech recognition, statistical machine translation, and concatenative text-to-speech synthesis. To achieve highly accurate and robust conversational spoken language translation, a unique concept-based speech-to-speech translation approach is proposed that performs the translation by first understanding the meaning of the automatically recognized text. A decision-tree-based statistical natural language understanding algorithm extracts the semantic information from the input sentences, while a natural language generation (NLG) algorithm predicts the translated text via maximum-entropy-based statistical models. One critical component of our statistical NLG approach is natural concept generation (NCG). The goal of NCG is not only to generate the correct set of concepts in the target language, but also to produce them in an appropriate order. To improve maximum-entropy-based concept generation, a set of new approaches is proposed. One approach improves concept sequence generation in the target language via forward-backward modeling, which selects the hypothesis with the highest combined conditional probability under both the forward and backward generation models. This paradigm allows both left and right context information in the source and target languages to be exploited during concept generation. Another approach selects bilingual features that enable maximum-entropy-based model training on preannotated parallel corpora. These features are augmented with word-level information to achieve higher NCG accuracy while minimizing the total number of distinct concepts and, hence, greatly reducing the concept annotation and natural language understanding effort. The features are further expanded into multiple sets to enhance model robustness. Finally, a confidence threshold is introduced to alleviate data sparseness problems in our training corpora. Experiments show a dramatic reduction of more than 40% in concept generation error rate on our speech translation corpus within limited domains. Significant improvements in both word error rate and BiLingual Evaluation Understudy (BLEU) score are also achieved in our speech-to-speech translation experiments. © 2006 IEEE.
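For context, the NCG models referred to above are conditional maximum-entropy (log-linear) models; the generic form is sketched below, where the feature functions $f_i$ and history $h$ are placeholders standing in for the paper's specific bilingual, word-augmented features:

$$
p(c \mid h) = \frac{\exp\!\left(\sum_i \lambda_i f_i(c, h)\right)}{\sum_{c'} \exp\!\left(\sum_i \lambda_i f_i(c', h)\right)}
$$

Here $c$ ranges over candidate target-language concepts, $h$ is the generation history (source-language concepts plus previously generated target concepts), and the weights $\lambda_i$ are estimated from the annotated parallel corpora.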
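The forward-backward selection step can likewise be illustrated with a minimal sketch. The `forward_logprob` and `backward_logprob` scorers below are hypothetical stand-ins for the two trained generation models, and the toy hypotheses are invented for illustration; the actual MASTOR decoder is not described in the abstract.

```python
import math
from typing import Callable, List, Sequence

# A scorer maps a candidate concept sequence to a log-probability.
LogProbFn = Callable[[Sequence[str]], float]

def select_forward_backward(
    hypotheses: List[Sequence[str]],
    forward_logprob: LogProbFn,
    backward_logprob: LogProbFn,
) -> Sequence[str]:
    """Return the candidate target-language concept sequence with the
    highest combined forward and backward model score. Adding the
    log-probabilities corresponds to multiplying the two conditional
    probabilities."""
    best_hyp: Sequence[str] = hypotheses[0]
    best_score = -math.inf
    for hyp in hypotheses:
        score = forward_logprob(hyp) + backward_logprob(hyp)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp

# Toy usage: fixed scores stand in for trained maximum-entropy models.
fwd = {("greeting", "request"): -1.2, ("request", "greeting"): -0.9}
bwd = {("greeting", "request"): -0.8, ("request", "greeting"): -2.5}
best = select_forward_backward(
    list(fwd), lambda h: fwd[tuple(h)], lambda h: bwd[tuple(h)]
)
# best == ("greeting", "request"): -1.2 + -0.8 = -2.0 beats -0.9 + -2.5 = -3.4.
```

Combining the two directions in this way lets a hypothesis that is only moderately likely under the left-to-right model still win if the right-to-left model strongly supports it, which is the motivation for exploiting both left and right context.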