Real-time multilingual HMM training robust to channel variations
Abstract
This paper describes our efforts towards a real-time, multilingual Large Vocabulary Continuous Speech Recognition (LVCSR) system for telephony. The trilingual (English, French and Spanish) hybrid landline/cellular system is compared to each of our best monolingual systems. The results are comparable, with a degradation of less than approximately 10%. An HMM state quality measurement technique is explored to improve the performance of the multilingual acoustic models. A pilot experiment on an English/Spanish bilingual system yields improvements of between 5% and 20% across different test conditions. To extend the system to speakerphone applications, we employed different front-end processing techniques, mainly CDCN applied prior to HDA and MLLT, which reduced the error rate of the trilingual system by as much as 30%. These results suggest that trilingual acoustic models can be used for real telephony applications.