Variational Bhattacharyya divergence for hidden Markov models
John R. Hershey, Peder A. Olsen
ICASSP 2008
Determining the correct phonemes and pitch accents is important for synthesizing natural-sounding Japanese speech. We implemented a TTS front-end system based on an n-gram model. However, the vocabulary of a word n-gram model is limited to the words found in the training corpus, and collecting a very large training corpus is not easy. In this paper, we propose adding a class n-gram model that covers not only the words found in the training corpus but also the words found in the dictionary, further improving accuracy. In our experiments, the proposed model achieves relative improvements of 16.9% in accent estimation accuracy and 21.6% in phoneme estimation accuracy over the word n-gram model. ©2008 IEEE.
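To illustrate the general idea of backing a word n-gram model with a class n-gram model, here is a minimal sketch using bigrams. It is not the authors' system: the class name `WordClassBigram`, the interpolation weight `lam`, the uniform spread of class probability over dictionary words, and the toy corpus and lexicon are all assumptions made for illustration, and the abstract does not say how the two models are actually combined.

```python
from collections import Counter


def count_bigrams(sequences):
    """Count left-hand contexts and bigrams over a list of token sequences."""
    contexts, bigrams = Counter(), Counter()
    for seq in sequences:
        contexts.update(seq[:-1])            # tokens that have a successor
        bigrams.update(zip(seq, seq[1:]))
    return contexts, bigrams


def cond_prob(prev, cur, contexts, bigrams):
    """Relative-frequency estimate of P(cur | prev); 0.0 for an unseen context."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, cur)] / contexts[prev]


class WordClassBigram:
    """Hypothetical interpolation of a word bigram and a class bigram model."""

    def __init__(self, corpus, lexicon, lam=0.5):
        # lexicon maps every dictionary word (seen in training or not) to a class.
        self.lexicon = lexicon
        self.lam = lam
        self.word_ctx, self.word_bi = count_bigrams(corpus)
        class_corpus = [[lexicon[w] for w in seq] for seq in corpus]
        self.class_ctx, self.class_bi = count_bigrams(class_corpus)
        # Assumption: P(word | class) is uniform over the dictionary words in that class.
        self.class_size = Counter(lexicon.values())

    def word_prob(self, prev, cur):
        return cond_prob(prev, cur, self.word_ctx, self.word_bi)

    def class_prob(self, prev, cur):
        if prev not in self.lexicon or cur not in self.lexicon:
            return 0.0
        c_prev, c_cur = self.lexicon[prev], self.lexicon[cur]
        transition = cond_prob(c_prev, c_cur, self.class_ctx, self.class_bi)
        return transition / self.class_size[c_cur]

    def prob(self, prev, cur):
        """Linear interpolation of the word-level and class-level estimates."""
        return (self.lam * self.word_prob(prev, cur)
                + (1 - self.lam) * self.class_prob(prev, cur))


# Toy data: "osaka" is in the dictionary but never appears in the training corpus.
corpus = [["tokyo", "station"], ["kyoto", "station"]]
lexicon = {"tokyo": "PLACE", "kyoto": "PLACE", "osaka": "PLACE", "station": "NOUN"}
model = WordClassBigram(corpus, lexicon)

print(model.word_prob("osaka", "station"))  # word model alone has no evidence
print(model.prob("osaka", "station"))       # class model supplies a nonzero estimate
```

In this toy setup the word-only model assigns zero probability to any bigram involving "osaka", while the combined model still gives it mass because "osaka" shares the PLACE class with words that were observed in training.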