ICSLP 2000
Conference paper

Multistage coarticulation model combining articulatory, formant and cepstral features


We describe a multistage speech production model consisting of a linear, phoneme-independent coarticulation filter followed by a nonlinear component. The nonlinear component generates two cepstra, which are then additively combined: one corresponds to a relatively smooth background spectrum, and the other represents three formant-like spectral peaks. A neural network is used for both parts, but the second part also uses a hard-coded function that generates exactly three spectral peaks. We develop a unified model of training, adaptation, and decoding, in which the three operations differ only in their prior probability distributions. Prior probabilities can be introduced at each stage of the model, providing a flexible framework for using both specific and general prior knowledge. We demonstrate the use of this model for speech synthesis as well as recognition.
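The forward (synthesis) direction of such a model can be illustrated with a minimal sketch. The abstract does not give the filter shape, network sizes, or peak function, so everything below is an assumption made for illustration: a smoothing FIR kernel stands in for the coarticulation filter, small one-hidden-layer networks with random weights stand in for the neural network, Gaussian-shaped log-spectral bumps stand in for the hard-coded peak generator, and all names (coarticulate, peaks_to_cepstrum, synthesize, random_params) are hypothetical.

```python
import numpy as np

N_CEP = 12    # cepstral coefficients per frame (assumed)
N_BINS = 64   # frequency bins used to render the peaks (assumed)


def coarticulate(targets, kernel):
    """Stage 1: smooth every articulatory track with one shared linear filter."""
    # targets has shape (frames, dims); the kernel does not depend on the phoneme.
    tracks = [np.convolve(targets[:, d], kernel, mode="same")
              for d in range(targets.shape[1])]
    return np.stack(tracks, axis=1)


def mlp(x, w1, w2):
    """One-hidden-layer network, a stand-in for the neural network (assumed form)."""
    return np.tanh(x @ w1) @ w2


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def peaks_to_cepstrum(freqs, widths, amps):
    """Hard-coded map from three peak descriptions per frame to a cepstrum."""
    f = np.linspace(0.0, 1.0, N_BINS)[None, None, :]
    # One Gaussian-shaped bump per peak (assumed shape), summed in the log spectrum.
    bumps = amps[:, :, None] * np.exp(
        -0.5 * ((f - freqs[:, :, None]) / widths[:, :, None]) ** 2)
    log_spec = bumps.sum(axis=1)                      # (frames, N_BINS)
    # Cosine transform of the log spectrum gives cepstral coefficients.
    k = np.arange(N_CEP)[:, None]
    n = np.arange(N_BINS)[None, :]
    basis = np.cos(np.pi * k * (n + 0.5) / N_BINS)    # (N_CEP, N_BINS)
    return log_spec @ basis.T / N_BINS                # (frames, N_CEP)


def synthesize(targets, kernel, p):
    """Stage 1 followed by stage 2: background cepstrum plus peak cepstrum."""
    artic = coarticulate(targets, kernel)
    background = mlp(artic, p["bg_w1"], p["bg_w2"])   # smooth background part
    raw = mlp(artic, p["pk_w1"], p["pk_w2"])          # 9 raw peak parameters
    freqs = sigmoid(raw[:, 0:3])                      # normalized peak centers
    widths = 0.02 + 0.1 * sigmoid(raw[:, 3:6])        # strictly positive widths
    amps = np.exp(raw[:, 6:9])                        # strictly positive heights
    return background + peaks_to_cepstrum(freqs, widths, amps)


def random_params(dims, hidden, rng):
    """Untrained weights, for demonstration only."""
    return {
        "bg_w1": rng.normal(size=(dims, hidden)),
        "bg_w2": rng.normal(size=(hidden, N_CEP)),
        "pk_w1": rng.normal(size=(dims, hidden)),
        "pk_w2": rng.normal(size=(hidden, 9)),
    }


rng = np.random.default_rng(0)
# Four phoneme targets in a 5-dimensional articulatory space, held for 10 frames each.
targets = np.repeat(rng.normal(size=(4, 5)), 10, axis=0)
kernel = np.hanning(9)
kernel /= kernel.sum()
cepstra = synthesize(targets, kernel, random_params(5, 16, rng))
print(cepstra.shape)  # (40, 12)
```

Because the first stage is a plain convolution, it is linear in the phoneme targets and identical for every phoneme; all phoneme-specific and nonlinear behavior is confined to the second stage, and the two cepstra are combined by simple addition. The probabilistic side of the model (the priors that distinguish training, adaptation, and decoding) is not represented in this sketch.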




