Reducing the footprint of the IBM trainable speech synthesis system

Dan Chazan; Ron Hoory; Zvi Kons; Dorel Silberstein; Alexander Sorin

ICSLP 2002

Conference paper

16 Sep 2002

Abstract

This paper presents a novel approach to concatenative speech synthesis. The approach reduces the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A speech representation based on spectral acoustic features is used both to compute the cost function during segment selection and to generate speech. Initial results indicate that, even with a dataset of only a few megabytes, it is possible to achieve quality significantly higher than that of existing small-footprint formant-based synthesizers.
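The abstract does not specify the cost function, so the following is only a generic sketch of how a spectral-feature-based cost can drive segment selection, not the authors' method. It assumes each candidate segment is described by hypothetical spectral feature vectors (for example, cepstra) at its first and last frames plus a precomputed target cost, and it uses a standard Viterbi search that minimizes the total target cost plus a weighted Euclidean spectral distance at each join. The names `select_segments`, `spectral_distance` and `join_weight` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical unit description (not from the paper):
# (spectral features of first frame, spectral features of last frame, target cost)
Unit = Tuple[Sequence[float], Sequence[float], float]


def spectral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two spectral feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_segments(candidates: List[List[Unit]],
                    join_weight: float = 1.0) -> Tuple[List[int], float]:
    """Viterbi search choosing one candidate per target position.

    Minimizes sum(target costs) + join_weight * sum(spectral join distances).
    Assumes every position has at least one candidate.
    """
    # best[i][j]: lowest cost of any path ending in candidate j at position i
    best = [[unit[2] for unit in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for cur in candidates[i]:
            # Cost of reaching `cur` from each candidate at the previous position
            costs = [
                best[i - 1][k] + join_weight * spectral_distance(prev[1], cur[0])
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + cur[2])
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest candidate at the last position
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    candidates = [
        [([0, 0], [1, 0], 0.5), ([0, 0], [5, 5], 0.1)],
        [([1, 0], [2, 0], 0.2), ([5, 5], [6, 6], 0.3)],
    ]
    print(select_segments(candidates))
```

In this toy input, the second candidate at each position has a smooth spectral join and a low combined target cost, so the search returns the path `[1, 1]` with a total cost of about 0.4. Raising `join_weight` trades target fidelity for spectral continuity at the joins.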
