Footprint reduction of concatenative text-to-speech synthesizers using polynomial temporal decomposition

Tamar Shoham; David Malah; Slava Shechtman

doi:10.1109/ISCCSP.2010.5463316

ISCCSP 2010

Conference paper

28 Jun 2010

Abstract

High-quality, low-footprint Concatenative Text-To-Speech (CTTS) synthesizers pose a persistent challenge in the field of speech processing. The spectral parameters representing the short speech segments used in the concatenation process account for a large portion of the required memory. In this paper, we propose using a vectorial form of Polynomial Temporal Decomposition, combined with jointly optimal segmentation and polynomial order selection, to reduce the storage required for the spectral amplitude parameters by 50% while preserving the perceptual quality of the synthesized speech. ©2010 IEEE.
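
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each segment of spectral parameter frames is approximated by one polynomial per parameter (all parameters in a frame sharing the same segment boundaries and order), and that segment boundaries and orders are chosen together by dynamic programming over a cost of squared error plus a storage penalty. The names `fit_segment` and `segment_and_fit`, the parameters `max_order`, `max_len`, and `lam`, and the penalty form are all hypothetical.

```python
import numpy as np


def fit_segment(frames, order):
    """Least-squares polynomial fit of all parameter tracks in one segment.

    frames: (T, D) array of T frames with D spectral parameters each.
    Returns (coeffs, squared_error); coeffs has shape (order + 1, D).
    """
    n_frames = frames.shape[0]
    # Normalized time axis keeps the polynomial basis well conditioned.
    t = np.linspace(-1.0, 1.0, n_frames) if n_frames > 1 else np.zeros(1)
    basis = np.vander(t, order + 1, increasing=True)  # (T, order + 1)
    coeffs, *_ = np.linalg.lstsq(basis, frames, rcond=None)
    err = float(np.sum((basis @ coeffs - frames) ** 2))
    return coeffs, err


def segment_and_fit(frames, max_order=3, max_len=20, lam=0.1):
    """Jointly pick segment boundaries and per-segment polynomial order.

    Assumed cost (not from the paper): squared error plus
    lam * (number of stored coefficients).
    Returns a list of (start, end, order) with end exclusive.
    """
    n_frames, n_params = frames.shape
    best = np.full(n_frames + 1, np.inf)  # best[e]: min cost of frames[:e]
    best[0] = 0.0
    back = [None] * (n_frames + 1)
    for end in range(1, n_frames + 1):
        for start in range(max(0, end - max_len), end):
            length = end - start
            # The order may not exceed length - 1, so the fit stays determined.
            for order in range(min(max_order, length - 1) + 1):
                _, err = fit_segment(frames[start:end], order)
                cost = best[start] + err + lam * (order + 1) * n_params
                if cost < best[end]:
                    best[end] = cost
                    back[end] = (start, order)
    # Trace the chosen segments back from the last frame.
    segments = []
    end = n_frames
    while end > 0:
        start, order = back[end]
        segments.append((start, end, order))
        end = start
    return segments[::-1]
```

In this sketch the storage per segment is `(order + 1) * D` coefficients instead of `length * D` raw values, and `lam` trades reconstruction error against footprint. A real system would presumably replace the squared-error term with a perceptually motivated distortion measure.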