Transient modeling for overlap-add sinusoidal model of speech
Slava Shechtman
ICASSP 2013
High-quality, low-footprint Concatenative Text-To-Speech (CTTS) synthesizers pose a persistent challenge in speech processing. The spectral parameters that represent the short speech segments used in the concatenation process account for a large portion of the required memory. In this paper we propose a vectorial form of Polynomial Temporal Decomposition, combined with jointly optimal segmentation and polynomial-order selection, that reduces the storage required for the spectral amplitude parameters by 50% while preserving the perceptual quality of the synthesized speech. ©2010 IEEE.
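The abstract does not spell out the algorithm, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' method. It assumes that each segment of spectral amplitude vectors is modelled by one low-order polynomial in normalized time, with the same time basis shared across all vector dimensions (the "vectorial" part). It also assumes that "jointly optimal" segmentation and order selection can be read as minimizing squared reconstruction error plus `lam` times the number of stored coefficients, solved by dynamic programming over segment boundaries. The names `fit_poly_segment`, `segment_ptd`, `max_order`, `max_len` and `lam` are illustrative inventions.

```python
import numpy as np


def fit_poly_segment(frames, order):
    """Least-squares fit of a vector polynomial to one segment.

    frames: (n, dim) array of spectral amplitude vectors (assumed layout).
    All dimensions share the same polynomial time basis.
    Returns (coefficients of shape (order + 1, dim), total squared error).
    """
    n = frames.shape[0]
    t = np.linspace(0.0, 1.0, n)                      # normalized time in the segment
    basis = np.vander(t, order + 1, increasing=True)  # columns: 1, t, t^2, ...
    coeffs, *_ = np.linalg.lstsq(basis, frames, rcond=None)
    resid = frames - basis @ coeffs
    return coeffs, float(np.sum(resid ** 2))


def segment_ptd(frames, max_order=3, max_len=20, lam=1.0):
    """Hypothetical joint segmentation + order selection by dynamic programming.

    Cost of a segment = squared error + lam * (order + 1) * dim, where
    (order + 1) * dim is the number of coefficients that would be stored.
    Returns ([(start, end, order), ...], total cost).
    """
    n, dim = frames.shape
    best = np.full(n + 1, np.inf)  # best[k] = minimal cost of frames[:k]
    best[0] = 0.0
    back = [None] * (n + 1)        # back[k] = (start, order) of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            seg = frames[start:end]
            # Cap the order so the fit is never underdetermined.
            for order in range(min(max_order, end - start - 1) + 1):
                _, err = fit_poly_segment(seg, order)
                cost = best[start] + err + lam * (order + 1) * dim
                if cost < best[end]:
                    best[end] = cost
                    back[end] = (start, order)
    # Trace the optimal segmentation back from the last frame.
    segments = []
    end = n
    while end > 0:
        start, order = back[end]
        segments.append((start, end, order))
        end = start
    return segments[::-1], float(best[n])
```

For example, on a 2-dimensional track that rises linearly for 10 frames and then stays flat, a small `lam` should yield one first-order segment followed by one zeroth-order segment. Larger `lam` values trade reconstruction error for fewer stored coefficients, which is the storage-versus-quality knob such a scheme would tune.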