Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale
Abstract
The quality of text-to-speech systems can be effectively assessed only on the basis of reliable and valid listening tests to assess overall system performance. A mean opinion scale (MOS) has been the recommended measure of synthesized speech quality [ITU-T Recommendation P.85, 1994. Telephone transmission quality subjective opinion tests. A method for subjective performance assessment of the quality of speech voice output devices]. We assessed this MOS scale and developed and tested a modified measure of speech quality. This modified measure has new items specific to text-to-speech systems. Our research was motivated by the lack of clear evidence of the conceptual content of as well as the psychometric properties of the MOS scale. We present conceptual arguments and empirical evidence for the reliability and validity of a modified scale. Moreover, we employ state of the art psychometric techniques such as confirmatory factor analysis to provide strong tests of psychometric properties. This modified scale is better suited to appraise synthesis systems since it includes items that are specific to the artifacts found in synthesized speech. We believe that the speech synthesis research communities will find this modified scale a better fit for listening tests to assess synthesized speech. © 2004 Elsevier Ltd. All rights reserved.