Publication
INTERSPEECH - Eurospeech 2003
Conference paper

Using place name data to train language identification models

Abstract

The language of origin of a name affects its pronunciation, so language identification is an important technology for speech synthesis and recognition. Previous work on this task has typically used training sets that are proprietary or limited in coverage. In this work, we investigate the use of a publicly available geographic database for training language ID models. We automatically cluster place names by language, and show that models trained from place name data are effective for language ID on person names. In addition, we compare several source-channel and direct models for language ID, and achieve a 24% reduction in error rate over a source-channel letter trigram model on a 26-way language ID task.
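To make the baseline named in the abstract concrete, the following is a minimal sketch of a source-channel letter trigram classifier: it picks the language L that maximizes P(L) · P(name | L), where P(name | L) comes from a per-language letter trigram model. This is an illustration under stated assumptions, not the paper's implementation. The add-one smoothing, the uniform priors, the class and function names, and the tiny hand-written name lists are all hypothetical; the paper's models are trained on clustered place names from a geographic database, and its smoothing scheme is not given in the abstract.

```python
import math
from collections import defaultdict

BOS, EOS = "^", "$"  # padding symbols (assumed; not from the paper)


def trigrams(name):
    """Yield (two-letter context, next letter) pairs for a padded name."""
    padded = BOS * 2 + name.lower() + EOS
    return [(padded[i:i + 2], padded[i + 2]) for i in range(len(padded) - 2)]


class LetterTrigramLM:
    """Letter trigram model with add-one smoothing (an assumed choice)."""

    def __init__(self, names, alphabet):
        # Predictable symbols: every letter plus the end-of-name marker.
        self.vocab_size = len(alphabet) + 1
        self.context_counts = defaultdict(int)
        self.trigram_counts = defaultdict(int)
        for name in names:
            for ctx, ch in trigrams(name):
                self.context_counts[ctx] += 1
                self.trigram_counts[(ctx, ch)] += 1

    def log_prob(self, name):
        """Return log P(name | language) under the smoothed trigram model."""
        total = 0.0
        for ctx, ch in trigrams(name):
            num = self.trigram_counts.get((ctx, ch), 0) + 1
            den = self.context_counts.get(ctx, 0) + self.vocab_size
            total += math.log(num / den)
        return total


def identify(name, models, priors):
    """Source-channel decision: argmax over L of log P(L) + log P(name | L)."""
    return max(
        models,
        key=lambda lang: math.log(priors[lang]) + models[lang].log_prob(name),
    )


# Hypothetical toy training data standing in for clustered place names.
toy = {
    "german": ["schwarzbach", "steinfurt", "neustadt", "schoenberg"],
    "italian": ["castelnuovo", "montebello", "villanova", "portofino"],
}
alphabet = set("abcdefghijklmnopqrstuvwxyz")
models = {lang: LetterTrigramLM(names, alphabet) for lang, names in toy.items()}
priors = {lang: 1.0 / len(toy) for lang in toy}  # uniform prior (assumed)

print(identify("steinbach", models, priors))
```

A direct model, by contrast, would estimate P(L | name) without going through P(name | L); the abstract does not specify which direct models were compared, so none is sketched here.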
