A.E. Brennemann, R.L. Hollis
IROS 1995
In recent years, automatic recognition of spoken languages has become an important feature in a variety of speech-enabled multilingual applications which, besides accuracy, also demand efficient and "linguistically scalable" algorithms. This paper deals with a particularly successful approach based on phonotactic-acoustic features and presents systems for language identification as well as for unknown-language rejection. An architecture with multipath decoding, improved phonotactic models using binary-tree structures, and acoustic pronunciation models serve as a framework for experiments and discussion on these two tasks. In particular, language identification accuracy on a telephone-speech task (NIST'95 evaluation) in six and nine languages is presented together with results from a perceptual experiment carried out with human listeners. The performance of language rejection based on phonotactic modeling combined with a monolingual LVCSR system in the domain of broadcast news transcription is also reported. Besides yielding state-of-the-art performance, the described systems are computationally inexpensive and easily extensible (scalable) to new languages without the need for linguistic experts.
Arnon Amir, M. Lindenbaum
Computer Vision and Image Understanding
Yining Hong, Haoyu Zhen, et al.
NeurIPS 2023
Diganta Misra, Muawiz Chaudhary, et al.
CVPRW 2024