TRAP language identification system for RATS phase II evaluation
Abstract
Automatic language identification or detection of audio data has become an important preprocessing step for speech/speaker recognition and audio data mining. In many surveillance applications, language detection has to be performed on highly degraded audio inputs. In this paper, we present our work on language detection in highly degraded radio channel scenarios. We provide a brief description of the Targeted Robust Audio Processing (TRAP) language detection system built for the Phase II Evaluation of the Robust Automatic Transcription of Speech (RATS) program. This system is a combination of 15 systems with different frontends and speech activity decisions. We also analyze the usefulness of multi-layer perceptron (MLP) based non-linear projection of i-vectors before SVM classification. The proposed backend reduces the Equal Error Rate (EER) by 11%-25% relative compared to the baseline PCA-based feature representation for SVM classification, on the RATS test data consisting of data from eight high-frequency radio communication channels. Copyright © 2013 ISCA.