Shilei Zhang, Yong Qin
ICASSP 2012
Deep neural networks (DNNs) have tremendously improved the performance of automatic speech recognition (ASR). In addition, end-to-end speech recognition systems can achieve state-of-the-art performance on unsegmented sequence data using Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) and the Connectionist Temporal Classification (CTC) method. In this paper, we therefore propose a lightweight wake-up-word (WUW) spotting system based on an end-to-end DNN architecture, which is intended to provide a good balance between decoding speed, accuracy, and model size. The objective is to introduce the CTC framework into the spotting process and to enhance the system through WUW-oriented model training and refinement steps. We test the proposed architecture on a conversational telephone dataset; the results illustrate that computation time can be significantly reduced without a significant decrease in spotting accuracy.
Shilei Zhang, Yong Qin
ICASSP 2012
Juan M. Huerta, Cheng Wu, et al.
INTERSPEECH 2009
Wenxiao Cao, Danning Jiang, et al.
ICME 2009
Zhi Qiao, Shiwan Zhao, et al.
IJCAI 2018