Wake-up-word spotting using end-to-end deep neural network system

Shilei Zhang; Wen Liu; Yong Qin

doi:10.1109/ICPR.2016.7900073

ICPR 2016

Conference paper

04 Dec 2016

Abstract

Deep neural networks (DNNs) have tremendously improved the performance of automatic speech recognition (ASR). Meanwhile, end-to-end speech recognition systems can achieve state-of-the-art performance using Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) trained with the Connectionist Temporal Classification (CTC) method, which handles unsegmented sequence data. In this paper, we therefore propose a lightweight wake-up-word (WUW) spotting system based on an end-to-end DNN architecture, intended to strike a good balance between decoding speed, accuracy, and model size. The objective is to introduce the CTC framework into the spotting process and to enhance the system with WUW-oriented model training and refinement steps. Experiments on a conversational telephone dataset show that the proposed architecture significantly reduces computation time without a significant decrease in spotting accuracy.
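As background on the CTC criterion mentioned in the abstract, the sketch below shows the standard CTC forward (alpha) recursion. Given per-frame log posteriors from some acoustic model (for example an LSTM), it sums over all frame alignments that collapse to a target label sequence. This is a minimal, illustrative sketch, not the authors' spotting procedure. The function names `ctc_log_prob` and `logsumexp`, the blank index 0, and the idea of using this probability as a wake-up-word score are assumptions for illustration only.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_prob(log_probs: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 blank: int = 0) -> float:
    """Return log P(labels | frames) under CTC, summed over all alignments.

    log_probs[t][k] is the log posterior of symbol k at frame t
    (assumed to come from a hypothetical acoustic model).
    """
    # Extended sequence with blanks around every label: [b, l1, b, l2, ..., b].
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)

    # Initialisation: a path may start with a blank or with the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one position
            # Skip a blank only between two different non-blank labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    if S == 1:
        return alpha[0]
    return logsumexp(alpha[S - 1], alpha[S - 2])
```

For instance, with two frames and uniform posteriors over {blank, 1}, the label sequence `[1]` is produced by the alignments "11", "1-", and "-1", so `math.exp(ctc_log_prob([[math.log(0.5)] * 2] * 2, [1]))` gives 0.75. Working in log space avoids underflow on long utterances; a spotter could, under these assumptions, compare such a score for the wake-up-word label sequence against a threshold.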