Shilei Zhang, Yong Qin
ICASSP 2012
This paper presents a robust voice-melody transcription system built on a speech recognition framework. While many previous voice-melody transcription systems have relied on non-statistical approaches, statistical recognition technology can potentially achieve more robust results. A cepstrum-based acoustic model is employed to avoid the hard decisions required by explicit voiced-unvoiced segmentation and pitch extraction, and a key-independent 4-gram language model is employed to capture prior probabilities of different melodic sequences. Evaluations consider both note recognition error rate and end-to-end Query-by-Humming performance, and the results are compared with three other voice-melody transcription systems. Experiments show that the system is state-of-the-art: it is considerably more robust than the other systems on noisy data, and close to the best of all the systems on the clean data set. © 2007 IEEE.
Michael Picheny, Zoltan Tuske, et al.
INTERSPEECH 2019
Zhenbo Zhu, Qing Wang, et al.
ICASSP 2007
Vadim Sheinin, Da-Ke He
ICASSP 2007