Amit Anil Nanavati, Nitendra Rajput, et al.
MobileHCI 2011
Bottleneck neural networks have recently found success in a variety of speech recognition tasks. This paper presents an approach in which they are used in the front end of a speaker recognition system. The network inputs are mel-frequency cepstral coefficients (MFCCs) from multiple consecutive frames, and the outputs are speaker labels. We propose using a recording-level criterion that is optimized via an online learning algorithm. We furthermore propose retraining a network to focus on its errors when leveraging scores from an independently trained system. We ran experiments on the same- and different-microphone tasks of the 2010 NIST Speaker Recognition Evaluation. We found that the proposed bottleneck feature extraction paradigm performs slightly worse than MFCCs but provides complementary information in combination. We also found that the proposed combination strategy with retraining improved the EER by 14% and 18% relative over the baseline MFCC system in the same- and different-microphone tasks, respectively.
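As a rough illustration of the front end the abstract describes, the sketch below builds a feed-forward network that maps stacked MFCC frames to speaker posteriors through a narrow bottleneck layer whose activations serve as features. All dimensions (context window of 11 frames, 20 MFCCs, 40-dimensional bottleneck, 1024 hidden units, 500 training speakers) are illustrative assumptions, not values from the paper, and the paper's recording-level criterion and retraining scheme are not shown.

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    """Feed-forward net: stacked MFCC frames in, speaker posteriors out.
    The narrow middle layer provides the bottleneck features."""
    def __init__(self, n_mfcc=20, context=11, bottleneck_dim=40,
                 hidden_dim=1024, n_speakers=500):
        super().__init__()
        in_dim = n_mfcc * context                    # MFCCs from consecutive frames
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, bottleneck_dim),   # bottleneck layer
        )
        self.classifier = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(bottleneck_dim, n_speakers),   # speaker-label outputs
        )

    def forward(self, x):
        z = self.encoder(x)                          # bottleneck activations
        return self.classifier(z), z                 # posteriors, features

# After training with a speaker-classification loss, the bottleneck
# activations z would be extracted per frame and used as features
# alongside (or instead of) MFCCs in the recognition back end.
net = BottleneckNet()
frames = torch.randn(8, 20 * 11)                     # a batch of stacked MFCC frames
posteriors, features = net(frames)
```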
Amol Thakkar, Andrea Antonia Byekwaso, et al.
ACS Fall 2022
Dimitrios Christofidellis, Giorgio Giannone, et al.
MRS Spring Meeting 2023
Carla F. Griggio, Mayra D. Barrera Machuca, et al.
CSCW 2024