CycleGAN Bandwidth Extension Acoustic Modeling for Automatic Speech Recognition
Abstract
Although narrowband (NB) and wideband (WB) speech data differ primarily in sampling rate, these two common input sources are difficult to model simultaneously for automatic speech recognition (ASR). Meanwhile, cycle-consistent generative adversarial networks (CycleGANs) have shown value in a number of acoustic tasks, such as mapping between domains, owing to their powerful generators. We apply CycleGANs to the task of bandwidth extension (BWE) and test a variety of architectures. The CycleGANs produce encouraging losses and reconstructed spectrograms. To further reduce word error rate (WER), we add a discriminative loss to the CycleGAN BWE architecture. This more closely matches our ASR goal, and we show gains in WER compared to a standard BWE model discriminatively trained only to map from upsampled narrowband (UNB) to WB data.
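As a rough illustration of how such a combined objective might be assembled, the sketch below computes an L1 cycle-consistency term between the UNB and WB domains and adds it, with weights, to an adversarial term and a discriminative term. It is a minimal sketch under stated assumptions, not the training objective used in this work: the function names, the L1 form of the cycle term, and the weights `lambda_cyc` and `lambda_disc` are all hypothetical, and the discriminative term is passed in as a precomputed scalar because its exact form is not specified here.

```python
import numpy as np


def l1(a, b):
    """Mean absolute difference between two feature arrays."""
    return float(np.mean(np.abs(a - b)))


def cycle_consistency_loss(unb, wb, g_unb_to_wb, g_wb_to_unb):
    """L1 cycle loss (assumed form).

    Each domain is mapped to the other domain and back again; the
    round trip should reproduce the original features.
    """
    unb_round_trip = g_wb_to_unb(g_unb_to_wb(unb))
    wb_round_trip = g_unb_to_wb(g_wb_to_unb(wb))
    return l1(unb_round_trip, unb) + l1(wb_round_trip, wb)


def combined_loss(adv_loss, cyc_loss, disc_loss,
                  lambda_cyc=10.0, lambda_disc=1.0):
    """Weighted sum of adversarial, cycle, and discriminative terms.

    The weights are hypothetical placeholders, not values from the paper.
    """
    return adv_loss + lambda_cyc * cyc_loss + lambda_disc * disc_loss


# Toy usage: random arrays stand in for spectrogram frames, and
# simple invertible functions stand in for the two generators.
rng = np.random.default_rng(0)
unb_frames = rng.normal(size=(8, 40))
wb_frames = rng.normal(size=(8, 40))
cyc = cycle_consistency_loss(
    unb_frames, wb_frames,
    g_unb_to_wb=lambda x: 2.0 * x,
    g_wb_to_unb=lambda x: x / 2.0,
)
total = combined_loss(adv_loss=0.7, cyc_loss=cyc, disc_loss=1.2)
```

In this toy setup the two stand-in generators are exact inverses, so the cycle term vanishes and only the adversarial and discriminative terms contribute to the total.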