Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
- Andrew Rouditchenko
- Yuan Gong
- et al.
- 2024
- INTERSPEECH 2024
I am a principal scientist and manager at the MIT-IBM Watson AI Lab. I am broadly interested in teaching machines to see, listen, and read with little or no supervision, as humans do. In particular, my research has centered on data efficiency (learning more from less), model efficiency, and multimodal perception methods that combine vision, sound/speech, and language.
I am passionate about doing fundamental research as well as developing systems that make a real-world impact. My work has not only been published in top AI conferences but has also been integrated into multiple products, and it has been covered by media outlets such as The New York Times, ABC News, and CBS's 60 Minutes. See my bio for more information about me.
More info at http://rogerioferis.com