Rogerio Feris

Title

Principal RSM and Manager, MIT-IBM Watson AI Lab

Bio

I am a principal scientist and manager at the MIT-IBM Watson AI Lab. I am broadly interested in teaching machines to see, listen, and read with little or no supervision, as humans do. In particular, my research has centered on data efficiency (learning more from less), model efficiency, and multimodal perception methods that combine vision, sound/speech, and language.

I am passionate about doing fundamental research as well as developing systems that make a real-world impact. My work has been published in top AI conferences, integrated into multiple products, and covered by media outlets such as the New York Times, ABC News, and CBS 60 Minutes. See my bio for more information about me.

More info at http://rogerioferis.com

Publications

Workshop on Memory and Vision
- Zexue He, Jovana Kondic, et al.
- 2025
- ICCV 2025
M+: Extending MemoryLLM with Scalable Long-Term Memory
- Yu Wang, Dmitry Krotov, et al.
- 2025
- ICML 2025
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
- Edson Araujo, Andrew Rouditchenko, et al.
- 2025
- CVPR 2025
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
- Junmo Kang, Leonid Karlinsky, et al.
- 2025
- ICLR 2025
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
- Irene Huang, Wei Lin, et al.
- 2024
- NeurIPS 2024
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
- Runqian Wang, Soumya Ghosh, et al.
- 2024
- NeurIPS 2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
- Andrew Rouditchenko, Yuan Gong, et al.
- 2024
- INTERSPEECH 2024
Large Scale Generative AI Text Applied to Sports and Music
- Aaron Baughman, Eduardo Morales, et al.
- 2024
- KDD 2024
Self-Specialization: Uncovering Latent Expertise within Large Language Models
- Junmo Kang, Hongyin Luo, et al.
- 2024
- ACL 2024
Scaling Granite Code Models to 128K Context
- Matthew Stallone, Vaibhav Saxena, et al.
- 2024
- arXiv

Projects

Generative AI for Sports and Entertainment
Using large language models to support some of the world’s most prestigious sports and entertainment events

Patents

Pathway Management Using Model Analysis And Forecasting

Determination Of Train Presence And Motion State In Railway Environments

Multi-mode Video Event Indexing

Incorporating Video Meta-data In 3D Models

Image Ranking Based On Attribute Correlation

Detection Of Static Object On Thoroughfare Crossings

Object Retrieval In Video Data Using Complementary Detectors

Semantic Parsing Of Objects In Video

Semantic Parsing Of Objects In Video

Multi-view Object Detection Using Appearance Model Transfer From Similar Scenes

Top collaborators

David D. Cox

Dmitry Krotov

Samuel Thomas

Nirmit Desai