Cascaded multilingual audio-visual learning from videos
Andrew Rouditchenko, Angie Boggust, et al.
INTERSPEECH 2021
A number of recent methods to understand neural networks have focused on quantifying the role of individual features. One such method, NetDissect identifies interpretable features of a model using the Broden dataset of visual semantic labels (colors, materials, textures, objects and scenes). Given the recent rise of a number of action recognition datasets, we propose extending the Broden dataset to include actions to better analyze learned action models. We describe the annotation process and results from interpreting action recognition models on the extended Broden dataset.
Andrew Rouditchenko, Angie Boggust, et al.
INTERSPEECH 2021
Marc Jourdan, Sebastien Blandin, et al.
CVPRW 2019
Saiteja Utpala, Alex Gu, et al.
NAACL 2024
Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024