Object-based reasoning in VQA
Mikyas T. Desta, Larry Chen, et al.
WACV 2018
We present a novel approach to the challenging task of Visual Question Answering (VQA) that incorporates and enriches semantic knowledge in a VQA model. We first apply Multiple Instance Learning (MIL) to extract a richer visual representation capturing concepts beyond objects, such as actions and colors. Motivated by the observation that semantically related answers often appear together in predictions, we further develop a new semantically guided loss function for model learning, which can promote weakly scored but correct answers while suppressing wrong ones. We show that these two ideas improve performance in a complementary way, and we demonstrate results competitive with the state of the art on two VQA benchmark datasets.
Gaoyuan Zhang, Songtao Lu, et al.
UAI 2022
Chun Fu Chen, Jinwook Oh, et al.
ISM 2018
Quanfu Fan, Sharath Pankanti
AVSS 2011