Jehanzeb Mirza, Leonid Karlinsky, et al.
NeurIPS 2023
The IBM Research/Columbia team investigated a novel range of low-level and high-level features and their combination for the TRECVID Multimedia Event Detection (MED) task. We submitted four runs exploring various methods of extraction, modeling and fusing of low-level features and hundreds of high-level semantic concepts. Our Run 1 developed event detection models utilizing Support Vector Machines (SVMs) trained from a large number of low-level features and was interesting in establishing the baseline performance for visual features from static video frames. Run 2 trained SVMs from classification scores generated by 780 visual, 113 action and 56 audio high-level semantic classifiers and explored various temporal aggregation techniques. Run 2 was interesting in assessing performance based on different kinds of high-level semantic information. Run 3 fused the lowand high-level feature information and was interesting in providing insight into the complementarity of this information for detecting events. Run 4 fused all of these methods and explored a novel Scene Alignment Model (SAM) algorithm that utilized temporal information discretized by scene changes in the video.
Jehanzeb Mirza, Leonid Karlinsky, et al.
NeurIPS 2023
Hagen Soltau, Lidia Mangu, et al.
ASRU 2011
Diganta Misra, Muawiz Chaudhary, et al.
CVPRW 2024
Hans-Werner Fink, Heinz Schmid, et al.
Journal of the Optical Society of America A: Optics and Image Science, and Vision