IBM Research and Columbia University TRECVID-2013 Multimedia Event Detection (MED), Multimedia Event Recounting (MER), Surveillance Event Detection (SED), and Semantic Indexing (SIN) Systems
Abstract
For this year’s TRECVID Multimedia Event Detection task [11], our team studied a semantic approach to video retrieval. We constructed a faceted taxonomy of 1313 visual concepts (including attributes and dynamic action concepts) and 85 audio concepts. Event search was performed via keyword search with a human user in the loop. Our submitted runs covered both the Pre-Specified and Ad-Hoc event collections. For each collection, we submitted 3 exemplar conditions: 0, 10, and 100 exemplars. For each exemplar condition, we also submitted 3 types of semantic-modality retrieval results: visual only, audio only, and combined.

The current IBM-Columbia MER system exploits nine observations about human cognition, language, and visual perception to produce an effective video recounting of an event. We designed and tuned algorithms that both locate a minimal persuasive video segment and script a minimal verbal collection of concepts, so as to convince an analyst that the MED decision was correct. With little loss of descriptive clarity, the system achieved the highest speed-up ratio among the ten teams competing in the NIST MER evaluation.

For SED, we explore temporal dependencies between events to enhance both evaluation tasks, i.e., automatic event detection (retrospective) and interactive event detection with a human in the loop (interactive). Our retrospective system is based on a joint segmentation-detection framework integrated with temporal event modeling, while the interactive system performs risk analysis to guide the end user toward effective verification. We achieve better results on both the retrospective and interactive tasks than last year.

For SIN, we submitted 4 full concept detection runs and 2 concept pair runs. In the first 3 concept detection runs, we varied our data sampling strategy among balanced bags via majority-class undersampling for ensemble fusion learning, balanced bags via minority-class oversampling, and unbalanced bags. For the 4th run, we used a rank-normalized fusion of the first 3 runs. The concept pair runs consisted of the sum of the individual concept classifiers' scores, with and without sigmoid normalization.
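To make the sampling strategies behind the first three SIN runs concrete, the Python sketch below illustrates one plausible way the three bag types could be constructed. It is an illustration under our own assumptions: the number of bags, the random seeding, and the use of index arrays are not specified in the abstract.

```python
import numpy as np

def make_bags(pos_idx, neg_idx, strategy="undersample", n_bags=5, rng=None):
    """Build index bags for ensemble training under three sampling strategies.

    Hypothetical illustration of the three SIN sampling settings; the number
    of bags and the exact sampling procedure are assumptions.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    bags = []
    for _ in range(n_bags):
        if strategy == "undersample":
            # Balanced bag: subsample the majority (negative) class down to the positive count.
            neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
            bags.append(np.concatenate([pos_idx, neg]))
        elif strategy == "oversample":
            # Balanced bag: replicate the minority (positive) class up to the negative count.
            pos = rng.choice(pos_idx, size=len(neg_idx), replace=True)
            bags.append(np.concatenate([pos, neg_idx]))
        else:
            # Unbalanced bag: keep the original class ratio.
            bags.append(np.concatenate([pos_idx, neg_idx]))
    return bags

# One classifier would be trained per bag, and the per-bag scores fused.
```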
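Similarly, the run-4 fusion and the concept pair scoring can be read as the short sketch below. The equal-weight averaging of rank-normalized scores and the use of a standard logistic function on the classifier scores are our assumptions; the abstract only states "rank-normalized fusion" and "sigmoid normalization".

```python
import numpy as np

def rank_normalize(scores):
    """Map raw scores to [0, 1] by their rank within the run (assumed scheme)."""
    ranks = np.argsort(np.argsort(scores))   # 0 for the lowest score, n-1 for the highest
    return ranks / max(len(scores) - 1, 1)

def fuse_runs(run_scores):
    """Fuse several runs by averaging their rank-normalized scores (equal weights assumed)."""
    return np.mean([rank_normalize(s) for s in run_scores], axis=0)

def concept_pair_score(score_a, score_b, use_sigmoid=True):
    """Score a concept pair as the sum of the two concept classifiers' scores,
    optionally passing each through a logistic sigmoid first (assumption)."""
    if use_sigmoid:
        score_a = 1.0 / (1.0 + np.exp(-score_a))
        score_b = 1.0 / (1.0 + np.exp(-score_b))
    return score_a + score_b

# Example: fuse three runs' scores for one concept over five shots.
runs = [np.array([0.2, 0.9, 0.4, 0.1, 0.7]),
        np.array([0.3, 0.8, 0.5, 0.2, 0.6]),
        np.array([0.1, 0.7, 0.6, 0.3, 0.5])]
print(fuse_runs(runs))
```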