Publication

SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Conference paper

User-trainable Video Annotation Using Multimodal Cues

Abstract

This paper describes progress towards a general framework for incorporating multimodal cues into a trainable system that automatically annotates user-defined semantic concepts in broadcast video. Models of arbitrary concepts are constructed by building classifiers in a score space defined by a pre-deployed set of multimodal models. Results show that annotation of user-defined concepts, both inside and outside the pre-deployed set, is competitive with our best video-only models on the TREC Video 2002 corpus. An interesting side result is that speech-only models perform comparably to our best video-only models when detecting visual concepts such as "outdoors", "face", and "cityscape".
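
The score-space construction lends itself to a stacking-style implementation: each shot is first mapped to a vector of confidence scores produced by the pre-deployed multimodal models, and a per-concept classifier is then trained on those vectors. The following is a minimal sketch of that idea, assuming scikit-learn; the list of fitted base models, their decision_function interface, and the SVM choice are illustrative assumptions, not the paper's actual implementation.

import numpy as np
from sklearn.svm import SVC

def score_vector(shot_features, base_models):
    # Map one video shot into score space: one confidence score per
    # pre-deployed multimodal model (visual, audio, speech, ...).
    # base_models is a hypothetical list of already-fitted classifiers.
    return np.array([m.decision_function([shot_features])[0]
                     for m in base_models])

def train_user_concept(shots, labels, base_models):
    # Train a classifier for a user-defined concept in score space
    # rather than in the raw low-level feature space.
    X = np.vstack([score_vector(s, base_models) for s in shots])
    clf = SVC(kernel="rbf", probability=True)  # illustrative classifier choice
    clf.fit(X, labels)
    return clf

The appeal of the score space is that each dimension is already a semantic confidence, so a classifier for a new user-defined concept is built on top of the pre-deployed models' outputs rather than directly on raw low-level features.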
