Robust and efficient video scene detection using optimal sequential grouping
Abstract
Video scene detection is the task of dividing a video into semantic sections. We propose a novel and effective method for the temporal grouping of scenes that can use an arbitrary set of features computed from the video. We formulate video scene detection as a general optimization problem and provide an efficient solution using dynamic programming. Unlike many existing methods, our formulation directly yields a temporally consistent segmentation, and it has the advantage of being parameter-free. We also present a novel technique for estimating the number of scenes in a video, using Singular Value Decomposition (SVD) as a low-rank approximation of a distance matrix. Detailed experimental results show that our algorithm outperforms current state-of-the-art methods. In addition, we created a new Open Video Scene Detection (OVSD) dataset, which we make publicly available on the web. Its ground-truth scene annotations were created objectively from the movie scripts. Because the dataset is open, it can be used for both academic and commercial purposes, unlike existing datasets for video scene detection.
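To illustrate the general idea of casting temporally consistent grouping as an optimization problem solved by dynamic programming, we give a minimal sketch below. It is not the method described in this paper. The cost function (the sum of pairwise distances inside each contiguous group), the function name `optimal_sequential_grouping`, and the assumption that the number of groups `k` is already known are all illustrative choices. The SVD-based estimate of the number of scenes is not reproduced here. The input is a symmetric distance matrix between consecutive items, such as shots described by some feature vectors.

```python
import math
from typing import List, Sequence


def optimal_sequential_grouping(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical sketch: split N items into k contiguous groups that minimize
    the total within-group pairwise distance. Returns the start index of each group."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")

    # 2D prefix sums: pre[i][j] = sum of dist[a][b] for a < i and b < j.
    pre = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            pre[i + 1][j + 1] = dist[i][j] + pre[i][j + 1] + pre[i + 1][j] - pre[i][j]

    def cost(a: int, b: int) -> float:
        # Sum over the square block of items a..b-1. Each unordered pair is
        # counted twice, which scales every cost equally and leaves the argmin unchanged.
        return pre[b][b] - pre[a][b] - pre[b][a] + pre[a][a]

    # best[g][m]: minimal cost of splitting the first m items into g groups.
    best = [[math.inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for m in range(g, n + 1):
            for s in range(g - 1, m):  # s is the start index of the last group
                c = best[g - 1][s] + cost(s, m)
                if c < best[g][m]:
                    best[g][m] = c
                    back[g][m] = s

    # Follow the back-pointers from the full sequence to recover group starts.
    starts: List[int] = []
    m = n
    for g in range(k, 0, -1):
        s = back[g][m]
        starts.append(s)
        m = s
    return starts[::-1]
```

For example, with six items whose one-dimensional features are `[0, 0.1, 0.2, 5, 5.1, 5.2]` and `dist[i][j] = abs(x[i] - x[j])`, asking for `k = 2` should place the boundary between the two tight clusters and return `[0, 3]`. This naive recurrence runs in O(k·N²) time once the prefix sums are built. The temporal-consistency constraint is built into the search space, since only contiguous groups are ever considered.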