Automatic Video Content Summarization Using Geospatial Mosaics of Aerial Imagery
Abstract
It is estimated that less than five percent of videos are currently analyzed to any degree. In addition to petabyte-sized multimedia archives, continuing innovations in optics, imaging sensors, camera arrays, (aerial) platforms, and storage technologies indicate that, for the foreseeable future, existing and new applications will continue to generate enormous volumes of video imagery. Contextual video summarization and activity maps offer one innovative direction for tackling this Big Data problem in computer vision. The goal of this work is to develop semi-automatic exploitation algorithms and tools that increase utility, dissemination, and usage potential by providing quick dynamic overview geospatial mosaics and motion maps. We present a framework to summarize (multiple) video streams from unmanned aerial vehicles (UAVs) or drones, which have very different characteristics compared to the structured commercial and consumer videos that have been analyzed in the past. Using geospatial metadata characteristics of the video combined with fast low-level image-based algorithms, the proposed method first generates mini-mosaics that can then be combined into geo-referenced meta-mosaic imagery. These geospatial maps enable rapid assessment of hours-long videos with arbitrary spatial coverage from multiple sensors by generating quick-look imagery, composed of multiple mini-mosaics, that summarizes spatiotemporal dynamics such as coverage, dwell time, and activity. The overall summarization pipeline was tested on several DARPA Video and Image Retrieval and Analysis Tool (VIRAT) datasets. We evaluate the effectiveness of the proposed video summarization framework using metrics such as compression ratio and hours of viewing time.