Automatic curation of sports highlights using multimodal excitement features
Abstract
The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media, yet it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a first-of-a-kind, real-world system for the editorial aid of golf and tennis highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), players' expressions (aggressive, tense, smiling, and neutral), spectators (crowd cheering), the commentator (tone of voice and word analysis), and game analytics to determine the most interesting moments of a game. We accurately identify the start and end frames of key shot highlights, together with metadata such as the player's name and the hole number, allowing personalized content summarization and retrieval. In addition, we introduce new techniques for learning our classifiers with reduced manual training data annotation by exploiting the correlation of different modalities. Our work has been demonstrated at a major golf tournament (2017 Masters) and two major international tennis tournaments (2017 Wimbledon and 2017 US Open), successfully extracting highlights over the course of these sporting events. For the 2017 Masters, 54% of the clips selected by our system overlapped with the official highlight reels. Furthermore, user studies showed that 90% of the non-overlapping clips were of the same quality as the official ones, while the automatic selection of clips for the 2017 Wimbledon and 2017 US Open highlights agreed with human preferences 80% and 84.2% of the time, respectively.
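To make the fusion step concrete, the following minimal Python sketch illustrates one plausible way to combine per-modality excitement scores into a single score per candidate segment and rank highlights. The modality names, the uniform weights, and the Segment structure are illustrative assumptions, not the system's actual trained fusion model.

```python
# Minimal sketch of multimodal late fusion for highlight ranking.
# Modality names, weights, and data layout are assumptions for illustration.
from dataclasses import dataclass

# Hypothetical per-modality excitement scores, each normalized to [0, 1].
MODALITIES = ("player_reaction", "player_expression",
              "crowd_cheer", "commentator_tone", "commentator_words")

@dataclass
class Segment:
    start_frame: int
    end_frame: int
    scores: dict  # modality name -> excitement score in [0, 1]

def fused_excitement(segment: Segment, weights: dict) -> float:
    """Weighted sum of per-modality excitement scores."""
    return sum(weights[m] * segment.scores.get(m, 0.0) for m in MODALITIES)

def top_highlights(segments, weights, k=10):
    """Rank candidate segments by fused excitement and keep the top k."""
    return sorted(segments, key=lambda s: fused_excitement(s, weights),
                  reverse=True)[:k]

if __name__ == "__main__":
    # Uniform weights (assumed); in practice these would be learned.
    weights = {m: 1.0 / len(MODALITIES) for m in MODALITIES}
    segments = [
        Segment(100, 250, {"crowd_cheer": 0.9, "commentator_tone": 0.8,
                           "player_reaction": 0.7, "player_expression": 0.5,
                           "commentator_words": 0.6}),
        Segment(400, 520, {"crowd_cheer": 0.2, "commentator_tone": 0.3,
                           "player_reaction": 0.1, "player_expression": 0.2,
                           "commentator_words": 0.1}),
    ]
    for s in top_highlights(segments, weights, k=1):
        print(f"highlight: frames {s.start_frame}-{s.end_frame}")
```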