Publication
IEEE TIP
Paper
Toward efficient action recognition: Principal backpropagation for training two-stream networks
Abstract
In this paper, we propose novel principal backpropagation networks (PBNets) to revisit the backpropagation algorithms commonly used in training two-stream networks for video action recognition. We contend that existing approaches, which always backpropagate through all frames/snippets, are not optimal for video recognition, since the desired actions occur only in a short period within a video. To remedy this drawback, we design a watch-and-choose mechanism. In particular, the watching stage exploits a dense snippet-wise temporal pooling strategy to discover the global characteristic of each input video, while the choosing stage backpropagates only a small number of representative snippets, which are selected with two novel strategies, i.e., Max-rule and KL-rule. We prove that, with the proposed selection strategies, performing backpropagation on the selected subset also decreases the loss over all snippets. The proposed PBNets are evaluated on two standard video action recognition benchmarks, UCF101 and HMDB51, where they consistently surpass the state of the art while requiring less memory and computation.
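The watch-and-choose idea can be illustrated with a minimal sketch. The abstract does not define the Max-rule or the KL-rule, so both selection functions below are assumptions made only for illustration: the hypothetical `choose_max_rule` keeps the snippets that score highest on the pooled top class, and the hypothetical `choose_kl_rule` keeps the snippets whose class distribution is closest, in KL divergence, to the pooled video-level distribution. Average pooling in `watch` is likewise an assumed stand-in for the dense snippet-wise temporal pooling. All function names and the toy scores are invented here and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def watch(snippet_logits):
    """Watching stage (assumed form): average class scores over ALL snippets."""
    n = len(snippet_logits)
    num_classes = len(snippet_logits[0])
    return [sum(s[c] for s in snippet_logits) / n for c in range(num_classes)]


def choose_max_rule(snippet_logits, pooled, k):
    """Assumed Max-rule: keep the k snippets scoring highest on the pooled top class."""
    top = max(range(len(pooled)), key=lambda c: pooled[c])
    ranked = sorted(range(len(snippet_logits)),
                    key=lambda i: snippet_logits[i][top], reverse=True)
    return sorted(ranked[:k])


def choose_kl_rule(snippet_logits, pooled, k):
    """Assumed KL-rule: keep the k snippets closest (in KL) to the pooled distribution."""
    target = softmax(pooled)
    ranked = sorted(range(len(snippet_logits)),
                    key=lambda i: kl_divergence(target, softmax(snippet_logits[i])))
    return sorted(ranked[:k])


# Toy scores: 5 snippets, 3 classes. The action (class 0) shows up in snippets 1 and 3 only.
snippets = [
    [0.1, 0.2, 0.0],
    [2.0, 0.1, 0.0],
    [0.0, 0.3, 0.1],
    [3.0, 0.0, 0.2],
    [0.2, 0.1, 0.1],
]
pooled = watch(snippets)                         # forward pass sees every snippet
max_sel = choose_max_rule(snippets, pooled, k=2)
kl_sel = choose_kl_rule(snippets, pooled, k=2)
print("Max-rule keeps snippets:", max_sel)
print("KL-rule keeps snippets:", kl_sel)
```

In a real training loop, only the snippets at the returned indices would take part in the backward pass, while all snippets would still contribute to the pooled forward prediction. This is where the memory and computation savings described in the abstract would come from.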