Publication
ICHI 2020
Conference paper
Personalized Assessment of Arousal and Valence from Videos
Abstract
Human behavior is influenced by numerous subjective factors such as the environment, culture, hormones, and genes. This makes the development of a one-size-fits-all behavioral model for emotion recognition challenging, especially in the domain of affect recognition. In this paper, we present a method to classify and assess arousal and valence from video in a personalized way. We represent the inherent information in the video independently through three semantically different types of signals, namely motion, appearance, and physiology. We use single- and multi-stream LSTM models for data fusion and classification, and compare our results against published values on a publicly available dataset consisting of 40 subjects. We further demonstrate that the personalized approach reaches better performance (arousal: 78.16% avg. acc.; valence: 89.22% avg. acc.) while providing more insight into the role of each signal group. For arousal classification we can distinguish between subjects who show dominance of motion-related expressions and others who exhibit more static expressions. Fusion of all three signal types gave an advantage for only a few subjects, a limitation that may be related to the video recordings being too short.
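To illustrate the multi-stream fusion idea described above, the sketch below encodes each signal group (motion, appearance, physiology) with its own LSTM and concatenates the final hidden states before classification. This is a minimal PyTorch sketch under assumed feature dimensions and a binary low/high label; the class name `MultiStreamLSTM` and all sizes are hypothetical and not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiStreamLSTM(nn.Module):
    """Late fusion of three per-frame feature streams via separate LSTMs.

    Stream dimensions, hidden size, and the binary (low/high) output are
    illustrative assumptions, not the paper's reported configuration.
    """
    def __init__(self, dims=(64, 128, 8), hidden=32, num_classes=2):
        super().__init__()
        # One LSTM per signal group: motion, appearance, physiology.
        self.streams = nn.ModuleList(
            nn.LSTM(d, hidden, batch_first=True) for d in dims
        )
        self.head = nn.Linear(hidden * len(dims), num_classes)

    def forward(self, motion, appearance, physiology):
        # Each input: (batch, time, feature_dim) for that stream.
        finals = []
        for lstm, x in zip(self.streams, (motion, appearance, physiology)):
            _, (h_n, _) = lstm(x)     # h_n: (num_layers, batch, hidden)
            finals.append(h_n[-1])    # final hidden state of last layer
        fused = torch.cat(finals, dim=-1)  # concatenate across streams
        return self.head(fused)           # logits, e.g. low/high arousal

# Usage: three synchronized 100-frame sequences for a batch of 4 clips.
model = MultiStreamLSTM()
logits = model(torch.randn(4, 100, 64),
               torch.randn(4, 100, 128),
               torch.randn(4, 100, 8))
print(logits.shape)  # torch.Size([4, 2])
```

Concatenating final hidden states is one simple late-fusion choice; the single-stream variant mentioned in the abstract would correspond to running one such LSTM per signal group and classifying from its hidden state alone.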