Publication
AAMAS 2019
Conference paper
Risk averse reinforcement learning for mixed multi-agent environments
Abstract
Most real-world applications of multi-agent systems need to balance maximizing reward against minimizing risk. In this work, we consider a popular risk measure, the variance of return (VOR), as a constraint in the agent's policy-learning algorithm in mixed cooperative and competitive environments. We present a multi-timescale actor-critic method for risk-sensitive Markov games in which risk is modeled as a VOR constraint. We also show that the resulting risk-averse policies satisfy the desired risk constraint without compromising much overall reward on a popular task.
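The VOR-constrained objective described above can be pictured as a Lagrangian saddle-point problem, with the policy parameters and the Lagrange multiplier updated on different timescales. The sketch below is illustrative only, not the paper's algorithm or notation: it uses a one-dimensional toy model (where raising expected return also raises variance) in place of a Markov game, and the names `alpha`, `eta_theta`, and `eta_lam` are assumptions.

```python
# Illustrative sketch (not the paper's algorithm): solve
#   max_theta  E[R]   subject to  Var(R) <= alpha
# via the Lagrangian L = E[R] - lam * (Var(R) - alpha), with the
# policy parameter updated on a faster timescale than the multiplier.
# Toy model: E[R] = theta and Var(R) = theta**2, so pushing the
# reward up also pushes the risk up.

alpha = 1.0        # assumed variance-of-return threshold
theta = 0.0        # scalar stand-in for the policy parameters
lam = 0.0          # Lagrange multiplier for the VOR constraint

eta_theta = 0.05   # faster timescale: policy parameter
eta_lam = 0.01     # slower timescale: multiplier

for _ in range(5000):
    # Gradient ascent on L w.r.t. theta:
    # dL/dtheta = 1 - 2 * lam * theta  (from the toy model above)
    theta += eta_theta * (1.0 - 2.0 * lam * theta)
    # Gradient ascent on the constraint violation for lam,
    # projected onto [0, inf) to keep the multiplier valid.
    lam = max(0.0, lam + eta_lam * (theta**2 - alpha))

# At the saddle point the constraint is active:
# theta**2 = alpha gives theta = 1, and dL/dtheta = 0 gives lam = 0.5.
print(f"theta={theta:.3f}, lam={lam:.3f}, Var(R)={theta**2:.3f}")
```

The multiplier rises only while the variance constraint is violated, so the learned policy is pulled back toward the risk budget rather than simply maximizing reward; this is the basic mechanism a multi-timescale actor-critic scheme exploits.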