Publication
INFORMS 2020
Talk
MDP Graph-based Intermediate Model for DRL Training
Abstract
We consider enterprise optimization problems that can be modeled or orchestrated by Deep Reinforcement Learning (DRL) algorithms. For more efficient training, we suggest decomposing the original problem and modeling each of the decomposed subproblems as a Markov Decision Process (MDP). For each MDP, the transition probability matrix and costs are estimated from historical data and domain specifics. Graphs representing these MDPs are then used as a training environment (or intermediate model) for the original DRL algorithm, which enables rapid training.
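The abstract itself contains no code, but the core idea lends itself to a short illustration: estimate a tabular MDP from logged transitions, then expose it as a simulation environment the DRL agent can sample from cheaply. The sketch below is a hypothetical minimal implementation under simplifying assumptions (discrete states and actions, count-based estimation); the class name `MDPGraphEnv` and all variable names are illustrative, not the authors' implementation.

```python
import numpy as np

class MDPGraphEnv:
    """Tabular MDP environment whose dynamics are estimated from
    historical (state, action, next_state, cost) records.
    Serves as a fast intermediate training model for a DRL agent."""

    def __init__(self, n_states, n_actions, transitions):
        # Count-based estimates of P(s' | s, a) and expected cost c(s, a).
        counts = np.zeros((n_states, n_actions, n_states))
        cost_sum = np.zeros((n_states, n_actions))
        for s, a, s_next, cost in transitions:
            counts[s, a, s_next] += 1
            cost_sum[s, a] += cost
        visits = counts.sum(axis=2, keepdims=True)
        # Uniform fallback for (s, a) pairs never seen in the history.
        self.P = np.where(visits > 0,
                          counts / np.maximum(visits, 1),
                          1.0 / n_states)
        self.cost = cost_sum / np.maximum(visits.squeeze(-1), 1)
        self.n_states = n_states
        self.state = 0

    def reset(self, start_state=0):
        self.state = start_state
        return self.state

    def step(self, action):
        # Sample the next state from the estimated transition distribution.
        s_next = np.random.choice(self.n_states, p=self.P[self.state, action])
        reward = -self.cost[self.state, action]  # DRL maximizes reward = -cost
        self.state = s_next
        return s_next, reward, False, {}

# Hypothetical usage: build the environment from a small transition log,
# then let an agent interact with it instead of the real system.
history = [(0, 1, 2, 4.0), (2, 0, 1, 1.5), (1, 1, 0, 2.0), (0, 1, 2, 3.0)]
env = MDPGraphEnv(n_states=3, n_actions=2, transitions=history)
obs = env.reset()
obs, reward, done, _ = env.step(action=1)
```

Because every step is a table lookup plus one categorical sample, such an intermediate model can serve millions of training transitions far faster than the real enterprise system, which is what makes the rapid-training claim plausible.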