Publication
INFORMS 2022
Invited talk
Deep Policy Iteration with Integer Programming for Inventory Management
Abstract
Reinforcement learning (RL) has led to considerable breakthroughs in diverse areas such as robotics and games, but its application to complex real-world decision-making problems remains limited. Many problems in operations management (OM) are characterized by large action spaces and stochastic system dynamics. These features challenge existing RL methods, which rely on enumeration to solve the per-step action problem. To address these challenges, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. We demonstrate its effectiveness on a variety of multi-echelon inventory management settings.
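The idea of choosing an action by optimizing over a sample average, rather than over exact expectations, can be illustrated on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the PARL implementation. It assumes a hypothetical single-echelon inventory model with lost sales and zero lead time, made-up price and cost parameters, and a hand-written function `value_estimate` standing in for a learned value function. For a given inventory level, `saa_action` picks the integer order quantity that maximizes immediate reward plus discounted next-state value, averaged over sampled demands (sample average approximation). In a policy iteration scheme, a step like this plays the role of policy improvement against the current value estimate. The brute-force loop over order quantities is only a stand-in for the integer program. Its cost grows with the size of the action space, which is the limitation the abstract attributes to enumeration-based methods in multi-echelon settings.

```python
import random
from statistics import mean

# Hypothetical single-echelon model parameters (illustrative only, not from the talk).
PRICE = 5.0          # revenue per unit sold
ORDER_COST = 2.0     # cost per unit ordered
HOLDING_COST = 0.5   # cost per unit left in stock at the end of the period
GAMMA = 0.9          # discount factor
MAX_ORDER = 20       # order quantities are integers in [0, MAX_ORDER]


def value_estimate(inventory: int) -> float:
    """Hypothetical stand-in for a learned value function V(s)."""
    # Stock up to 10 units adds value; every unit also carries a small penalty.
    return 3.0 * min(inventory, 10) - 0.2 * inventory


def step(inventory: int, order: int, demand: int) -> tuple[float, int]:
    """Simulate one period (lost sales, zero lead time); return (reward, next_inventory)."""
    available = inventory + order
    sold = min(available, demand)
    next_inventory = available - sold
    reward = PRICE * sold - ORDER_COST * order - HOLDING_COST * next_inventory
    return reward, next_inventory


def saa_action(inventory: int, demand_samples: list[int]) -> tuple[int, float]:
    """Return the integer order maximizing the sample-average one-step lookahead value.

    Brute-force enumeration over orders stands in for the integer program.
    Ties are broken in favor of the smallest order quantity.
    """
    best_order, best_value = 0, float("-inf")
    for order in range(MAX_ORDER + 1):
        targets = []
        for demand in demand_samples:
            reward, next_inventory = step(inventory, order, demand)
            targets.append(reward + GAMMA * value_estimate(next_inventory))
        avg = mean(targets)  # sample average approximation of the expectation
        if avg > best_value:
            best_order, best_value = order, avg
    return best_order, best_value


if __name__ == "__main__":
    random.seed(0)
    samples = [random.randint(0, 12) for _ in range(50)]  # hypothetical demand scenarios
    order, value = saa_action(inventory=3, demand_samples=samples)
    print(f"order={order}, estimated value={value:.2f}")
```

With more scenarios the sample average approaches the true expected value, but each additional scenario multiplies the work of this brute-force loop. That trade-off is why a solver-based formulation of the per-step problem becomes attractive as action spaces grow.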