Publication
INFORMS 2023
Talk
Sequential Decision Making (SDM) With Long-Term Reward Estimates
Abstract
We study optimal control problems in which the action taken at each time step depends on a dynamical system's behavior, such as realized demand or process dynamics in a plant. Traditional SDM methods may not account for the long-term effects of decisions or for changing conditions. In our framework, we extend the objective function with two parts: the first is the reward accrued within the time horizon under the recommended policy, and the second is a modifier term, the expected remaining reward after the recommended policy ends. The combined objective therefore captures the holistic reward of the scenario, ensuring that the recommended policy benefits not only the recommendation period but also leaves the system in a good state for better future rewards. We benchmark the performance of our model on a use case from a processing plant.
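The augmented objective described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: all function names and the toy reward and terminal-value functions below are hypothetical, chosen only to show the structure of "within-horizon reward plus an estimate of the remaining reward after the horizon."

```python
def horizon_reward(states, actions, reward_fn):
    """Sum of rewards accrued within the planning horizon."""
    return sum(reward_fn(s, a) for s, a in zip(states, actions))

def augmented_objective(states, actions, reward_fn, terminal_value_fn):
    """Horizon reward plus a modifier term: an estimate of the reward
    remaining after the recommended policy ends, evaluated at the
    final system state."""
    return horizon_reward(states, actions, reward_fn) + terminal_value_fn(states[-1])

# Toy usage (hypothetical numbers): a linear reward with a small action
# cost, and a terminal value that favors ending in a high system state.
reward = lambda s, a: s - 0.1 * a
terminal = lambda s: 2.0 * s  # stand-in for the expected remaining reward

states = [1.0, 1.5, 2.0]
actions = [0.5, 0.5, 0.0]
print(augmented_objective(states, actions, reward, terminal))  # prints 8.4
```

Optimizing this combined objective, rather than the horizon reward alone, is what steers the policy toward final states with high future reward.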