Approximate Dynamic Programming: Solving the Curses of Dimensionality, Second Edition. Warren B. Powell, Princeton University, Department of Operations Research and Financial Engineering.

This paper is more than a convergence proof for this particular problem class – it lays out a proof technique, which combines our work on concave approximations with theory laid out by Bertsekas and Tsitsiklis (in their Neuro-Dynamic Programming book).

Powell, W.B., "Merging AI and OR to Solve High-Dimensional Resource Allocation Problems using Approximate Dynamic Programming," Informs Journal on Computing, Vol.

Ma, J. and W. B. Powell, "A convergent recursive least squares policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces," IEEE Conference on Approximate Dynamic Programming and Reinforcement Learning (part of IEEE Symposium on Computational Intelligence), March, 2009.

The problem arises in settings where resources are distributed from a central storage facility. This paper also provides a more rigorous treatment of what is known as the "multiperiod travel time" problem, and gives a formal development of a procedure for accelerating convergence. This technique worked very well for single commodity problems, but it was not at all obvious that it would work well for multicommodity problems, since there are more substitution opportunities.

In fact, there are up to three curses of dimensionality: the state space, the outcome space and the action space.

Chapter 5 (Modeling): Good problem solving starts with good modeling. These two short chapters provide yet another brief introduction to the modeling and algorithmic framework of ADP.
IEEE Trans. on Power Systems (to appear). Illustrates the process of modeling a stochastic, dynamic system using an energy storage application, and shows that each of the four classes of policies works best on a particular variant of the problem.

The book assumes only that the reader has a basic background in probability and statistics, and (for some applications) linear programming. (click here to download: ADP – I: Modeling), (click here to download: ADP – II: Algorithms).

The challenge of dynamic programming is the curse of dimensionality. Bellman's equation

    V_t(S_t) = \max_{x_t \in X_t} \big( C_t(S_t, x_t) + E\{ V_{t+1}(S_{t+1}) \mid S_t \} \big)

suffers from three curses: the state space, the outcome space, and the action space (the feasible region X_t).

4 Introduction to Approximate Dynamic Programming, 111
4.1 The Three Curses of Dimensionality (Revisited), 112
4.2 The Basic Idea, 114
4.3 Q-Learning and SARSA, 122
4.4 Real-Time Dynamic Programming, 126
4.5 Approximate Value Iteration, 127
4.6 The Post-Decision State Variable, 129
4.7 Low-Dimensional Representations of Value Functions, 144

Our model uses adaptive learning to bring forecast information into decisions made now, providing a more realistic estimate of the value of future information. Results are shown for both offline and online implementations.

An Approximate Dynamic Programming Algorithm for Monotone Value Functions. Daniel R. Jiang, Warren B. Powell, Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540.

Presentations – A series of presentations on approximate dynamic programming, spanning applications, modeling and algorithms.

IEEE Trans. on Power Systems (to appear). Summarizes the modeling framework and four classes of policies, contrasting the notational systems and canonical frameworks of different communities.
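To make the three curses concrete, here is a toy sketch (an illustration, not code from the book) of exact value iteration for a scalar storage problem. The state, action, and outcome spaces, and the price model, are all hypothetical; the point is the three nested enumerations, each of which explodes when the corresponding quantity becomes a vector.

```python
# Toy value iteration for a scalar storage problem, illustrating the three
# curses: the loops over states S, actions A, and outcomes W are all explicit.
S = range(11)                         # storage levels 0..10 (state space)
A = range(-2, 3)                      # charge/discharge amounts (action space)
W = [(-1, 0.3), (0, 0.4), (1, 0.3)]  # price noise and probabilities (outcome space)

def contribution(s, a, w):
    price = 5 + w                     # random buying/selling price
    return -price * a                 # pay to charge (a > 0), earn to discharge

def value_iteration(T=20):
    V = {s: 0.0 for s in S}
    for _ in range(T):                # V(s) = max_a E[ C(s, a, W) + V(s + a) ]
        V = {s: max(sum(p * (contribution(s, a, w) + V[s + a]) for w, p in W)
                    for a in A if 0 <= s + a <= 10)
             for s in S}
    return V
```

With a scalar storage level this loop is trivial; replace s with a vector of, say, 50 storage devices and every one of these enumerations becomes intractable, which is what motivates the approximation strategies described on this page.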
Approximate Dynamic Programming: Solving the Curses of Dimensionality, published by John Wiley and Sons, is the first book to merge dynamic programming and math programming using the language of approximate dynamic programming. In addition, it assumes that the expectation in Bellman's equation cannot be computed. The OR community tends to work on problems with many simple entities.

Powell, W.B., "Clearing the Jungle of Stochastic Optimization," INFORMS Tutorials in Operations Research: Bridging Data and Decisions, pp. 109-137, 2014.

We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued states and actions, and complex information processes.

The project required bringing together years of research in approximate dynamic programming, merging math programming with machine learning, to solve dynamic programs with extremely high-dimensional state variables. You can use textbook backward dynamic programming if there is only one product type, but real problems have multiple products.

In this chapter, we consider a base perimeter patrol stochastic control problem.

Approximate dynamic programming (ADP) is both a modeling and algorithmic framework for solving stochastic optimization problems. I have worked for a number of years using piecewise linear function approximations for a broad range of complex resource allocation problems. Much of our work falls in the intersection of stochastic programming and dynamic programming. The dynamic programming literature primarily deals with problems with low dimensional state and action spaces, which allow the use of discrete dynamic programming techniques.
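To illustrate the piecewise linear idea, here is a minimal sketch (an illustration under simplified assumptions, not the exact algorithms from the papers on this page): the value of holding r units of a resource is represented by a concave piecewise linear function, i.e., a nonincreasing vector of slopes. A sampled marginal value is smoothed into one slope, and the result is projected back onto concave functions with pool-adjacent-violators. Function names are hypothetical.

```python
def project_concave(slopes):
    """Least-squares projection of a slope vector onto nonincreasing
    sequences (pool-adjacent-violators), i.e., onto concave PWL functions."""
    out = []  # blocks of [mean, count]
    for s in slopes:
        out.append([s, 1])
        # merge adjacent blocks while they violate the nonincreasing order
        while len(out) > 1 and out[-2][0] < out[-1][0]:
            m2, c2 = out.pop()
            m1, c1 = out.pop()
            out.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return [m for m, c in out for _ in range(c)]

def update_slopes(slopes, r, v_hat, alpha=0.1):
    """Smooth a sampled marginal value v_hat into slope r, then restore
    concavity so the approximation stays usable inside a linear program."""
    slopes = slopes[:]
    slopes[r] = (1 - alpha) * slopes[r] + alpha * v_hat
    return project_concave(slopes)
```

Keeping the slopes nonincreasing is what lets the approximation be embedded directly in a math programming solver, which is the core of the merger of math programming and machine learning described above.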
For a shorter article, written in the style of reinforcement learning (with an energy setting), see the two-part tutorial aimed at the IEEE/controls community: W. B. Powell, Stephan Meisel, "Tutorial on Stochastic Optimization in Energy I: Modeling and Policies," IEEE Trans. on Power Systems (to appear).

Deterministic stepsize formulas can be frustrating since they have parameters that have to be tuned (difficult if you are estimating thousands of values at the same time). This paper reviews a number of popular stepsize formulas, provides a classic result for optimal stepsizes with stationary data, and derives a new optimal stepsize formula for nonstationary data.

Dynamic programming has often been dismissed because it suffers from "the curse of dimensionality." However, the stochastic programming community generally does not exploit state variables, and does not use the concepts and vocabulary of dynamic programming.

Approximate Dynamic Programming for High-Dimensional Resource Allocation Problems. Warren Powell, Department of Operations Research and Financial Engineering, Princeton University. Wednesday, May 2, 2007, 4:30 – 5:30 PM, Terman Engineering Center, Room 453.

Abstract: The experimental comparisons against multistage nested Benders (which is very slow) and more classical rolling horizon procedures suggest that it works very well indeed. Why would we approximate a problem that is easy to solve to optimality? Thus, a decision made at a single state can provide us with information about many states, making each individual observation much more powerful.

Approximate dynamic programming has evolved, initially independently, within operations research, computer science and the engineering controls community, all searching for practical tools for solving sequential stochastic optimization problems.
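The role of the stepsize is easiest to see in the basic smoothing update, est ← (1 − α_n) · est + α_n · observation. The sketch below (an illustration, not the optimal formula the paper derives) shows why α_n = 1/n reproduces the sample mean, which is right for stationary data but responds too slowly when the data are nonstationary; the generalized harmonic rule keeps a larger stepsize at the cost of a tunable parameter.

```python
def smooth(observations, stepsize):
    """Smoothing update: est <- (1 - a_n) * est + a_n * w_n."""
    est, path = 0.0, []
    for n, w in enumerate(observations, start=1):
        a = stepsize(n)
        est = (1 - a) * est + a * w
        path.append(est)
    return path

def one_over_n(n):
    return 1.0 / n              # reproduces the sample mean (good when stationary)

def harmonic(n, a=10.0):
    return a / (a + n - 1)      # stays larger than 1/n; the parameter a must be tuned
```

For example, smoothing the series 1, 2, 3 with the 1/n stepsize produces the running means 1, 1.5, 2 — exactly the behavior that makes 1/n sluggish when the underlying signal is drifting, and what motivates stepsize formulas designed for nonstationary data.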
Somewhat surprisingly, generic machine learning algorithms for approximating value functions did not work particularly well. Topaloglu, H. and W.B. Powell, "Dynamic Programming Approximations for Stochastic, Time-Staged Integer Multicommodity Flow Problems," Informs Journal on Computing, Vol.

In Dynamic Programming, Richard E. Bellman introduces his groundbreaking theory and furnishes a new and versatile mathematical tool for the treatment of many complex problems, both within and outside of the discipline.

It shows how math programming and machine learning can be combined to solve dynamic programs with many thousands of dimensions, using techniques that are easily implemented on a laptop. Most of the literature has focused on the problem of approximating V(s) to overcome the problem of multidimensional state variables. The value functions produced by the ADP algorithm are shown to accurately estimate the marginal value of drivers by domicile.

It provides an easy, high-level overview of ADP, emphasizing the perspective that ADP is much more than an algorithm – it is really an umbrella for a wide range of solution procedures which retain, at their core, the need to approximate the value of being in a state. A few years ago we proved convergence of this algorithmic strategy for two-stage problems (click here for a copy).

Ryzhov, I. and W. B. Powell, "Bayesian Active Learning with Basis Functions," IEEE Workshop on Adaptive Dynamic Programming and Reinforcement Learning, Paris, April, 2011.

The answer is that the optimal policy only works on single-link problems with one type of product, while the approximate policy is scalable to much harder problems.
We found that the use of nonlinear approximations was complicated by the presence of multiperiod travel times (a problem that does not arise when we use linear approximations). This paper introduces the use of linear approximations of value functions that are learned adaptively. Papadaki, K. and W.B. Powell.

Contents
Preface, xi
Acknowledgments, xv
1 The Challenges of Dynamic Programming, 1

Past studies of this topic have used myopic models where advance information provides a major benefit over no information at all.

Powell, W. B., "Approximate Dynamic Programming I: Modeling," Encyclopedia of Operations Research and Management Science, John Wiley and Sons, (to appear). This article appeared in the Informs Computing Society Newsletter.

The application involves allocating energy over a grid, linked by a scalar storage system, such as a water reservoir.

So Just What is Approximate Dynamic Programming? In this latest paper, we have our first convergence proof for a multistage problem.

"Approximate dynamic programming" has been discovered independently by different communities under different names:
» Neuro-dynamic programming
» Reinforcement learning
» Forward dynamic programming
» Adaptive dynamic programming
» Heuristic dynamic programming
» Iterative dynamic programming

Approximate dynamic programming (ADP) is a general methodological framework for multistage stochastic optimization problems in transportation, finance, energy, and other domains.
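As a deliberately tiny illustration of a linear value function approximation learned adaptively, consider a scalar storage problem where the value of holding R units is approximated by V(R) ≈ θ·R. The policy buys or sells a unit by comparing θ to the current price, and θ is updated by smoothing in sampled prices. Everything here (the price model, parameters, function name) is an illustrative assumption, not the algorithm from any particular paper above.

```python
import random

def run_adp(T=500, alpha=0.1, seed=0):
    """Learn the slope theta of a linear VFA V(R) ~ theta * R while
    simulating forward; decisions use the current approximation."""
    rng = random.Random(seed)
    theta, R = 0.0, 0          # VFA slope and storage level (0..10)
    for _ in range(T):
        price = rng.uniform(3.0, 7.0)
        # policy: buy a unit if it looks worth more than it costs, else sell
        if theta > price and R < 10:
            R += 1
        elif theta < price and R > 0:
            R -= 1
        # adaptive update: smooth the observed marginal value into theta
        theta = (1 - alpha) * theta + alpha * price
    return theta, R
```

Here θ settles into a noisy estimate of the mean price; the point is the template, not the numbers: the decision loop and the learning loop run together, which is the basic structure shared by all the names in the list above.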