Value function approximation methods for Linearly-solvable Markov Decision Process
Optimal control provides an appealing machinery for completing complicated control tasks with limited prior knowledge. Both global methods and online trajectory optimization methods are powerful techniques for solving optimal control problems; however, each has limitations. Global methods are directly or indirectly based on the Bellman equation, which originates from dynamic programming. Finding the solution of the Bellman equation, the value function (or cost-to-go function), suffers from multiple difficulties, including the curse of dimensionality. In the linearly-solvable Markov decision process (LMDP) framework, the Bellman equation can be linearized despite nonlinearity in the stochastic dynamics. This fact permits efficient algorithms and motivates specialized function approximation schemes. In the average-cost setting, the Bellman equation of an LMDP reduces to computing the principal eigenfunction of a linear operator. To solve for the value function in this case, we designed two methods, moving least squares approximation and aggregation, that avoid matrix factorization and exploit sparsity through efficient iterative solvers. In the moving least squares approximation method, the value function is approximated by a linear basis constructed in the moving least squares setting. In the aggregation method, the LMDP is approximated by soft state aggregation over a continuous state space. Adaptive schemes for basis placement are developed to provide higher resolution in the regions of state space that are visited most often. Numerical results are provided. Approximating the value function alone is not sufficient to apply the LMDP framework to more realistic tasks: we demonstrate that value function methods may require an unrealistically large number of basis functions to control certain dynamical systems.
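The reduction described above can be illustrated concretely. In the average-cost LMDP, writing the desirability z = exp(-v), the linearized Bellman equation becomes an eigenproblem exp(-c) z = diag(exp(-q)) P z, where P is the passive dynamics and q the state cost; the principal eigenfunction then yields the value function without any matrix factorization. The following is a minimal sketch on a hypothetical randomly generated discrete LMDP (the matrix G, the iteration count, and the tolerance are illustrative assumptions, not the methods developed in the thesis):

```python
import numpy as np

# Hypothetical small discrete LMDP: n states, passive dynamics P
# (row-stochastic), state cost q. The linearized average-cost Bellman
# equation is  exp(-c) z = diag(exp(-q)) @ P @ z  with z = exp(-v),
# so z is the principal eigenfunction of G = diag(exp(-q)) @ P.
rng = np.random.default_rng(0)
n = 50
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)      # passive transition probabilities
q = rng.random(n)                      # state costs
G = np.exp(-q)[:, None] * P

# Power iteration: a factorization-free iterative solver, in the spirit of
# the eigenfunction formulation (the thesis methods add function
# approximation on top of this idea).
z = np.ones(n)
for _ in range(500):
    z_new = G @ z
    z_new /= np.linalg.norm(z_new)
    if np.linalg.norm(z_new - z) < 1e-12:
        z = z_new
        break
    z = z_new

lam = z @ (G @ z) / (z @ z)            # Rayleigh quotient, approximates exp(-c)
c = -np.log(lam)                       # average cost per step
v = -np.log(z / z.max())               # value function, up to a constant
```

By the Perron-Frobenius theorem the principal eigenvector of the nonnegative matrix G is elementwise positive, so the logarithm recovering v is well defined.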
To mitigate the undesirable properties of local and global methods, we explore combining value function approximation methods for LMDPs with model predictive control (MPC). By exploiting both the value function and the policy obtained from solving the LMDP, MPC performs at a level similar to that of MPC alone with a long time horizon, while the horizon itself can be drastically shortened. This also broadens the range of problems to which LMDP value function approximation methods can be applied. Our implementation results show that global and local methods can, and should, be combined in real applications to the benefit of both.
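The combination can be sketched in miniature: a receding-horizon controller plans over a short horizon and uses an approximate value function as the terminal cost, so the global approximation supplies the long-range information the short horizon lacks. Everything below (the chain dynamics, `step_cost`, the stand-in `v_hat`) is an illustrative assumption, not the systems studied in the thesis:

```python
import numpy as np
from itertools import product

# Toy deterministic chain with n states; the goal is the rightmost state.
n = 20
goal = n - 1

def dynamics(s, a):                    # actions: -1 (left), 0 (stay), +1 (right)
    return int(np.clip(s + a, 0, n - 1))

def step_cost(s, a):
    return (s != goal) + 0.1 * abs(a)  # unit cost off-goal, small action cost

# Stand-in for a globally approximated value function (distance to goal).
v_hat = np.abs(np.arange(n) - goal).astype(float)

def mpc_action(s, H=2):
    # Exhaustive search over the 3**H action sequences of a short horizon H;
    # the approximate value function serves as terminal cost.
    best, best_a = np.inf, 0
    for seq in product((-1, 0, 1), repeat=H):
        x, c = s, 0.0
        for a in seq:
            c += step_cost(x, a)
            x = dynamics(x, a)
        c += v_hat[x]                  # terminal cost from the value function
        if c < best:
            best, best_a = c, seq[0]
    return best_a

s, traj = 0, [0]
for _ in range(n):                     # receding horizon: replan every step
    s = dynamics(s, mpc_action(s))
    traj.append(s)
```

With horizon H = 2 the controller still drives the state to the goal, because the terminal value function accounts for the cost beyond the horizon; without it, a horizon this short would see no advantage in moving.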
- Applied mathematics