Value function approximation methods for Linearly-solvable Markov Decision Process

dc.contributor.advisor: Todorov, Emanuil V
dc.contributor.author: Zhong, Mingyuan
dc.date.accessioned: 2014-02-24T18:32:03Z
dc.date.available: 2014-02-24T18:32:03Z
dc.date.issued: 2014-02-24
dc.date.submitted: 2013
dc.description: Thesis (Ph.D.)--University of Washington, 2013
dc.description.abstract: Optimal control provides an appealing machinery for completing complicated control tasks with limited prior knowledge. Both global methods and online trajectory optimization methods are powerful techniques for solving optimal control problems; however, each has limitations. Global methods are directly or indirectly based on the Bellman equation, which originates from dynamic programming. Finding the solution of the Bellman equation, the value function (or cost-to-go function), suffers from multiple difficulties, including the curse of dimensionality. In the linearly-solvable Markov Decision Process (LMDP) framework, the Bellman equation can be linearized despite nonlinearity in the stochastic dynamical models. This fact permits efficient algorithms and motivates specialized function approximation schemes. In the average-cost setting, the Bellman equation in the LMDP reduces to computing the principal eigenfunction of a linear operator. To solve for the value function in this case, we design two methods, moving least squares approximation and aggregation, which avoid matrix factorization and take advantage of sparsity by using efficient iterative solvers. In the moving least squares approximation method, the value function is approximated by a linear combination of basis functions constructed in a moving least squares setting. In the aggregation method, the LMDP is approximated by soft state aggregation over a continuous state space. Adaptive schemes for basis placement are developed to provide higher resolution in the regions of the state space that are visited most often. Numerical results are provided. Approximating the value function is not sufficient to apply the LMDP framework to more realistic tasks. We demonstrate that value function methods may require an unrealistic number of basis functions to control certain dynamical systems. In order to mitigate the undesirable properties of local and global methods, we explore the possibility of combining value function approximation methods for the LMDP with model predictive control (MPC). By exploiting both the value function and the policy generated by solving the LMDP, MPC is able to perform at a level similar to that of MPC alone with a long time horizon, while the MPC time horizon can be drastically shortened. This also allows LMDP value function approximation methods to be applied to more problems. The results of implementing these methods show that global and local methods can and should be combined in real applications, to the benefit of both.
dc.embargo.terms: No embargo
dc.format.mimetype: application/pdf
dc.identifier.other: Zhong_washington_0250E_12670.pdf
dc.identifier.uri: http://hdl.handle.net/1773/25222
dc.language.iso: en_US
dc.rights: Copyright is held by the individual authors.
dc.subject: Acrobot; Bellman equation; Markov Decision Process; Optimal Control; Reinforcement Learning; Value function
dc.subject.other: Applied mathematics
dc.subject.other: Computer science
dc.subject.other: applied mathematics
dc.title: Value function approximation methods for Linearly-solvable Markov Decision Process
dc.type: Thesis
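
A note on the eigenfunction reduction mentioned in the abstract: in the average-cost LMDP setting, the desirability function z = exp(-v) satisfies a linear eigenvalue equation, so the value function v can be recovered from the principal eigenpair of a nonnegative linear operator. The sketch below is a minimal illustration of this reduction on a small, randomly generated discrete problem, not the thesis implementation; the state count n, passive transition matrix P, and state cost q are hypothetical placeholders, and plain power iteration stands in for the moving least squares and aggregation solvers developed in the thesis.

import numpy as np

# Minimal sketch (illustrative, not the thesis code): average-cost LMDP on a
# small discrete state space.  The linearized Bellman equation reads
#   exp(-c_bar) * z(x) = exp(-q(x)) * sum_x' P(x'|x) z(x'),
# so (z, exp(-c_bar)) is the principal eigenpair of G = diag(exp(-q)) @ P.
rng = np.random.default_rng(0)
n = 50                                # number of discrete states (placeholder)
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)     # row-stochastic passive dynamics
q = rng.random(n)                     # nonnegative state costs

G = np.exp(-q)[:, None] * P           # linear operator of the LMDP

z = np.ones(n) / np.sqrt(n)           # desirability function, z = exp(-v)
for _ in range(2000):                 # power iteration: matrix-vector products only
    Gz = G @ z
    lam = np.linalg.norm(Gz)          # estimate of the principal eigenvalue
    z = Gz / lam

v = -np.log(z)                        # value function, up to an additive constant
c_bar = -np.log(lam)                  # average cost per step
pi = P * z[None, :]                   # optimal transitions, proportional to P(x'|x) z(x')
pi /= pi.sum(axis=1, keepdims=True)

Power iteration here involves only matrix-vector products, the same property of avoiding matrix factorization and exploiting sparsity that the iterative solvers in the thesis rely on at much larger scale.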

Files

Original bundle

Name: Zhong_washington_0250E_12670.pdf
Size: 2.88 MB
Format: Adobe Portable Document Format