Value function approximation methods for Linearly-solvable Markov Decision Process

dc.contributor.advisor: Todorov, Emanuil V
dc.contributor.author: Zhong, Mingyuan
dc.date.accessioned: 2014-02-24T18:32:03Z
dc.date.available: 2014-02-24T18:32:03Z
dc.date.issued: 2014-02-24
dc.date.submitted: 2013
dc.description: Thesis (Ph.D.)--University of Washington, 2013
dc.description.abstract: Optimal control provides an appealing machinery for completing complicated control tasks with limited prior knowledge. Both global methods and online trajectory optimization methods are powerful techniques for solving optimal control problems; however, each has limitations. Global methods are directly or indirectly based on the Bellman equation, which originates from dynamic programming. Finding the solution of the Bellman equation, the value function (or cost-to-go function), suffers from multiple difficulties, including the curse of dimensionality. In the linearly-solvable Markov Decision Process (LMDP) framework, the Bellman equation can be linearized despite nonlinearity in the stochastic dynamical models. This fact permits efficient algorithms and motivates specialized function approximation schemes. In the average-cost setting, the Bellman equation in the LMDP reduces to computing the principal eigenfunction of a linear operator. To solve for the value function in this case, we design two methods, moving least squares approximation and aggregation, which avoid matrix factorization and take advantage of sparsity by using efficient iterative solvers. In the moving least squares approximation method, the value function is approximated by a linear combination of basis functions constructed in a moving least squares setting. In the aggregation method, the LMDP is approximated by soft state aggregation over a continuous state space. Adaptive schemes for basis placement are developed to provide higher resolution in the regions of the state space that are visited most often. Numerical results are provided. Approximating the value function is not sufficient to apply the LMDP framework to more realistic tasks. We demonstrate that value function methods may require an unrealistic number of basis functions to control certain dynamical systems. In order to mitigate the undesirable properties of local and global methods, we explore the possibility of combining value function approximation methods for the LMDP with model predictive control (MPC). By exploiting both the value function and the policy generated by solving the LMDP, MPC is able to perform at a level similar to that of MPC alone with a long time horizon, while the MPC time horizon can be drastically shortened. This also allows LMDP value function approximation methods to be applied to more problems. The results of implementing these methods show that global and local methods can and should be combined in real applications, to the benefit of both.
dc.embargo.terms: No embargo
dc.format.mimetype: application/pdf
dc.identifier.other: Zhong_washington_0250E_12670.pdf
dc.identifier.uri: http://hdl.handle.net/1773/25222
dc.language.iso: en_US
dc.rights: Copyright is held by the individual authors.
dc.subject: Acrobot; Bellman equation; Markov Decision Process; Optimal Control; Reinforcement Learning; Value function
dc.subject.other: Applied mathematics
dc.subject.other: Computer science
dc.subject.other: applied mathematics
dc.title: Value function approximation methods for Linearly-solvable Markov Decision Process
dc.type: Thesis
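
A note on the eigenfunction reduction mentioned in the abstract: in the average-cost LMDP setting, the desirability function z = exp(-v) satisfies a linear eigenvalue equation, so the value function v can be recovered from the principal eigenpair of a nonnegative linear operator. The sketch below is a minimal illustration of this reduction on a small, randomly generated discrete problem, not the thesis implementation; the state count n, passive transition matrix P, and state cost q are hypothetical placeholders, and plain power iteration stands in for the moving least squares and aggregation solvers developed in the thesis.

import numpy as np

# Minimal sketch (illustrative, not the thesis code): average-cost LMDP on a
# small discrete state space.  The linearized Bellman equation reads
#   exp(-c_bar) * z(x) = exp(-q(x)) * sum_x' P(x'|x) z(x'),
# so (z, exp(-c_bar)) is the principal eigenpair of G = diag(exp(-q)) @ P.
rng = np.random.default_rng(0)
n = 50                                # number of discrete states (placeholder)
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)     # row-stochastic passive dynamics
q = rng.random(n)                     # nonnegative state costs

G = np.exp(-q)[:, None] * P           # linear operator of the LMDP

z = np.ones(n) / np.sqrt(n)           # desirability function, z = exp(-v)
for _ in range(2000):                 # power iteration: matrix-vector products only
    Gz = G @ z
    lam = np.linalg.norm(Gz)          # estimate of the principal eigenvalue
    z = Gz / lam

v = -np.log(z)                        # value function, up to an additive constant
c_bar = -np.log(lam)                  # average cost per step
pi = P * z[None, :]                   # optimal transitions, proportional to P(x'|x) z(x')
pi /= pi.sum(axis=1, keepdims=True)

Power iteration here involves only matrix-vector products, the same property of avoiding matrix factorization and exploiting sparsity that the iterative solvers in the thesis rely on at much larger scale.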

Files

Original bundle

Name: Zhong_washington_0250E_12670.pdf
Size: 2.88 MB
Format: Adobe Portable Document Format