Distributionally Robust Optimization for Reinforcement Learning

dc.contributor.advisor: Zhao, Chaoyue
dc.contributor.author: Song, Jun
dc.date.accessioned: 2024-04-26T23:21:19Z
dc.date.available: 2024-04-26T23:21:19Z
dc.date.issued: 2024-04-26
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: Reinforcement learning (RL) has achieved remarkable success in many domains, including video games, board games, robotics, and continuous control tasks. Despite the success and attention that RL has received over the past decades, it still struggles with several issues that degrade its performance and lead to suboptimality. In model-based RL, uncertainty in the environment dynamics can significantly degrade the learned agent's ability to recommend good actions, while in model-free RL, the learned agent's performance can be greatly limited by restrictive parametric assumptions on the policy distribution. In this dissertation, our goal is to utilize distributionally robust optimization (DRO) to overcome these limitations of RL and to develop novel, practical RL algorithms with improved robustness and performance. To achieve this goal, we pursue two main objectives. The first objective is to adopt DRO to make model-based RL robust to uncertainty in the environment dynamics. We propose a new Distributionally Robust Markov Decision Process (DRMDP) framework in which the environment dynamics are not assumed to take predetermined parametric values; instead, we optimize against the worst-case probability distribution of the transition probabilities within a decision-dependent ambiguity set. The second objective is to utilize optimistic DRO to develop nonparametric policy optimization methods for model-free RL. Because the learned policy is not confined to a parametric function class, these methods open up the possibility of converging to a better optimum. Following this objective, we propose three nonparametric policy optimization frameworks, with Kullback–Leibler, Wasserstein, and Sinkhorn constraints, respectively, to control the size of the policy update. For each framework, we derive the closed-form policy update solution to the corresponding optimistic DRO problem using Lagrangian duality and propose practical RL algorithms to perform the policy updates. We further improve the sample efficiency of the proposed nonparametric policy optimization frameworks by incorporating human guidance through imitation learning techniques. (A brief notational sketch of these formulations follows the metadata record below.)
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Song_washington_0250E_26551.pdf
dc.identifier.uri: http://hdl.handle.net/1773/51371
dc.language.iso: en_US
dc.rights: CC BY-NC-ND
dc.subject: Distributionally Robust Optimization
dc.subject: Markov Decision Process
dc.subject: Reinforcement Learning
dc.subject: Wasserstein Metric
dc.subject: Industrial engineering
dc.subject: Computer science
dc.subject: Mathematics
dc.subject.other: Industrial engineering
dc.title: Distributionally Robust Optimization for Reinforcement Learning
dc.type: Thesis
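
As a reading aid, here is a minimal LaTeX sketch of the two formulations summarized in the abstract above. It is a hedged illustration, not verbatim from the dissertation: the ambiguity set \mathcal{P}(\pi), advantage function A^{\pi_k}, trust-region radius \delta, and dual multiplier \lambda are assumed standard notation, and the exact formulations in the thesis may differ.

% Hedged sketch; notation is assumed, not taken from the dissertation.

% DRMDP objective: maximize the worst-case expected discounted return,
% with the transition kernel P ranging over a decision-dependent
% ambiguity set \mathcal{P}(\pi).
\[
  \max_{\pi} \; \min_{P \in \mathcal{P}(\pi)} \;
  \mathbb{E}_{P,\pi}\!\Bigl[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \Bigr]
\]

% Nonparametric KL-constrained update: maximize the expected advantage
% subject to a KL trust region around the current policy \pi_k.
\[
  \max_{\pi} \; \mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[ A^{\pi_k}(s,a) \bigr]
  \quad \text{s.t.} \quad
  D_{\mathrm{KL}}\bigl( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \bigr) \le \delta
\]

% Lagrangian duality gives the exponentiated closed-form update
\[
  \pi_{k+1}(a \mid s) \;\propto\; \pi_k(a \mid s)\,
  \exp\!\Bigl( \tfrac{1}{\lambda}\, A^{\pi_k}(s,a) \Bigr),
\]
% where \lambda > 0 is the multiplier of the KL constraint; note that no
% parametric form is imposed on \pi_{k+1}.

In the Wasserstein and Sinkhorn variants mentioned in the abstract, the KL constraint above is replaced by an optimal-transport distance between \pi and \pi_k, and the closed-form updates follow from the corresponding Lagrangian duals.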

Files

Original bundle

Name: Song_washington_0250E_26551.pdf
Size: 3.97 MB
Format: Adobe Portable Document Format