Exploration and Primal-dual Methods in Bandits and Reinforcement Learning

Xiong, Zhihan

dc.contributor.advisor	Fazel, Maryam
dc.contributor.author	Xiong, Zhihan
dc.date.accessioned	2025-10-02T16:07:17Z
dc.date.available	2025-10-02T16:07:17Z
dc.date.issued	2025-10-02
dc.date.submitted	2025
dc.description	Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract	Sequential decision-making, which encompasses both bandit problems and reinforcement learning, forms the foundation of intelligent systems across diverse applications, from adaptive recommendation systems to autonomous robotics. This thesis addresses two fundamental challenges in building reliable, sample-efficient agents that operate robustly in dynamic, complex environments: efficient exploration in non-stationary or structurally complex settings, and the design of appropriate objective functions when multiple approximation layers are inevitable. On efficient exploration, we develop the first robust pure exploration algorithm for both stationary and non-stationary linear bandits, achieving strong performance in benign settings while maintaining robustness to environmental changes. For single-step congestion games, we exploit the structure of this special class of games to develop the first algorithms for Nash equilibrium learning under various feedback models. For tabular reinforcement learning, we propose the first near-optimal randomized exploration algorithm, which nearly matches the fundamental lower bound. On objective design, we analyze learning objectives through the lens of duality between value learning and policy learning. In an online selective sampling problem for linear bandits, we characterize an optimal ellipsoid-based selection rule through primal-dual analysis. For approximate policy optimization, we propose using a dual Bregman divergence instead of the common Euclidean norm to measure similarity in dual space, resulting in the first policy optimization framework with both fast theoretical convergence and superior practical performance. Collectively, these contributions advance the theoretical frontier of exploration and objective design, close several open complexity gaps, and provide practical algorithms validated on robotic control benchmarks. They offer a principled route towards agents that learn robustly and act reliably in dynamic, high-dimensional environments.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Xiong_washington_0250E_28750.pdf
dc.identifier.uri	https://hdl.handle.net/1773/53960
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Bandit Problems
dc.subject	Machine Learning
dc.subject	Optimization
dc.subject	Reinforcement Learning
dc.subject	Computer science
dc.subject	Artificial intelligence
dc.subject.other	Computer science and engineering
dc.title	Exploration and Primal-dual Methods in Bandits and Reinforcement Learning
dc.type	Thesis

Files

Original bundle

Name: Xiong_washington_0250E_28750.pdf
Size: 3.92 MB
Format: Adobe Portable Document Format

Collections

Computer science and engineering