Exploration and Primal-dual Methods in Bandits and Reinforcement Learning

dc.contributor.advisor: Fazel, Maryam
dc.contributor.author: Xiong, Zhihan
dc.date.accessioned: 2025-10-02T16:07:17Z
dc.date.available: 2025-10-02T16:07:17Z
dc.date.issued: 2025-10-02
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Sequential decision-making, which encompasses both bandit problems and reinforcement learning, forms the foundation of intelligent systems across diverse applications, from adaptive recommendation systems to autonomous robotics. This thesis addresses two fundamental challenges in building reliable, sample-efficient agents that operate robustly in dynamic, complex environments: efficient exploration in non-stationary or structurally complex settings, and the design of appropriate objective functions when multiple approximation layers are inevitable. Regarding efficient exploration, we develop the first robust pure exploration algorithm for both stationary and non-stationary linear bandits, achieving strong performance in benign settings while maintaining robustness to environmental changes. For single-step congestion games, we exploit the structure of this special class of games to develop the first algorithms for Nash equilibrium learning under various feedback models. For tabular reinforcement learning, we propose the first near-optimal randomized exploration algorithm that nearly matches the fundamental lower bound. Regarding objective design, we analyze learning objectives through the lens of duality between value learning and policy learning. In an online selective sampling problem for linear bandits, we characterize an optimal ellipsoid-based selection rule through primal-dual analysis. For approximate policy optimization, we propose using dual Bregman divergence instead of the common Euclidean norm to measure similarity in dual space, resulting in the first policy optimization framework with both fast theoretical convergence and superior practical performance. Collectively, these contributions advance the theoretical frontier of exploration and objective design, close several open complexity gaps, and provide practical algorithms validated on robotic control benchmarks. They offer a principled route towards agents that learn robustly and act reliably in dynamic, high-dimensional environments.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Xiong_washington_0250E_28750.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53960
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Bandit Problems
dc.subject: Machine Learning
dc.subject: Optimization
dc.subject: Reinforcement Learning
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject.other: Computer science and engineering
dc.title: Exploration and Primal-dual Methods in Bandits and Reinforcement Learning
dc.type: Thesis

Files

Original bundle

Name: Xiong_washington_0250E_28750.pdf
Size: 3.92 MB
Format: Adobe Portable Document Format