Estimation and Inference of Optimal Policies

dc.contributor.advisor: Luedtke, Alex
dc.contributor.advisor: Jain, Lalit
dc.contributor.author: Li, Zhaoqi
dc.date.accessioned: 2024-09-09T23:16:38Z
dc.date.available: 2024-09-09T23:16:38Z
dc.date.issued: 2024-09-09
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: Many fields conduct experiments to learn policies that map individual characteristics to actions; the policies achieving the best outcomes are referred to as optimal policies. Because obtaining human feedback from experiments is expensive, we often want to learn the optimal policy as quickly as possible. However, several challenges arise in developing practical approaches to policy learning. First, traditional methods usually guarantee only minimax optimality, while practitioners care more about performance on their particular problem instance; a better notion of optimality than the worst case is therefore needed. Second, existing optimal methods are generally hard to implement at scale, making deployment challenging for large companies. Third, real-world settings often involve multiple performance metrics of interest, such as mitigating side effects while ensuring good disease recovery in the biomedical sciences, or balancing short-term acquisition with long-term retention in digital marketing. This dissertation tackles these challenges and provides several practical approaches to policy learning from various perspectives. To identify the optimal policy as quickly as possible, we frame policy learning as a pure exploration problem in bandits and develop algorithms that provably identify the optimal policy quickly for every problem instance, a property we refer to as instance optimality. We first focus on the stochastic contextual bandit problem in the PAC setting: given a policy class, the goal is to return a policy whose expected reward is near the optimal reward with high probability. We characterize the first instance-dependent PAC sample complexity of contextual bandits, and we propose a new computationally efficient algorithm that achieves this sample complexity using only a polynomial number of calls to an argmax oracle.
We then delve into the challenge of computational efficiency, focusing on algorithms that are easily implementable at scale. In the linear bandit setting, the aim is to return the arm with the largest reward given a set of arms and an unknown parameter vector. We introduce an algorithm that uses the same oracles required by the widely used Thompson sampling algorithm, namely sampling and argmax oracles, and achieves an asymptotically optimal exponential convergence rate. We also demonstrate that our algorithm is easy to implement and performs empirically as well as existing optimal methods. Finally, we explore the impact of the optimal policy on additional metrics when multiple objectives are of interest. We propose a novel margin condition that restricts how a subsidiary metric behaves for nearly optimal policies. Under this condition, we provide an efficient estimator for evaluating a subsidiary metric under a policy that is optimal for the primary one. We also introduce two alternative two-stage strategies that do not require a margin condition: both first construct a set of candidate policies and then build a confidence interval over this set. Numerical simulations assess the performance of these methods in various scenarios.
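To make the sampling-and-argmax-oracle idea from the abstract concrete, below is a minimal illustrative sketch of best-arm identification in a linear bandit using only those two oracles: a sampling oracle that draws a parameter from a Gaussian posterior, and an argmax oracle that returns the arm maximizing the sampled reward. This is a generic Thompson-sampling-style loop written for illustration; it is not the dissertation's algorithm, and the function name, horizon, and noise level are assumptions made for this sketch.

```python
import numpy as np

def best_arm_linear_bandit(arms, theta_star, horizon=2000, noise=0.1, seed=0):
    """Toy best-arm identification in a linear bandit, using only a
    sampling oracle (draw theta from a Gaussian posterior) and an
    argmax oracle (pick the arm maximizing the sampled reward)."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = np.eye(d)    # regularized design matrix sum x x^T + I
    b = np.zeros(d)  # running sum of x_t * r_t
    for _ in range(horizon):
        mean = np.linalg.solve(V, b)          # ridge estimate of theta
        cov = noise ** 2 * np.linalg.inv(V)   # Gaussian posterior covariance
        theta_tilde = rng.multivariate_normal(mean, cov)  # sampling oracle
        i = int(np.argmax(arms @ theta_tilde))            # argmax oracle
        x = arms[i]
        r = x @ theta_star + noise * rng.standard_normal()  # noisy reward
        V += np.outer(x, x)
        b += r * x
    theta_hat = np.linalg.solve(V, b)
    return int(np.argmax(arms @ theta_hat))  # recommended arm
```

For example, with standard-basis arms in three dimensions and a true parameter whose second coordinate is largest, the loop concentrates pulls on the second arm and recommends it. The point of the sketch is the oracle structure: nothing in the loop requires enumerating the arm set beyond the two oracle calls, which is what makes such algorithms easy to implement at scale.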
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Li_washington_0250E_26972.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52190
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Adaptive Experimental Design
dc.subject: Bandits
dc.subject: Computational Efficiency
dc.subject: Optimal Exploration
dc.subject: Policy Learning
dc.subject: Statistics
dc.subject.other: Statistics
dc.title: Estimation and Inference of Optimal Policies
dc.type: Thesis

Files

Original bundle

Name: Li_washington_0250E_26972.pdf
Size: 3.09 MB
Format: Adobe Portable Document Format