Estimation and Inference of Optimal Policies

dc.contributor.advisor: Luedtke, Alex
dc.contributor.advisor: Jain, Lalit
dc.contributor.author: Li, Zhaoqi
dc.date.accessioned: 2024-09-09T23:16:38Z
dc.date.available: 2024-09-09T23:16:38Z
dc.date.issued: 2024-09-09
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: Many fields conduct experiments to learn policies that map individual characteristics to actions; the policies achieving the best outcomes are referred to as optimal policies. Because obtaining human feedback from experiments is expensive, we often want to learn the optimal policy as quickly as possible. However, several challenges arise in developing practical approaches to policy learning. First, traditional methods usually guarantee only minimax optimality, while practitioners care more about performance on their particular problem instance; a better notion of optimality than the worst case is therefore needed. Second, existing optimal methods are generally hard to implement at scale, making deployment challenging for large companies. Third, real-world settings often involve multiple performance metrics of interest, such as mitigating side effects while ensuring good disease recovery in the biomedical sciences, or balancing short-term acquisition with long-term retention in digital marketing. This dissertation tackles these challenges and provides several practical approaches to policy learning from various perspectives. To identify the optimal policy as quickly as possible, we frame policy learning as a pure exploration problem in bandits and develop algorithms that provably identify the optimal policy quickly for every problem instance, a property we refer to as instance optimality. We first focus on the stochastic contextual bandit problem in the PAC setting: given a policy class, the goal is to return a policy whose expected reward is near the optimal reward with high probability. We characterize the first instance-dependent PAC sample complexity of contextual bandits, and we propose a new computationally efficient algorithm that achieves this sample complexity using only a polynomial number of calls to an argmax oracle.
We then delve into the challenge of computational efficiency, focusing on algorithms that are easily implementable at scale. In the linear bandit setting, the aim is to return the arm with the largest reward given a set of arms and an unknown parameter vector. We introduce an algorithm that uses the same oracles required by the widely used Thompson sampling algorithm, namely sampling and argmax oracles, and achieves an asymptotically optimal exponential convergence rate. We also demonstrate that our algorithm is easy to implement and performs empirically as well as existing optimal methods. Finally, we explore the impact of the optimal policy on additional metrics when multiple objectives are of interest. We propose a novel margin condition that restricts how a subsidiary metric behaves for nearly optimal policies. Under this condition, we provide an efficient estimator for evaluating a subsidiary metric under a policy that is optimal for the primary one. We also introduce two alternative two-stage strategies that do not require a margin condition: both first construct a set of candidate policies and then build a confidence interval over this set. Numerical simulations assess the performance of these methods in various scenarios.
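To make the sampling-and-argmax-oracle idea from the abstract concrete, below is a minimal illustrative sketch of best-arm identification in a linear bandit using only those two oracles: a sampling oracle that draws a parameter from a Gaussian posterior, and an argmax oracle that returns the arm maximizing the sampled reward. This is a generic Thompson-sampling-style loop written for illustration; it is not the dissertation's algorithm, and the function name, horizon, and noise level are assumptions made for this sketch.

```python
import numpy as np

def best_arm_linear_bandit(arms, theta_star, horizon=2000, noise=0.1, seed=0):
    """Toy best-arm identification in a linear bandit, using only a
    sampling oracle (draw theta from a Gaussian posterior) and an
    argmax oracle (pick the arm maximizing the sampled reward)."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = np.eye(d)    # regularized design matrix sum x x^T + I
    b = np.zeros(d)  # running sum of x_t * r_t
    for _ in range(horizon):
        mean = np.linalg.solve(V, b)          # ridge estimate of theta
        cov = noise ** 2 * np.linalg.inv(V)   # Gaussian posterior covariance
        theta_tilde = rng.multivariate_normal(mean, cov)  # sampling oracle
        i = int(np.argmax(arms @ theta_tilde))            # argmax oracle
        x = arms[i]
        r = x @ theta_star + noise * rng.standard_normal()  # noisy reward
        V += np.outer(x, x)
        b += r * x
    theta_hat = np.linalg.solve(V, b)
    return int(np.argmax(arms @ theta_hat))  # recommended arm
```

For example, with standard-basis arms in three dimensions and a true parameter whose second coordinate is largest, the loop concentrates pulls on the second arm and recommends it. The point of the sketch is the oracle structure: nothing in the loop requires enumerating the arm set beyond the two oracle calls, which is what makes such algorithms easy to implement at scale.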
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Li_washington_0250E_26972.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52190
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Adaptive Experimental Design
dc.subject: Bandits
dc.subject: Computational Efficiency
dc.subject: Optimal Exploration
dc.subject: Policy Learning
dc.subject: Statistics
dc.subject.other: Statistics
dc.title: Estimation and Inference of Optimal Policies
dc.type: Thesis

Files

Original bundle

Name: Li_washington_0250E_26972.pdf
Size: 3.09 MB
Format: Adobe Portable Document Format