A Theory of Active Learning in Dynamic Environments

Abstract

How should an agent interact with an unknown, dynamic environment in order to collect the information necessary to accomplish its goals? This question is central to the operation of any autonomous system, and answering it effectively is a prerequisite to the development of safe, efficient, and flexible algorithmic agents. While existing work has made progress in addressing this question, it is typically either limited to simple settings or lacks guarantees strong enough to establish which approaches are most effective. This thesis takes steps towards providing a rigorous theoretical answer to this question. In particular, we focus on the setting of dynamic, long-horizon environments, where the actions the agent takes influence not only its current observation but also future observations. We are interested in eliciting principles of exploration in such settings: which aspects of the environment should an agent explore, and how can it best explore them? We aim to develop approaches that are active: approaches that offer insight into how an agent can choose its actions, while interacting with the environment, so as to learn as quickly as possible.

We first seek to answer these questions in the setting of continuous control. Here we provide novel algorithmic approaches that establish the instance-optimal rate both for system identification (learning the system's parameters) and for a general formulation of ``decision making'' (where the agent wants to learn some decision, e.g. a controller, that minimizes some loss). Our results apply to linear dynamical systems as well as to certain classes of nonlinear dynamical systems. To the best of our knowledge, these are the first results to establish instance-optimal rates for active learning in continuous control.

We then turn to the reinforcement learning (RL) setting. Here our focus is on developing instance-dependent bounds, bounds that adapt to the difficulty of any given instance, and using these bounds to understand which approaches to exploration are most effective in reinforcement learning. We establish novel complexity measures for both tabular and linear RL, which we show can be significantly tighter than existing bounds, and we show that existing algorithmic principles are insufficient for achieving the optimal rates and can be arbitrarily suboptimal.

Finally, we seek to understand not only how we can optimally collect the information necessary to accomplish our goals, but how we can efficiently learn to collect this information. We argue that answering this question is fundamental to achieving effective, practical approaches. In a general function approximation setting, we establish novel measures of complexity which quantify this cost of learning to explore, and algorithmic approaches which we show achieve the optimal rate. Our results yield exponential improvements over existing bounds in many cases, and provide a nearly complete theory of finite-time instance-optimal learning.
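To make the system identification problem from the abstract concrete, the following is a minimal sketch in Python, assuming the standard linear dynamics model x_{t+1} = A x_t + B u_t + w_t and using plain least squares with random excitation inputs. All variable names and constants here are illustrative; this is not the thesis's active exploration algorithm, whose contribution lies precisely in how the inputs u_t are chosen to learn (A, B) at the instance-optimal rate.

    import numpy as np

    # Linear dynamical system: x_{t+1} = A x_t + B u_t + w_t.
    # We roll out one trajectory under random Gaussian inputs and
    # recover (A, B) by ordinary least squares. This illustrates the
    # estimation problem only; an active method would instead design
    # the inputs u_t to excite the system as informatively as possible.

    rng = np.random.default_rng(0)
    dx, du, T = 3, 2, 500                       # state dim, input dim, horizon

    A_true = 0.9 * np.eye(dx) + 0.05 * rng.standard_normal((dx, dx))
    B_true = rng.standard_normal((dx, du))

    X = np.zeros((T + 1, dx))                   # states x_0, ..., x_T
    U = rng.standard_normal((T, du))            # random excitation inputs
    for t in range(T):
        w = 0.1 * rng.standard_normal(dx)       # process noise
        X[t + 1] = A_true @ X[t] + B_true @ U[t] + w

    # Regress x_{t+1} on [x_t, u_t]: the stacked solution is [A^T; B^T].
    Z = np.hstack([X[:-1], U])                  # regressors, shape (T, dx + du)
    Theta, _, _, _ = np.linalg.lstsq(Z, X[1:], rcond=None)
    A_hat, B_hat = Theta[:dx].T, Theta[dx:].T

    print("||A_hat - A||_F =", np.linalg.norm(A_hat - A_true))
    print("||B_hat - B||_F =", np.linalg.norm(B_hat - B_true))

With fixed random inputs as above, the estimation error decays at the usual rate in the trajectory length T; the question the thesis studies is how an agent can instead choose its inputs adaptively so that this error shrinks at the best rate achievable for the particular system at hand.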

Description

Thesis (Ph.D.)--University of Washington, 2024
