Feedback Loops in Interactive Machine Learning: Online Weakly-Submodular Learning and Probing for Missing Labels

Abstract

Machine learning systems are increasingly deployed as interactive services that obtain data not by sampling from a fixed distribution, but through direct and indirect interaction with an environment, users, and other learners. In recommender engines, language model services, and online platforms, this interaction makes the learner's information environment endogenous: the learner's own actions or current state determine what feedback and data become available to it. This dissertation studies two distinct channels through which interactivity induces endogeneity and develops principled algorithms with provable guarantees for each.

Chapter I addresses history-dependent feedback in repeated interaction. When a learner constructs a set of choices over time (for instance, recommending movies sequentially), the value of each future action depends on what has already been selected: a sequel gains value if the original was recommended, while similar items exhibit diminishing returns. The learner's past actions thus shape the structure of its own future feedback, creating combinatorial utilities that are neither purely submodular nor purely supermodular. We extend Gaussian process contextual bandits to objectives that are BP-decomposable (a sum of monotone submodular and supermodular terms) or weakly submodular. We introduce a separate-feedback framework in which observations are available independently for each component, and we integrate Nyström sketching to ensure scalability. We prove sublinear regret bounds in all cases, demonstrating that richer utility structures can be optimized online with theoretical guarantees.

Chapter II addresses choice-driven data allocation in multi-learner markets. When multiple learners compete for the same pool of users, who choose among them based on predictive quality and inherent preferences (e.g., brand loyalty), the data each learner observes becomes a function of its own performance, creating a second form of endogeneity. We characterize an overspecialization trap: as learners optimize for the users who already prefer them, they become less attractive to everyone else, further restricting their data and leading to arbitrarily poor global performance even when models with low full-population loss exist. Inspired by knowledge distillation, we propose Peer Probing, an algorithm that queries peer models to obtain synthetic labels for users outside the learner's organic base. We prove that this procedure converges almost surely to a stationary point with bounded full-population risk when the probing sources are sufficiently informative.

Together, these contributions show that accounting for the endogeneity inherent in interactive learning, through richer function classes and richer data sources, yields algorithms that are both theoretically principled and practically effective.
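To make the BP-decomposable utility concrete, the following is a minimal, hypothetical sketch: a genre-coverage term g that is monotone submodular and a sequel-bonus term h that is supermodular, combined as f = g + h and maximized by a naive greedy rule. All item names, genres, and bonus values are invented for illustration, and the greedy loop is shown only to make the decomposition tangible; it is not the dissertation's GP-bandit algorithm and carries no approximation guarantee for general BP objectives.

```python
import math

# Hypothetical catalog: item -> genres covered (illustrative only).
GENRES = {"m1": {"scifi"}, "m2": {"scifi", "drama"}, "m3": {"comedy"}}
# Hypothetical sequel pair: recommending both together earns a bonus.
SEQUEL_BONUS = {("m1", "m2"): 0.5}

def g(S):
    """Submodular part: sqrt of the number of distinct genres covered
    (concave in coverage, hence diminishing returns)."""
    covered = set().union(*(GENRES[i] for i in S)) if S else set()
    return math.sqrt(len(covered))

def h(S):
    """Supermodular part: bonus for each sequel pair fully contained in S
    (items complement each other, so marginal gains can increase)."""
    return sum(b for (a, c), b in SEQUEL_BONUS.items() if a in S and c in S)

def f(S):
    """BP-decomposable objective: submodular plus supermodular."""
    return g(S) + h(S)

def greedy(items, k):
    """Pick k items by largest marginal gain of f at each step."""
    S = set()
    for _ in range(k):
        best = max((i for i in items if i not in S),
                   key=lambda i: f(S | {i}) - f(S))
        S.add(best)
    return S
```

With these toy numbers, the second greedy step prefers the sequel "m1" over the genre-diverse "m3" because the supermodular bonus outweighs the diminishing coverage gain, which is exactly the interaction effect the BP decomposition captures.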

Description

Thesis (Ph.D.)--University of Washington, 2026
