The Intelligent Management of Crowd-Powered Machine Learning
MetadataShow full item record
Artificial intelligence and machine learning power many technologies today, from spam filters to self-driving cars to medical decision assistants. While this revolution has hugely benefited from algorithmic developments, it also could not have occurred without data, which nowadays is frequently procured at massive scale from crowds. Because data is so crucial, a key next step towards truly autonomous agents is the design of better methods for intelligently managing now-ubiquitous crowd-powered data-gathering processes. This dissertation takes this key next step by developing algorithms for the online and dynamic control of these processes. We consider how to gather data for its two primary purposes: training and evaluation. In the first part of the dissertation, we develop algorithms for obtaining data for testing. The most important requirement of testing data is that it must be extremely clean. Thus to deal with noisy human annotations, machine learning practitioners typically rely on careful workflow design and advanced statistical techniques for label aggregation. A common process involves designing and testing multiple crowdsourcing workflows for their tasks, identifying the single best-performing workflow, and then aggregating worker responses from redundant runs of that single workflow. We improve upon this process by building two control models: one that allows for switching between many workflows depending on how well a particular workflow is performing for a given example and worker; and one that can aggregate labels from tasks that do not have a finite predefined set of multiple choice answers (e.g., counting tasks). We then implement agents that use our new models to dynamically choose whether to acquire more labels from the crowd or stop, and show that they can produce higher quality labels at a cheaper cost than state-of-the-art baselines. In the second part of the dissertation, we shift to tackle the second purpose of data: training. Because learning algorithms are often robust to noise, training sets do not necessarily have to be clean and have more complex requirements. We first investigate a tradeoff between size and noise. We survey how inductive bias, worker accuracy, and budget affect whether a larger and noisier training set or a smaller and cleaner one will train better classifiers. We then set up a formal framework for dynamically choosing the next example to label or relabel by generalizing active learning to allow for relabeling, which we call re-active learning, and we design new algorithms for re-active learning that outperform active learning baselines. Finally, we leave the noisy setting and investigate how to collect balanced training sets in domains of varying skew, by considering a setting in which workers can not only label examples, but also generate examples with various distributions. We design algorithms that can intelligently switch between deploying these various worker tasks depending on the skew in the dataset, and show that our algorithms can result in significantly better performance than state-of-the-art baselines.