Curriculum Learning: from Human Strategies to Learning Dynamics
The recent success of machine learning (ML), or "the third wave of artificial intelligence (AI)", is built upon computational methods from optimization and statistics, the availability of large-scale training data and computational power, and partial imitation of human cognitive functions, as in convolutional networks. However, current ML techniques can be critically inefficient and vulnerable to imperfect data in practical applications, e.g., when the data are noisy, unlabeled, imbalanced, or contain redundancy, biases, covariate shift, etc. Human learning, by contrast, is more strategic and adaptive in planning and selecting training content for different learning stages. Compared to ML techniques that repeat training on random mini-batches of the same data over all stages, human learning exhibits great advantages in efficiency and robustness when addressing those practical challenges. Therefore, how to develop a strategic "curriculum" for ML becomes an important challenge for bridging the gap between human intelligence and machine intelligence. Curriculum learning was first introduced as a data selection method applied to different learning stages based on human-learning strategies, e.g., selecting easier samples at first and gradually adding more and harder ones later. However, the properties of training materials that humans utilize to design a curriculum are not limited to hardness but can also cover diversity, consistency, representativeness, incentives, impact or utility to future training, etc. In ML, it is challenging to develop efficient and accurate score functions measuring these properties and their contributions to the final/later learning goal. Moreover, given the score functions, it is still an open challenge for a curriculum strategy to plan multiple training stages and adapt the selection criterion to each stage.
Another primary challenge in curriculum learning is the lack of principled and theoretically motivated formulations for the joint optimization of model parameters and the curriculum. Without such formulations, it is difficult to relate selection criteria and score functions to the potential objectives of curriculum learning, e.g., training progress, generalization performance, etc. As a result, it is hard to explain when and why a curriculum can improve ML. Moreover, when developing curriculum learning algorithms, the planning and scheduling of selection criteria for different learning stages need to be designed specifically for different ML applications, e.g., semi-supervised learning, ensemble learning, etc. To achieve a practically effective algorithm, it is also important to study whether and how to combine the curriculum with existing techniques developed for the specific application. This thesis aims at addressing the key challenges above. It consists of four parts. In Part I, we introduce several novel formulations for curriculum learning. For example, we can translate human learning strategies into discrete-continuous optimization problems and jointly optimize the model and the curriculum over the course of training, as shown in Chapter 2 and Chapter 5. We can also derive an analytic form of the weights or scores from a novel objective for curriculum learning, as shown in Chapter 3 and Chapter 4. Moreover, we discuss several potential formulations in Chapter 6 for future research. In Part II, we take a deep dive into the score function design that plays a significant role in curriculum learning. For example, the diversity of selected data plays a vital role in reducing redundancy and encouraging early-stage exploration. Besides diversity, we mainly focus on a new class of score functions in Chapter 8, which is based on the training dynamics of a sample over the whole history instead of its instantaneous feedback at a specific step.
Compared to the widely applied instantaneous scores, these dynamics-based scores significantly reduce the extra computation required by score evaluations, and they are more accurate in identifying the most informative training samples due to their distinguishable dynamic patterns. In Part III, we build practical curriculum learning algorithms based on the developed formulations and score functions. These algorithms cover several important machine learning problems including supervised learning, semi-supervised learning, noisy-label learning, ensemble learning, etc. In the algorithm for each problem, we study and compare different planning or scheduling strategies that determine how the selection criterion changes across learning stages. We justify the effectiveness of the proposed scheduling strategies by detailed empirical analyses and comparisons. In addition, to achieve state-of-the-art performance on each problem, we investigate the interactions between the curriculum and the existing techniques for each problem and then combine their strengths in the algorithmic designs. In Part IV, on each application problem’s benchmark datasets, we evaluate our methods and conduct an extensive experimental comparison with a variety of strong baselines. Our methods equipped with the designed curricula consistently improve both the training efficiency and the final test accuracy in all applications. It is worth noting that the curricula show more significant advantages on more challenging applications with imperfect data such as semi-supervised learning and noisy-label learning. In Chapter 18, we summarize the main contributions of this thesis. In addition to the proposed formulations, score functions, and algorithms for curriculum learning, we also highlight our efforts in bridging the gaps and combining the strengths of human heuristics, theoretical formulations, and empirical algorithms in a line of work.
In addition, we list several potential research directions for future work, which can significantly expand the current schemes and application fields of curriculum learning and deepen our understanding of the training dynamics in machine learning as well as its connections to human education and cognition.
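As a rough illustration of two ideas mentioned in the abstract, the following Python sketch shows (a) an easy-to-hard selection rule that trains on the lowest-scoring samples first and enlarges the selected set stage by stage, and (b) a score that summarizes a sample's loss history rather than a single step. It is not the method developed in the thesis: the function names, the linear growth schedule, the `min_fraction` parameter, and the exponential moving average are all assumptions made here for illustration only.

```python
def select_easy_first(scores, stage, num_stages, min_fraction=0.2):
    """Return indices of the samples to train on at a given stage.

    scores: per-sample difficulty scores (higher = harder); a hypothetical
        proxy such as the current training loss of each sample.
    stage: 0-based index of the current learning stage.
    min_fraction: assumed fraction of the data used at the first stage.
    """
    if not 0 <= stage < num_stages:
        raise ValueError("stage must be in [0, num_stages)")
    n = len(scores)
    # Assumed schedule: the selected fraction grows linearly to 1.0.
    if num_stages == 1:
        fraction = 1.0
    else:
        fraction = min_fraction + (1.0 - min_fraction) * stage / (num_stages - 1)
    k = min(n, max(1, round(fraction * n)))
    # Order sample indices from easiest (lowest score) to hardest.
    order = sorted(range(n), key=lambda i: scores[i])
    return order[:k]


def update_running_score(prev_score, new_loss, momentum=0.9):
    """Exponential moving average of a sample's loss over its history.

    One simple (assumed) way to score a sample by its training dynamics
    instead of its instantaneous loss at a single step.
    """
    return momentum * prev_score + (1.0 - momentum) * new_loss


if __name__ == "__main__":
    losses = [0.9, 0.1, 0.5, 0.3, 0.7]
    for stage in range(3):
        print(stage, select_easy_first(losses, stage, num_stages=3))
```

In a training loop, one would presumably refresh the running scores with `update_running_score` after each pass over the data and pass them to `select_easy_first` at the start of the next stage; how the selection criterion should actually change across stages is exactly the scheduling question the thesis studies.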