Explorations In Curriculum Learning Methods For Training Language Models
Author
Campos, Daniel
Abstract
Understanding language in the context of its usage has always been one of the core goals of
natural language processing. Recently, contextual word representations produced by language
models such as ELMo, BERT, ELECTRA, and RoBERTa have provided robust representations of
natural language that serve as the language understanding component for a diverse range of
downstream tasks, including information retrieval, question answering, and information
extraction. Curriculum learning is a method that employs a structured training regime in place
of traditional random sampling. Research areas such as computer vision and machine translation
have used curriculum learning to improve both training speed and model performance. While
language models have transformed the natural language processing community, they are
expensive, energy-intensive, and challenging to train, which has inspired researchers to
explore new training methods. In this thesis, we explore the effect of curriculum learning on
the training of language models. Using the WikiText-2 and WikiText-103 datasets and evaluating
the transfer of the learned word representations on the GLUE benchmark, we find that
curriculum learning methods produce models that outperform their traditionally trained
counterparts when the training corpus is small, but as the training corpus grows, curriculum
methods become less effective than traditional stochastic sampling.
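Curriculum learning replaces uniform random sampling with a structured ordering of the training data. The minimal Python sketch below illustrates one common formulation, a competence-based curriculum with a square-root schedule. It is an illustrative assumption and is not necessarily the method used in this thesis. The difficulty proxy (sentence length in whitespace tokens), the function names `competence`, `difficulty`, `curriculum_batch`, and `random_batch`, and the initial competence `c0` are all hypothetical choices.

```python
import math
import random
from typing import List, Sequence


def competence(step: int, total_steps: int, c0: float = 0.1) -> float:
    """Square-root competence schedule (assumed): the fraction of the
    difficulty-sorted corpus the sampler may draw from at `step`.
    Starts at c0 and reaches 1.0 once step >= total_steps."""
    if step >= total_steps:
        return 1.0
    return min(1.0, math.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2))


def difficulty(sentence: str) -> int:
    """Hypothetical difficulty proxy: sentence length in whitespace tokens."""
    return len(sentence.split())


def curriculum_batch(sorted_examples: Sequence[str], step: int, total_steps: int,
                     batch_size: int, rng: random.Random) -> List[str]:
    """Draw a batch uniformly from the easiest `competence` fraction of an
    easiest-first corpus; harder examples unlock as training progresses."""
    frac = competence(step, total_steps)
    cutoff = max(1, int(frac * len(sorted_examples)))  # always keep >= 1 example
    pool = sorted_examples[:cutoff]
    return [rng.choice(pool) for _ in range(batch_size)]


def random_batch(examples: Sequence[str], batch_size: int,
                 rng: random.Random) -> List[str]:
    """Baseline: traditional uniform random sampling over the full corpus."""
    return [rng.choice(examples) for _ in range(batch_size)]


# Usage: sort once by difficulty, then sample one batch per training step.
corpus = [
    "language models are expensive and energy intensive to train",
    "a dog",
    "the cat sat on the mat",
    "curriculum learning orders the data",
]
sorted_corpus = sorted(corpus, key=difficulty)
rng = random.Random(0)
total_steps = 10
for step in range(total_steps):
    batch = curriculum_batch(sorted_corpus, step, total_steps, batch_size=2, rng=rng)
    # ... a training step on `batch` would go here ...
```

At early steps the sampler draws only from the shortest sentences. Once `step` reaches `total_steps`, it draws uniformly from the whole corpus, which matches the random-sampling baseline.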
Description
Thesis (Master's)--University of Washington, 2020
