Robust and reliable large-scale transfer learning

Authors

Wortsman, Mitchell

Abstract

Machine learning is currently converging on large, pre-trained models that are fine-tuned for specific applications such as chat. This process, known as large-scale transfer learning, increasingly produces models that are deployed in real-world applications. It is therefore imperative that large-scale transfer be robust and reliable. Our research towards this goal advances fine-tuning robustness and pre-training reliability. Towards robust fine-tuning, we establish weight interpolation as a technique for combining specialist models into one general model. We use this method to address the tension between robustness and accuracy that can emerge when fine-tuning. Next, we extend this technique to multiple models fine-tuned with diverse hyperparameters, obtaining a new state of the art on ImageNet. Towards reliable pre-training, we address a key obstacle that emerges at large scale: training instability. We uncover a predictive relationship between large updates in the network's first layer and loss spikes that slow or destabilize learning. Finally, we establish small-scale proxy models as a reliable tool for studying training divergence, allowing us to predict and mitigate instabilities before they emerge. Our results indicate multiple promising directions for future development, from decentralized training to improvements in model architecture.
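
The weight-interpolation and stability-monitoring ideas summarized above are concrete enough to sketch in code. Below is a minimal PyTorch sketch, assuming models that share an architecture and a common pre-trained initialization; the function names are illustrative rather than taken from the thesis (the corresponding published methods are WiSE-FT and model soups):

```python
import copy
import torch

def interpolate_weights(model_a, model_b, alpha=0.5):
    """Merge two models with identical architectures by interpolating
    parameters: theta = (1 - alpha) * theta_a + alpha * theta_b.
    With model_a pre-trained and model_b fine-tuned, intermediate
    alphas trade fine-tuned accuracy against robustness."""
    merged = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({
        name: torch.lerp(t.float(), state_b[name].float(), alpha)
        if t.is_floating_point() else t  # leave integer buffers untouched
        for name, t in state_a.items()
    })
    return merged

def uniform_soup(models):
    """Average the parameters of several models fine-tuned with diverse
    hyperparameters from the same pre-trained initialization."""
    merged = copy.deepcopy(models[0])
    states = [m.state_dict() for m in models]
    merged.load_state_dict({
        name: torch.stack([s[name].float() for s in states]).mean(dim=0)
        if t.is_floating_point() else t
        for name, t in states[0].items()
    })
    return merged

def first_layer_update_ratio(weight, prev_weight, eps=1e-8):
    """Relative size of the latest update to the first layer's weights.
    Large values of this ratio can flag the loss spikes described in
    the abstract before they destabilize training. (Illustrative
    metric, not the thesis's exact formulation.)"""
    delta = (weight.detach() - prev_weight).norm()
    return (delta / (prev_weight.norm() + eps)).item()
```

The uniform average in `uniform_soup` is the simplest variant; the published model-soups recipe additionally adds models to the average greedily, keeping each one only if it improves held-out accuracy.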

Description

Thesis (Ph.D.)--University of Washington, 2023
