Robust and reliable large-scale transfer learning

Authors

Wortsman, Mitchell

Abstract

Machine learning is currently converging on large, pre-trained models that are fine-tuned for specific applications such as chat. This process, known as large-scale transfer learning, increasingly produces models that are deployed in real-world applications. It is therefore imperative that large-scale transfer be robust and reliable. Our research towards this goal advances fine-tuning robustness and pre-training reliability. Towards robust fine-tuning, we establish weight interpolation as a technique for combining specialist models into one general model. We use this method to address the tension between robustness and accuracy that can emerge when fine-tuning. Next, we extend this technique to multiple models fine-tuned with diverse hyperparameters, obtaining a new state of the art on ImageNet. Towards reliable pre-training, we address a key obstacle that emerges at large scale: training instability. We uncover a predictive relationship between large updates in the network's first layer and loss spikes that slow or destabilize learning. Finally, we establish small-scale proxy models as a reliable tool for studying training divergence, allowing us to predict and mitigate instabilities before they emerge. Our results indicate multiple promising directions for future development, from decentralized training to improvements in model architecture.
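
The weight-interpolation and stability-monitoring ideas summarized above are concrete enough to sketch in code. Below is a minimal PyTorch sketch, assuming models that share an architecture and a common pre-trained initialization; the function names are illustrative rather than taken from the thesis (the corresponding published methods are WiSE-FT and model soups):

```python
import copy
import torch

def interpolate_weights(model_a, model_b, alpha=0.5):
    """Merge two models with identical architectures by interpolating
    parameters: theta = (1 - alpha) * theta_a + alpha * theta_b.
    With model_a pre-trained and model_b fine-tuned, intermediate
    alphas trade fine-tuned accuracy against robustness."""
    merged = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({
        name: torch.lerp(t.float(), state_b[name].float(), alpha)
        if t.is_floating_point() else t  # leave integer buffers untouched
        for name, t in state_a.items()
    })
    return merged

def uniform_soup(models):
    """Average the parameters of several models fine-tuned with diverse
    hyperparameters from the same pre-trained initialization."""
    merged = copy.deepcopy(models[0])
    states = [m.state_dict() for m in models]
    merged.load_state_dict({
        name: torch.stack([s[name].float() for s in states]).mean(dim=0)
        if t.is_floating_point() else t
        for name, t in states[0].items()
    })
    return merged

def first_layer_update_ratio(weight, prev_weight, eps=1e-8):
    """Relative size of the latest update to the first layer's weights.
    Large values of this ratio can flag the loss spikes described in
    the abstract before they destabilize training. (Illustrative
    metric, not the thesis's exact formulation.)"""
    delta = (weight.detach() - prev_weight).norm()
    return (delta / (prev_weight.norm() + eps)).item()
```

The uniform average in `uniform_soup` is the simplest variant; the published model-soups recipe additionally adds models to the average greedily, keeping each one only if it improves held-out accuracy.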

Description

Thesis (Ph.D.)--University of Washington, 2023
