Robust and reliable large-scale transfer learning

dc.contributor.advisor: Farhadi, Ali
dc.contributor.advisor: Schmidt, Ludwig
dc.contributor.author: Wortsman, Mitchell
dc.date.accessioned: 2024-04-26T23:19:27Z
dc.date.available: 2024-04-26T23:19:27Z
dc.date.issued: 2024-04-26
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2023
dc.description.abstract: Machine learning is currently witnessing a convergence towards large, pre-trained models that are fine-tuned for specific applications such as chat. This process, known as large-scale transfer learning, increasingly produces models that are deployed in real-world applications. It is therefore imperative that large-scale transfer is robust and reliable. Our research towards this goal advances fine-tuning robustness and pre-training reliability. Towards robust fine-tuning, we establish weight-interpolation as a technique to combine specialist models into one general model. We use this method to address the tension between robustness and accuracy that can emerge when fine-tuning. Next, we extend this technique to multiple models fine-tuned with diverse hyperparameters to obtain a new state-of-the-art on ImageNet. Towards reliable pre-training, we address a key obstacle that emerges at large scale---training instability. We uncover a predictive relationship between large updates in the network's first layer and loss spikes which slow or destabilize learning. Finally, we establish small-scale proxy models as a reliable tool for studying training divergence, allowing us to predict and mitigate instabilities before they emerge. Our results indicate multiple promising directions for future development, from decentralized training to improvements in model architecture.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Wortsman_washington_0250E_26494.pdf
dc.identifier.uri: http://hdl.handle.net/1773/51335
dc.language.iso: en_US
dc.rights: none
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Robust and reliable large-scale transfer learning
dc.type: Thesis

Files

Original bundle

Name: Wortsman_washington_0250E_26494.pdf
Size: 4.39 MB
Format: Adobe Portable Document Format