Robust and reliable large-scale transfer learning
| dc.contributor.advisor | Farhadi, Ali | |
| dc.contributor.advisor | Schmidt, Ludwig | |
| dc.contributor.author | Wortsman, Mitchell | |
| dc.date.accessioned | 2024-04-26T23:19:27Z | |
| dc.date.available | 2024-04-26T23:19:27Z | |
| dc.date.issued | 2024-04-26 | |
| dc.date.submitted | 2024 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2023 | |
| dc.description.abstract | Machine learning is currently witnessing a convergence towards large, pre-trained models that are fine-tuned for specific applications such as chat. This process, known as large-scale transfer learning, increasingly produces models that are deployed in real-world applications. It is therefore imperative that large-scale transfer is robust and reliable. Our research towards this goal advances fine-tuning robustness and pre-training reliability. Towards robust fine-tuning, we establish weight-interpolation as a technique to combine specialist models into one general model. We use this method to address the tension between robustness and accuracy that can emerge when fine-tuning. Next, we extend this technique to multiple models fine-tuned with diverse hyperparameters to obtain a new state-of-the-art on ImageNet. Towards reliable pre-training, we address a key obstacle that emerges at large scale: training instability. We uncover a predictive relationship between large updates in the network's first layer and loss spikes that slow or destabilize learning. Finally, we establish small-scale proxy models as a reliable tool for studying training divergence, allowing us to predict and mitigate instabilities before they emerge. Our results indicate multiple promising directions for future development, from decentralized training to improvements in model architecture. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Wortsman_washington_0250E_26494.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/51335 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Computer science | |
| dc.subject.other | Computer science and engineering | |
| dc.title | Robust and reliable large-scale transfer learning | |
| dc.type | Thesis |
Files
Original bundle
- Name: Wortsman_washington_0250E_26494.pdf
- Size: 4.39 MB
- Format: Adobe Portable Document Format
