Effective Model Deployment and Data Curation for Foundation Model Development
Abstract
While scaling, both in terms of model size and dataset volume, has driven many of the recent breakthroughs in AI, this increasingly large-scale development trajectory faces emergent challenges not seen in traditional small-scale supervised learning settings. On the model side, the exponential growth in parameter counts has rendered these highly capable but massive models prohibitively expensive to deploy or adapt for many practical applications. On the data side, although training on massive datasets improves performance on standard benchmarks, scaling alone does not guarantee the emergence of desirable model behaviors beyond what traditional metrics capture. This thesis develops techniques to address core challenges along both the model and data axes of modern AI development. Specifically, it proposes strategies for the efficient deployment and adaptation of Transformer-based large language models, and it introduces principled methods for curating reliable and effective data to evaluate and improve modern vision-language models beyond standard accuracy metrics. Collectively, these contributions aim to make large-scale AI systems more effective and accessible across diverse real-world scenarios.
Description
Thesis (Ph.D.)--University of Washington, 2025
