On Fine-Tuning Submodular Functions for Data Subset Selection
Abstract
We demonstrate that submodular functions with fine-tuned hyperparameters serve as extremely effective data subset (i.e., summary) selectors, better than the current state of the art, for training machine learning
systems on data subsets. To search and reduce the hyperparameter space, we introduce meta-summarization,
a technique designed to improve the computational efficiency of hyperparameter tuning. Meta-summarization
starts from a large set of generated summary candidates and chooses a subset of them based on inter-summary
diversity. This significantly reduces the number of summaries that must be trained on, compared with training on
all candidates. As a result, meta-summarization finds the best-performing hyperparameters for a
submodular function faster than other hyperparameter search techniques, significantly reducing computation
and time. We demonstrate that summaries generated using fine-tuned submodular functions outperform
subset selection benchmarks such as DC-Bench (by ≈ 3% absolute) and DeepCore (by ≈ 2% absolute).
Fine-tuned submodular functions also outperform random and state-of-the-art k-means-based subset selection for
training a popular ViT-based (vision transformer) architecture, DaViT, on ImageNet, thus setting a
new state of the art for supervised subset selection.
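
As a rough illustration of what submodular subset selection looks like in practice, the sketch below greedily maximizes a facility-location function, one common submodular objective. The abstract does not say which submodular functions or hyperparameters the thesis uses, so the objective, the similarity measure, the toy data, and the function name `facility_location_greedy` are all assumptions made for this example. The same routine could in principle be run over candidate summaries given an inter-summary similarity, but that is also only an assumption about how diversity-based selection might be set up.

```python
def facility_location_greedy(sim, k):
    """Greedily pick up to k indices maximizing f(S) = sum_i max_{j in S} sim[i][j].

    `sim` is a square list-of-lists of non-negative similarities (an assumption
    of this sketch, not something specified by the thesis abstract).
    """
    n = len(sim)
    best_cover = [0.0] * n  # best similarity of each item to the selected set so far
    selected = []
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: how much each item's coverage improves.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected


# Hypothetical toy data: two clusters of 1-D points.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
# Hypothetical similarity: decays with distance, always in (0, 1].
sim = [[1.0 / (1.0 + abs(a - b)) for b in points] for a in points]

picked = facility_location_greedy(sim, k=2)
print(picked)
```

On this toy input the greedy procedure picks one point from each cluster, which is the diversity-seeking behavior that makes such objectives attractive for summarizing a dataset.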
Description
Thesis (Master's)--University of Washington, 2024
