Text-Supervised Local Feature Mixup Towards Long-Tailed Image Categorization

dc.contributor.advisor: Hu, Juhua
dc.contributor.author: Franklin, Richard Samuel
dc.date.accessioned: 2023-09-27T17:17:09Z
dc.date.issued: 2023-09-27
dc.date.submitted: 2023
dc.description: Thesis (Master's)--University of Washington, 2023
dc.description.abstract: In many real-world applications, the class labels used to train deep visual models follow a long-tailed distribution. This challenges traditional approaches to training deep neural networks, which require large amounts of balanced data. Gathering and labeling additional data to balance the class distribution can be both costly and time-consuming. Many existing solutions for deep neural networks, such as ensemble learning, re-balancing strategies, and fine-tuning, are limited by the inherent scarcity of training samples for a subset of classes. Recently, vision-language models such as CLIP have proven effective for zero-shot and few-shot learning by learning the similarity between visual and language features of paired images and text. Considering that large pre-trained vision-language models may contain valuable side textual information for minority classes, in this work we propose to leverage text supervision to tackle the challenge of long-tailed learning for visual recognition. Furthermore, we propose a novel local feature mixup technique that exploits the semantic relations between classes captured by the pre-trained text encoder to further alleviate the long-tailed problem. Our empirical study on several long-tailed benchmark tasks demonstrates the effectiveness of our proposal, which is also supported by a theoretical guarantee.
dc.embargo.lift: 2024-09-26T17:17:09Z
dc.embargo.terms: Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Franklin_washington_0250O_26138.pdf
dc.identifier.uri: http://hdl.handle.net/1773/50653
dc.language.iso: en_US
dc.rights: none
dc.subject: computer vision
dc.subject: deep learning
dc.subject: long-tailed
dc.subject: mixup
dc.subject: multimodal
dc.subject: vision-language
dc.subject: Computer science
dc.subject.other:
dc.title: Text-Supervised Local Feature Mixup Towards Long-Tailed Image Categorization
dc.type: Thesis
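The abstract names two ideas without specifying them: classifying images against class-name text embeddings (text supervision), and mixing local features between classes that the text encoder considers semantically related. The following is a minimal NumPy sketch of one way those ideas could fit together. It is not the thesis's method. All function names (`text_supervised_logits`, `nearest_semantic_class`, `local_feature_mixup`, `soft_cross_entropy`) are hypothetical. The pairing rule (mix with a sample from the text-nearest class), the patch-swapping form of mixup, the Beta(alpha, alpha) mixing ratio, and the temperature of 0.07 (CLIP's initial value) are assumptions. Random arrays stand in for real CLIP encoder outputs.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so that dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def text_supervised_logits(image_feats, class_text_embeds, temperature=0.07):
    # CLIP-style classifier: the "weights" are the class-name text embeddings,
    # and each logit is a temperature-scaled cosine similarity.
    return l2_normalize(image_feats) @ l2_normalize(class_text_embeds).T / temperature


def nearest_semantic_class(class_text_embeds):
    # For every class, the most similar *other* class according to the text embeddings.
    txt = l2_normalize(class_text_embeds)
    sim = txt @ txt.T
    np.fill_diagonal(sim, -np.inf)
    return sim.argmax(axis=1)


def local_feature_mixup(local_feats, labels, class_text_embeds, num_classes, rng, alpha=1.0):
    # local_feats: (N, P, D) patch-level features; labels: (N,) integer class ids.
    # Hypothetical pairing rule: mix each sample with one from its text-nearest
    # class when that class is in the batch, otherwise with a random sample.
    partner_class = nearest_semantic_class(class_text_embeds)
    n, p, _ = local_feats.shape
    partners = np.empty(n, dtype=int)
    for i in range(n):
        candidates = np.flatnonzero(labels == partner_class[labels[i]])
        partners[i] = rng.choice(candidates) if candidates.size else rng.integers(n)
    # Swap a random (1 - lam) fraction of patch positions for the partner's patches.
    lam = rng.beta(alpha, alpha)
    k = int(round((1.0 - lam) * p))
    swap = np.zeros(p, dtype=bool)
    swap[rng.choice(p, size=k, replace=False)] = True
    mixed = np.where(swap[None, :, None], local_feats[partners], local_feats)
    # Soft labels weighted by the fraction of patches each sample contributes.
    kept = 1.0 - k / p
    onehot = np.eye(num_classes)[labels]
    return mixed, kept * onehot + (1.0 - kept) * onehot[partners]


def soft_cross_entropy(logits, soft_targets):
    # Numerically stable log-softmax followed by cross-entropy against soft labels.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_targets * log_probs).sum(axis=1).mean())


# Usage with random stand-ins for CLIP text embeddings and image patch features.
rng = np.random.default_rng(0)
num_classes, dim = 5, 16
text = rng.normal(size=(num_classes, dim))
feats = rng.normal(size=(8, 49, dim))  # 8 images, 7x7 patch grid
labels = rng.integers(num_classes, size=8)
mixed, soft = local_feature_mixup(feats, labels, text, num_classes, rng)
logits = text_supervised_logits(mixed.mean(axis=1), text)
loss = soft_cross_entropy(logits, soft)
```

Pairing each sample with its text-nearest class is only one plausible reading of "semantic relations between classes recognized by the pre-trained text encoder". A real pipeline would replace the random arrays with outputs of a pre-trained vision-language model and would likely restrict mixing to tail classes.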

Files

Original bundle

Name: Franklin_washington_0250O_26138.pdf
Size: 3.63 MB
Format: Adobe Portable Document Format