Text-Supervised Local Feature Mixup Towards Long-Tailed Image Categorization
| dc.contributor.advisor | Hu, Juhua | |
| dc.contributor.author | Franklin, Richard Samuel | |
| dc.date.accessioned | 2023-09-27T17:17:09Z | |
| dc.date.issued | 2023-09-27 | |
| dc.date.submitted | 2023 | |
| dc.description | Thesis (Master's)--University of Washington, 2023 | |
| dc.description.abstract | In many real-world applications, the class labels used to train deep visual models follow a long-tailed frequency distribution, which challenges traditional approaches to training deep neural networks that require large amounts of balanced data. Gathering and labeling data to balance the class label distribution can be both costly and time-consuming. Many existing solutions, which apply ensemble learning, re-balancing strategies, and fine-tuning to deep neural networks, are limited by the inherent problem of having few samples for a subset of classes. Recently, vision-language models like CLIP have proven effective for zero-shot and few-shot learning by capturing the similarity between vision and language features of image and text pairs. Considering that large pre-trained vision-language models may contain valuable side textual information for minor classes, in this work, we propose to leverage text supervision to tackle the challenge of long-tailed learning for visual recognition. Furthermore, we propose a novel local feature mixup technique that takes advantage of the semantic relations between classes recognized by the pre-trained text encoder to further alleviate the long-tailed problem. Our empirical study on several benchmark long-tailed tasks demonstrates the effectiveness of our proposal, which is supported by a theoretical guarantee. | |
| dc.embargo.lift | 2024-09-26T17:17:09Z | |
| dc.embargo.terms | Restrict to UW for 1 year -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Franklin_washington_0250O_26138.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/50653 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | computer vision | |
| dc.subject | deep learning | |
| dc.subject | long-tailed | |
| dc.subject | mixup | |
| dc.subject | multimodal | |
| dc.subject | vision-language | |
| dc.subject | Computer science | |
| dc.title | Text-Supervised Local Feature Mixup Towards Long-Tailed Image Categorization | |
| dc.type | Thesis |
Files
Original bundle
- Name: Franklin_washington_0250O_26138.pdf
- Size: 3.63 MB
- Format: Adobe Portable Document Format
