Towards Visual Recognition in the Wild
| dc.contributor.advisor | Hwang, Jenq-Neng | |
| dc.contributor.author | Cai, Jiarui | |
| dc.date.accessioned | 2022-07-14T22:04:27Z | |
| dc.date.issued | 2022-07-14 | |
| dc.date.submitted | 2022 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2022 | |
| dc.description.abstract | The predefined, artificially balanced training classes in object recognition have limited capability to model real-world scenarios, where objects are distributed in an imbalanced manner and unknown classes appear. In this thesis, we present three research works on the long-tailed recognition task for both closed-set and open-set scenarios. For closed-set long-tailed recognition, existing one-stage methods improve overall performance in a "seesaw" manner, i.e., they either sacrifice the head's accuracy for better tail classification or elevate the head's accuracy even higher while ignoring the tail. Other algorithms bypass this trade-off with a multi-stage training process: pre-training on the imbalanced set and fine-tuning on a balanced set. Though they achieve promising performance, not only are they sensitive to the generalizability of the pre-trained model, but they are also not easily integrated into other computer vision tasks such as detection and segmentation, where pre-training the classifier alone is not applicable. In this thesis, we introduce a one-stage long-tailed recognition scheme, Ally Complementary Experts (ACE), where each expert is the most knowledgeable specialist in the subset that dominates its training and is complementary to the other experts in the less-seen categories, without being disturbed by what it has never seen. We design a distribution-adaptive optimizer that adjusts the learning pace of each expert to avoid over-fitting. Without bells and whistles, the vanilla ACE outperforms the current one-stage state-of-the-art method by 3-10% on the CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist datasets. It is also the first method shown to break the "seesaw" trade-off by improving the accuracy of the majority and minority categories simultaneously in a single stage. For open-set long-tailed recognition, we first propose a distribution-sensitive loss, which weighs the tail classes more heavily to decrease the intra-class distance in the feature space. 
Building upon these concentrated feature clusters, a local-density-based metric, called Localizing Unfamiliarity Near Acquaintance (LUNA), is introduced to measure the novelty of a testing sample. LUNA is flexible with respect to cluster size and is reliable on cluster boundaries because it considers neighbors with different properties. Moreover, in contrast to most existing works, which reduce open-set detection to a simple binary decision, LUNA is a quantitative measurement with an interpretable meaning. Our proposed method exceeds the state-of-the-art algorithm by 4-6% in closed-set recognition accuracy and by 4% in F-measure under the open-set setting on public benchmark datasets, including our newly introduced fine-grained OLTR dataset of marine species (MS-LT), the first naturally distributed OLTR dataset revealing the genuine genetic relationships among its classes. LUNA is a step closer to the real-world open-set long-tailed recognition problem; however, we see two deficiencies: technically, it suffers from a trade-off between representation learning and classifier training, and in terms of result interpretation, the semantic meaning of the learned features is unexplored. Therefore, we present an improved framework, called LUNA+, in which feature learning and classifier learning are decoupled via an extra feature projection module. In addition, the cluster centers are pre-optimized to be uniformly distributed in the latent space to eliminate bias. LUNA+ further improves OLTR performance and enables more automated, robust, and scalable applications. | |
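The abstract describes LUNA as a local-density metric over a sample's neighborhood in feature space. The thesis's exact formula is not reproduced in this record, so the following is a minimal LOF-style sketch of the general idea only; the function name `novelty_score` and the neighborhood size `k` are illustrative assumptions, not the thesis's API:

```python
import numpy as np

def novelty_score(train_feats, query, k=5):
    """Illustrative local-density novelty score (not the thesis's exact
    LUNA formula): the ratio of the query's mean k-NN distance to the
    mean k-NN distance of its neighbors. Values near 1 suggest a familiar
    (closed-set) sample; much larger values suggest an open-set sample."""
    d = np.linalg.norm(train_feats - query, axis=1)
    nbr_idx = np.argsort(d)[:k]                 # k nearest training samples
    query_spread = d[nbr_idx].mean()            # local spread around the query
    # local spread around each neighbor (index 0 is its distance to itself)
    nbr_spread = np.mean([
        np.sort(np.linalg.norm(train_feats - train_feats[i], axis=1))[1:k + 1].mean()
        for i in nbr_idx
    ])
    return query_spread / (nbr_spread + 1e-12)
```

Under this sketch, a sample lying far from all concentrated clusters yields a score well above 1, while a sample inside a cluster stays near 1, which is what makes the measurement quantitative rather than a binary accept/reject decision.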
| dc.embargo.lift | 2024-07-03T22:04:27Z | |
| dc.embargo.terms | Restrict to UW for 2 years -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Cai_washington_0250E_24354.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/48785 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Long-tailed Object Recognition | |
| dc.subject | Novelty Detection | |
| dc.subject | Open-set Recognition | |
| dc.subject | Visual Recognition | |
| dc.subject | Electrical engineering | |
| dc.subject | Computer science | |
| dc.title | Towards Visual Recognition in the Wild | |
| dc.type | Thesis | |
Files
Original bundle
- Name: Cai_washington_0250E_24354.pdf
- Size: 22.25 MB
- Format: Adobe Portable Document Format
