Towards Visual Recognition in the Wild
| dc.contributor.advisor | Hwang, Jenq-Neng | |
| dc.contributor.author | Cai, Jiarui | |
| dc.date.accessioned | 2022-07-14T22:04:27Z | |
| dc.date.issued | 2022-07-14 | |
| dc.date.submitted | 2022 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2022 | |
| dc.description.abstract | The predefined, artificially balanced training classes in object recognition have limited capability to model real-world scenarios, where objects are distributed in an imbalanced manner and unknown classes appear. In this thesis, we present three research works on the long-tailed recognition task for both closed-set and open-set scenarios. For closed-set long-tailed recognition, existing one-stage methods improve overall performance in a "seesaw" manner, i.e., they either sacrifice the head's accuracy for better tail classification or elevate the head's accuracy even higher while ignoring the tail. Other algorithms bypass this trade-off with a multi-stage training process: pre-training on the imbalanced set and fine-tuning on a balanced set. Though they achieve promising performance, not only are they sensitive to the generalizability of the pre-trained model, but they are also not easily integrated into other computer vision tasks such as detection and segmentation, where pre-training the classifier alone is not applicable. In this thesis, we introduce a one-stage long-tailed recognition scheme, Ally Complementary Experts (ACE), where each expert is the most knowledgeable specialist in the subset that dominates its training and is complementary to the other experts in the less-seen categories, without being disturbed by what it has never seen. We design a distribution-adaptive optimizer that adjusts the learning pace of each expert to avoid over-fitting. Without bells and whistles, the vanilla ACE outperforms the current one-stage state-of-the-art method by 3-10% on the CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist datasets. It is also the first method shown to break the "seesaw" trade-off by improving the accuracy of the majority and minority categories simultaneously in a single stage. For open-set long-tailed recognition, we first propose a distribution-sensitive loss, which weighs the tail classes more heavily to decrease the intra-class distance in the feature space. 
Building upon these concentrated feature clusters, a local-density-based metric, called Localizing Unfamiliarity Near Acquaintance (LUNA), is introduced to measure the novelty of a testing sample. LUNA is flexible with respect to cluster size and is reliable on cluster boundaries because it considers neighbors with different properties. Moreover, in contrast to most existing works, which reduce open-set detection to a simple binary decision, LUNA is a quantitative measurement with an interpretable meaning. Our proposed method exceeds the state-of-the-art algorithm by 4-6% in closed-set recognition accuracy and by 4% in F-measure under the open-set setting on public benchmark datasets, including our newly introduced fine-grained OLTR dataset of marine species (MS-LT), the first naturally distributed OLTR dataset revealing the genuine genetic relationships among its classes. LUNA is a step closer to the real-world open-set long-tailed recognition problem; however, we see two deficiencies: technically, it suffers from a trade-off between representation learning and classifier training, and in terms of result interpretation, the semantic meaning of the learned features is unexplored. Therefore, we present an improved framework, called LUNA+, in which feature learning and classifier learning are decoupled via an extra feature projection module. In addition, the cluster centers are pre-optimized to be uniformly distributed in the latent space to eliminate bias. LUNA+ further improves OLTR performance and enables more automated, robust, and scalable applications. | |
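The abstract describes LUNA as a local-density metric over a sample's neighborhood in feature space. The thesis's exact formula is not reproduced in this record, so the following is a minimal LOF-style sketch of the general idea only; the function name `novelty_score` and the neighborhood size `k` are illustrative assumptions, not the thesis's API:

```python
import numpy as np

def novelty_score(train_feats, query, k=5):
    """Illustrative local-density novelty score (not the thesis's exact
    LUNA formula): the ratio of the query's mean k-NN distance to the
    mean k-NN distance of its neighbors. Values near 1 suggest a familiar
    (closed-set) sample; much larger values suggest an open-set sample."""
    d = np.linalg.norm(train_feats - query, axis=1)
    nbr_idx = np.argsort(d)[:k]                 # k nearest training samples
    query_spread = d[nbr_idx].mean()            # local spread around the query
    # local spread around each neighbor (index 0 is its distance to itself)
    nbr_spread = np.mean([
        np.sort(np.linalg.norm(train_feats - train_feats[i], axis=1))[1:k + 1].mean()
        for i in nbr_idx
    ])
    return query_spread / (nbr_spread + 1e-12)
```

Under this sketch, a sample lying far from all concentrated clusters yields a score well above 1, while a sample inside a cluster stays near 1, which is what makes the measurement quantitative rather than a binary accept/reject decision.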
| dc.embargo.lift | 2024-07-03T22:04:27Z | |
| dc.embargo.terms | Restrict to UW for 2 years -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Cai_washington_0250E_24354.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/48785 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Long-tailed Object Recognition | |
| dc.subject | Novelty Detection | |
| dc.subject | Open-set Recognition | |
| dc.subject | Visual Recognition | |
| dc.subject | Electrical engineering | |
| dc.subject | Computer science | |
| dc.title | Towards Visual Recognition in the Wild | |
| dc.type | Thesis | |
Files
Original bundle
- Name: Cai_washington_0250E_24354.pdf
- Size: 22.25 MB
- Format: Adobe Portable Document Format
