User-Guided Deep Multiple Clustering

Yao, Jiawei

Files

Yao_washington_0250E_29148.pdf (7.48 MB)

Date

2026-02-05

Abstract

Multiple clustering is based on the observation that a dataset can often be partitioned in more than one meaningful way (for example, by color or by shape). However, most existing deep methods still optimize a single partition or produce several partitions without making clear which underlying factors they capture, and they often separate representation learning from the clustering objective. This can lead to results that do not match the aspects users care about. This dissertation proposes a user-preference-guided framework for deep multiple clustering that aims to obtain partitions that are both diverse and aligned with user interests. It is organized into four contributions that start from data-driven ways of identifying relevant factors and progress to methods that explicitly incorporate user intent and practical system considerations. The first contribution, AugDMC, uses targeted data augmentations as aspect selectors, together with a stabilized, self-supervised, prototype-based objective, to learn representations that preserve distinct factors of variation and support multiple interpretable partitions without manual feature engineering. The second contribution, DDMC, introduces dual-level disentanglement tailored to clustering: a variational EM procedure links coarse- and fine-grained factor discovery (E-step) with a clustering-aware objective (M-step), narrowing the gap between learning “good features” and obtaining “good partitions”. The third contribution, Multi-MaP, aligns frozen CLIP encoders with a user’s high-level concept by introducing learnable textual proxies and constraining them with concept-level and LLM-derived reference-word signals. Building on this, the fourth contribution, Multi-Sub, is a framework for concept-conditioned subspace proxies. It first builds a low-dimensional, text-guided subspace from reference words suggested by an LLM, and then learns a proxy for each image inside this subspace. Representation learning and clustering are optimized together, so the method no longer needs user-specified contrastive concepts and avoids the extra cost of a two-stage pipeline. On publicly available visual multiple-clustering benchmarks such as ALOI, Stanford Cars, CMUface, Flowers, Fruit/Fruit360, and Cards, these methods consistently improve NMI and RI and yield partitions that better reflect user intent, with ablation studies validating each design choice. Taken together, the results illustrate how incorporating user preferences, structuring the representation space, and jointly optimizing representations and clusters can make multiple clustering systems better aligned with users’ actual goals in practice.
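
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how normalized mutual information (NMI) and the Rand index (RI) can be computed when one predicted partition is compared against several ground-truth partitions of the same objects. It is not taken from the dissertation: the toy labels, the function names, and the choice of arithmetic-mean normalization for NMI are assumptions made here for illustration, and the evaluation protocol actually used in the work may differ.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two partitions agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def _entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Mutual information divided by the arithmetic mean of the two
    entropies (an assumed normalization; other variants exist)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = _entropy(labels_a), _entropy(labels_b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both partitions are a single cluster
    return 2.0 * mi / (h_a + h_b)


# Hypothetical toy data: six objects with two valid ground-truth partitions.
color = ["red", "red", "red", "blue", "blue", "blue"]
shape = ["circle", "square", "circle", "square", "circle", "square"]

# A hypothetical predicted partition that happens to follow color.
predicted = [0, 0, 0, 1, 1, 1]

for name, truth in (("color", color), ("shape", shape)):
    print(name, "NMI:", nmi(predicted, truth), "RI:", rand_index(predicted, truth))
```

In this toy case the predicted partition scores highly against the color labels and poorly against the shape labels, which shows why a multiple-clustering result is scored separately against each ground-truth aspect rather than against a single labeling.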