MRL-AdANNS: Matryoshka Representation Learning for Web-Scale Adaptive Semantic Search
Author
Rege, Aniket
Abstract
Learned representations are essential in modern ML systems, but they often struggle to adapt to
the capacity required by different downstream tasks. In this thesis, we propose Matryoshka
Representation Learning (MRL) [64] to address this challenge. MRL learns coarse-to-fine
representations, integrates into existing representation learning frameworks with minimal
overhead, and incurs no additional training or inference cost. Its accuracy and robustness are
comparable to those of independently trained low-dimensional representations, and it yields up to
14× smaller embeddings on ImageNet-1K and up to 14× speed-ups for large-scale retrieval. It
extends seamlessly to web-scale datasets (ImageNet, JFT) across Vision (ResNet, ViT), Language
(BERT), and V+L (ALIGN) modalities. In modern web-scale search systems, rigid high-dimensional
representations are learned via a deep encoder and fed into an approximate nearest neighbor
search (ANNS) pipeline to retrieve similar data points. Using these rigid representations is
computationally expensive and adapts poorly to compute-constrained environments. To overcome
this, we introduce the novel AdANNS framework [92], which leverages the flexibility of Matryoshka
Representations at each stage of the ANNS pipeline to provide compute-aware elastic search. We
demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based versions of key
ANNS building blocks such as search data structures (AdANNS-IVF) [102] and quantization
(AdANNS-OPQ) [29]. For example, on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate
than IVF [102] built on rigid representations at the same compute budget, and matches its
accuracy while being up to 90× faster in wall-clock time. On Natural Questions, 32-byte
AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline [29] constructed using rigid
representations: the same accuracy at half the cost. We further show that the gains from AdANNS
translate to modern-day composite ANNS indices that combine search structures and quantization.
Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search
on ANNS indices built non-adaptively on Matryoshka Representations. The code is open-sourced at
https://github.com/RAIVNLab/MRL and https://github.com/RAIVNLab/AdANNS.
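
As a rough illustration of how coarse-to-fine representations can make search compute-aware, the sketch below shortlists candidates using only the first few dimensions of each embedding and then reranks that shortlist with all dimensions. It is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the function names (`normalize`, `two_stage_search`), the parameters (`d_coarse`, `shortlist_size`, `k`), and the random vectors standing in for learned embeddings are all hypothetical. It also assumes that a prefix of an embedding is itself a usable lower-dimensional representation, and it uses exact search where a real pipeline would use an ANNS index.

```python
import numpy as np

def normalize(x):
    # L2-normalize along the last axis so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_stage_search(query, database, d_coarse, shortlist_size, k):
    """Shortlist with the first d_coarse dimensions, then rerank with all of them."""
    # Stage 1: cheap scoring on truncated, re-normalized prefixes.
    q_lo = normalize(query[:d_coarse])
    db_lo = normalize(database[:, :d_coarse])
    coarse_scores = db_lo @ q_lo
    shortlist = np.argsort(-coarse_scores)[:shortlist_size]

    # Stage 2: full-dimensional scoring, restricted to the shortlist.
    fine_scores = normalize(database[shortlist]) @ normalize(query)
    order = np.argsort(-fine_scores)[:k]
    return shortlist[order]

rng = np.random.default_rng(0)
# Placeholder vectors: random data, not real Matryoshka embeddings.
database = rng.standard_normal((1000, 64))
# Hypothetical query: a slightly perturbed copy of database item 42.
query = database[42] + 0.01 * rng.standard_normal(64)

top = two_stage_search(query, database, d_coarse=8, shortlist_size=50, k=5)
print(top)
```

The first stage touches every database vector but only `d_coarse` dimensions of each, while the second stage uses every dimension but only `shortlist_size` vectors. Varying those two values is one simple way to trade accuracy against compute.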
Description
Thesis (Master's)--University of Washington, 2023
