Author: Rege, Aniket
Contributors: Farhadi, Ali; Shapiro, Linda
Date: 2023-08-14 (2023)
File: Rege_washington_0250O_25349.pdf
URI: http://hdl.handle.net/1773/50379
Thesis (Master's)--University of Washington, 2023

Abstract: Learned representations are essential in modern ML systems, but they often struggle to adapt to the varying capacity requirements of downstream tasks. In this thesis, we propose Matryoshka Representation Learning (MRL) [64] to address this challenge: MRL learns coarse-to-fine representations with minimal modification to existing representation learning frameworks and no additional training or inference cost. MRL achieves accuracy and robustness comparable to independently trained low-dimensional representations, with benefits such as up to 14× smaller ImageNet-1K embeddings and 14× speed-ups for large-scale retrieval. It extends seamlessly to web-scale datasets (ImageNet, JFT) across vision (ResNet, ViT), language (BERT), and vision+language (ALIGN) modalities.

In modern web-scale search systems, rigid high-dimensional representations are learned via a deep encoder and fed into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. These rigid representations are computationally expensive and inflexible in compute-constrained environments. To overcome this, we introduce the novel AdANNS framework [92], which leverages the flexibility of Matryoshka Representations at each stage of the ANNS pipeline to provide compute-aware elastic search. We demonstrate state-of-the-art accuracy-compute trade-offs with AdANNS-based versions of key ANNS building blocks: search data structures (AdANNS-IVF) [102] and quantization (AdANNS-OPQ) [29]. For example, on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than IVF [102] built on rigid representations at the same compute budget, and matches its accuracy while being up to 90× faster in wall-clock time.
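The core idea behind MRL — optimizing one objective over nested prefixes of a single embedding so that every prefix is usable on its own — can be illustrated with a minimal numpy sketch. The granularities, class count, and per-granularity classifier heads below are illustrative assumptions, not the thesis's exact training setup:

```python
import numpy as np

def matryoshka_loss(z, heads, label, dims=(8, 16, 32, 64)):
    """Sum a classification loss over nested prefixes of one embedding.

    z     : full embedding vector, shape (D,)
    heads : dict mapping each granularity d to a (num_classes, d) weight matrix
            (illustrative; MRL can also share weights across granularities)
    label : ground-truth class index
    dims  : nested "matryoshka" granularities (illustrative values)
    """
    total = 0.0
    for d in dims:
        logits = heads[d] @ z[:d]            # classify from the first d dims only
        logits = logits - logits.max()       # numerical stability for softmax
        log_probs = logits - np.log(np.exp(logits).sum())
        total += -log_probs[label]           # cross-entropy at this granularity
    return total
```

Because every granularity contributes to the loss, the trained encoder yields embeddings whose first d dimensions form a usable coarse representation — this is what makes truncation "free" at inference time.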
For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline [29] built on rigid representations – the same accuracy at half the cost! We further show that the gains from AdANNS carry over to modern composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS enables inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. The code is open-sourced at https://github.com/RAIVNLab/MRL and https://github.com/RAIVNLab/AdANNS.

Format: application/pdf
Language: en-US
License: CC BY
Keywords: Classification; Computer Vision; Deep Learning; Representation Learning; Retrieval; Search; Electrical engineering; Computer science; Computer engineering; Electrical and computer engineering
Title: MRL-AdANNS: Matryoshka Representation Learning for Web-Scale Adaptive Semantic Search
Type: Thesis
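The compute-aware search idea the abstract describes — use a cheap low-dimensional prefix of the matryoshka embedding to shortlist candidates, then rerank the shortlist with the full representation — can be sketched as follows. The function name, dimensions, and shortlist size are illustrative assumptions, not the actual AdANNS implementation:

```python
import numpy as np

def funnel_retrieve(query, db, d_low=16, shortlist=100, k=10):
    """Two-stage retrieval over matryoshka-style embeddings (illustrative).

    Stage 1: shortlist candidates by L2 distance in the first d_low dims.
    Stage 2: rerank only the shortlist with the full embeddings.

    query : (D,) query embedding; db : (N, D) database embeddings.
    Returns the indices of the k nearest database rows.
    """
    # Stage 1: cheap scan over the low-dimensional prefixes of all rows.
    low_d2 = ((db[:, :d_low] - query[:d_low]) ** 2).sum(axis=1)
    cand = np.argpartition(low_d2, min(shortlist, len(db) - 1))[:shortlist]
    # Stage 2: exact full-dimensional distances on the shortlist only.
    full_d2 = ((db[cand] - query) ** 2).sum(axis=1)
    return cand[np.argsort(full_d2)[:k]]
```

Shrinking `d_low` or `shortlist` trades accuracy for compute at query time without rebuilding anything, which is the kind of inference-time adaptivity that rigid fixed-dimensional representations cannot offer.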