DRAG: Diversity in Retrieval Augmented Generation through the Application of Submodular Functions
Abstract
This thesis applies a submodular approach to the reranking stage of Retrieval Augmented Generation (RAG) to balance the relevance and diversity of the retrieved documents. After initial retrieval using Contriever, we experiment with submodular functions and compare them against a baseline of Maximal Marginal Relevance (MMR), a standard function for balancing relevance and diversity. We apply convex combinations in three approaches: 1) a submodular feature-based function using LOG1P concavity and Facility Location, 2) One-Hot Quantization (a quantized modular function with a one-hot feature-based function) with manual weights and Facility Location, and 3) One-Hot Quantization with exponential weight decay and Facility Location. We perform hyperparameter selection for both the submodular functions and MMR. We evaluate these approaches on five datasets designed for diversity-focused tasks (news, politics, analogies, etc.). We show that submodular functions match or outperform MMR in nearly all cases, with relative recall improvements exceeding 20% in the best case. These results suggest that a submodular approach can effectively improve RAG systems, particularly in diversity-sensitive tasks.
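As a rough illustration of the two kinds of reranker the abstract contrasts, the sketch below shows a greedy MMR selection next to a greedy maximization of a convex combination of a LOG1P feature-based function and Facility Location. It is not the thesis implementation: the function names, the mixing weights `lam` and `alpha`, the similarity matrix, and the feature matrix are all assumptions made for the example, and the One-Hot Quantization variants are left out because the abstract does not define them in enough detail.

```python
import math


def mmr_select(query_sim, doc_sim, k, lam=0.5):
    """Greedy MMR: trade query relevance against similarity to already-picked docs."""
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any document chosen so far.
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def facility_location(doc_sim, subset):
    """f(S) = sum over all docs i of max_{j in S} sim(i, j); 0 for the empty set."""
    return sum(
        max((doc_sim[i][j] for j in subset), default=0.0)
        for i in range(len(doc_sim))
    )


def log1p_feature(features, subset):
    """f(S) = sum over features u of log(1 + sum_{j in S} features[j][u])."""
    n_features = len(features[0])
    return sum(
        math.log1p(sum(features[j][u] for j in subset))
        for u in range(n_features)
    )


def greedy_submodular(doc_sim, features, k, alpha=0.5):
    """Greedily maximize alpha * log1p_feature + (1 - alpha) * facility_location."""
    def objective(subset):
        return (alpha * log1p_feature(features, subset)
                + (1 - alpha) * facility_location(doc_sim, subset))

    selected = []
    candidates = list(range(len(doc_sim)))
    while candidates and len(selected) < k:
        base = objective(selected)
        # Pick the candidate with the largest marginal gain.
        best = max(candidates, key=lambda i: objective(selected + [i]) - base)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (assumed): docs 0 and 1 are near-duplicates, doc 2 covers another topic.
query_sim = [0.9, 0.85, 0.5]
doc_sim = [[1.0, 0.95, 0.1],
           [0.95, 1.0, 0.1],
           [0.1, 0.1, 1.0]]
features = [[1.0, 0.0],
            [1.0, 0.0],
            [0.0, 1.0]]

print(mmr_select(query_sim, doc_sim, k=2))
print(greedy_submodular(doc_sim, features, k=2))
```

Both objectives here assume nonnegative similarities and features, which is what makes Facility Location and the LOG1P feature-based function monotone submodular; a convex combination of the two preserves that property, so the simple greedy loop is a reasonable choice for the sketch.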
Description
Thesis (Master's)--University of Washington, 2025
