DRAG: Diversity in Retrieval Augmented Generation through the Application of Submodular Functions

dc.contributor.advisor: Bilmes, Jeffrey
dc.contributor.author: Cortes-Lemos, Maria Paula
dc.date.accessioned: 2026-02-05T19:37:28Z
dc.date.available: 2026-02-05T19:37:28Z
dc.date.issued: 2026-02-05
dc.date.submitted: 2025
dc.description: Thesis (Master's)--University of Washington, 2025
dc.description.abstract: This thesis applies a submodular approach to the reranking stage of Retrieval Augmented Generation (RAG) to balance the relevance and diversity of the retrieved documents. After initial retrieval with Contriever, we compare submodular functions against a baseline of Maximal Marginal Relevance (MMR), a standard function for balancing relevance and diversity. We apply convex combinations in three configurations: 1) a submodular feature-based function with LOG1P concavity, combined with Facility Location; 2) One-Hot Quantization (a quantized modular function paired with a one-hot feature-based function) with manual weights, combined with Facility Location; and 3) One-Hot Quantization with exponential weight decay, combined with Facility Location. We perform hyperparameter selection for both the submodular functions and MMR. We evaluate these approaches on five datasets designed for diversity-focused tasks (news, politics, analogies, etc.). The submodular functions match or outperform MMR in nearly all cases, with relative recall improvements exceeding 20% in the best case. These results suggest that a submodular approach can effectively improve RAG systems, particularly in diversity-sensitive tasks.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: CortesLemos_washington_0250O_28959.pdf
dc.identifier.uri: https://hdl.handle.net/1773/55249
dc.language.iso: en_US
dc.rights: none
dc.subject: optimization
dc.subject: retrieval augmented generation
dc.subject: submodular function
dc.subject: Artificial intelligence
dc.subject: Linguistics
dc.subject: Mathematics
dc.subject.other: Linguistics
dc.title: DRAG: Diversity in Retrieval Augmented Generation through the Application of Submodular Functions
dc.type: Thesis
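The abstract names the reranking objectives without spelling them out. What follows is an illustrative Python sketch, not the thesis code: MMR in its common form, and a hypothetical convex combination of a modular relevance term, a log1p feature-based function, and Facility Location, maximized with the standard greedy algorithm. The weights `w`, the trade-off `lam`, and the toy relevance scores, features, and similarity matrix are all assumptions. In a real pipeline, relevance and similarities would come from Contriever embeddings, and the abstract does not say how relevance enters the submodular objective.

```python
import numpy as np


def mmr_select(rel, sim, k, lam=0.5):
    """Baseline MMR: greedily trade relevance off against the
    maximum similarity to documents already selected."""
    rel, sim = np.asarray(rel, dtype=float), np.asarray(sim, dtype=float)
    selected, remaining = [], list(range(len(rel)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            return lam * rel[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def submodular_objective(S, rel, feats, sim, w=(0.4, 0.3, 0.3)):
    """Hypothetical convex combination (weights sum to 1) of a modular
    relevance term, a log1p feature-based function, and Facility Location."""
    S = list(S)
    if not S:
        return 0.0
    w_rel, w_fb, w_fl = w
    relevance = rel[S].sum()                              # modular term
    feature_based = np.log1p(feats[S].sum(axis=0)).sum()  # concave over feature sums
    facility_location = sim[:, S].max(axis=1).sum()       # how well S covers every candidate
    return w_rel * relevance + w_fb * feature_based + w_fl * facility_location


def greedy_select(rel, feats, sim, k, w=(0.4, 0.3, 0.3)):
    """Standard greedy maximization under the constraint |S| <= k."""
    # Clip to nonnegative values so every term stays monotone.
    rel = np.clip(np.asarray(rel, dtype=float), 0.0, None)
    feats = np.clip(np.asarray(feats, dtype=float), 0.0, None)
    sim = np.clip(np.asarray(sim, dtype=float), 0.0, None)
    selected, current = [], 0.0
    remaining = list(range(len(rel)))
    while remaining and len(selected) < k:
        gains = [submodular_objective(selected + [i], rel, feats, sim, w) - current
                 for i in remaining]
        pos = int(np.argmax(gains))  # first (lowest) index wins ties
        current += gains[pos]
        selected.append(remaining.pop(pos))
    return selected


# Made-up toy data: docs 0 and 1 are near-duplicates, doc 2 covers another topic.
rel = np.array([0.9, 0.89, 0.7, 0.1])
feats = np.array([[1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.3, 0.3, 0.3]])
sim = np.array([[1.0, 0.95, 0.1, 0.5],
                [0.95, 1.0, 0.1, 0.5],
                [0.1, 0.1, 1.0, 0.5],
                [0.5, 0.5, 0.5, 1.0]])

print(mmr_select(rel, sim, k=2))            # [0, 2]
print(greedy_select(rel, feats, sim, k=2))  # [0, 2]
```

Ranking by relevance alone would return the near-duplicates 0 and 1; both rerankers instead pick documents 0 and 2. Clipping inputs to nonnegative values keeps the objective monotone submodular, which is the condition under which greedy selection carries its (1 - 1/e) approximation guarantee for a cardinality constraint.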

Files

Original bundle

Name: CortesLemos_washington_0250O_28959.pdf
Size: 426.35 KB
Format: Adobe Portable Document Format
