Title: Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Author: Jiang, Fengqing
Advisor: Poovendran, Radha
Date issued: 2024
Date available: 2025-01-23
Type: Thesis (Master's)--University of Washington, 2024
File: Jiang_washington_0250O_27652.pdf (application/pdf)
URI: https://hdl.handle.net/1773/52779
Language: en-US
Rights: CC BY-NC-ND
Keywords: application; large language model; misuse mitigation; safety
Subjects: Computer science; Artificial intelligence; Electrical and computer engineering

Abstract:
Large language models (LLMs) are increasingly deployed as the backend for various applications, including code completion tools and AI-powered search engines. Unlike traditional LLM usage, where users query the model directly, LLM-integrated applications act as middleware, refining user inputs with domain-specific knowledge. This approach enhances the quality and relevance of LLM responses by providing more context and specialized information. While LLM-integrated applications offer numerous benefits, they also introduce new attack surfaces, and understanding, mitigating, and eliminating these emerging vulnerabilities is an active area of research. This work examines a setup in which users interact with LLMs through an intermediary LLM-integrated application. We focus on the communication cycle from user queries to application responses, powered by backend LLMs. Within this query-response protocol, we identify high-risk vulnerabilities that may originate from malicious application developers or from external threat actors capable of controlling database access and manipulating critical data. Successful exploitation of these vulnerabilities could cause users to receive responses tailored to the threat initiator's intent, such as biased product recommendations. We evaluate these threats against LLM-integrated applications powered by OpenAI's GPT-3.5 and GPT-4. Our empirical results demonstrate that these threats can effectively bypass OpenAI's restrictions and moderation policies, yielding user responses that contain bias, toxic content, privacy leakage, and disinformation risks. To mitigate these threats, we identify and define four key properties that a safe LLM-integrated application must satisfy: integrity, source identification, attack detectability, and utility preservation. Based on these properties, we develop Shield, a lightweight, threat-agnostic defense that mitigates both insider and outsider threats. Our theoretical and empirical evaluations demonstrate the efficacy of this defense mechanism.
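
To make the query-response protocol described in the abstract concrete, the sketch below illustrates the middleware pattern and a Shield-style check motivated by the integrity and source identification properties. It is a minimal, hypothetical illustration, not the thesis's actual design: the function names (refine_query, call_backend_llm, verify_source), the shared signing key, and the canned LLM response are all assumptions introduced here so the example runs without API access.

```python
# Hypothetical sketch of an LLM-integrated application's query-response cycle.
# All names and the HMAC-based check are illustrative assumptions, not the
# thesis's actual Shield implementation.
import hashlib
import hmac

SECRET_KEY = b"user-held-signing-key"  # assumption: user and verifier share this key


def sign(message: str) -> str:
    """Attach an HMAC tag so the origin of the query can be verified later."""
    return hmac.new(SECRET_KEY, message.encode(), hashlib.sha256).hexdigest()


def refine_query(user_query: str, domain_context: str) -> str:
    """Middleware step: the application enriches the raw user query with
    domain-specific context before forwarding it to the backend LLM.
    A malicious developer could tamper with the query at this point."""
    return f"{domain_context}\n\nUser question: {user_query}"


def call_backend_llm(prompt: str) -> str:
    """Stand-in for the backend LLM (e.g., GPT-3.5/GPT-4); returns a canned
    response so the sketch is runnable without API access."""
    return f"[LLM response to: {prompt[:40]}...]"


def verify_source(user_query: str, signature: str) -> bool:
    """Shield-style source identification: confirm that the query being
    forwarded is the one the user actually signed (integrity check)."""
    return hmac.compare_digest(sign(user_query), signature)


if __name__ == "__main__":
    query = "Which laptop should I buy?"
    signature = sign(query)
    prompt = refine_query(query, "You are a shopping assistant.")
    # Before trusting the application's output path, verify the user's
    # query arrived intact; reject if an intermediary modified it.
    if verify_source(query, signature):
        print(call_backend_llm(prompt))
    else:
        print("Rejected: query was modified in transit.")
```

The design choice reflected here is that verification keys are held outside the application, so a compromised or malicious middleware layer cannot forge a valid signature over an altered query; how the thesis realizes attack detectability and utility preservation is not specified in this abstract.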