Selective Metric Differential Privacy for Language Models

dc.contributor.advisor: De Cock, Martine
dc.contributor.author: Maratkhan, Anuar
dc.date.accessioned: 2024-02-12T23:38:12Z
dc.date.available: 2024-02-12T23:38:12Z
dc.date.issued: 2024-02-12
dc.date.submitted: 2023
dc.description: Thesis (Master's)--University of Washington, 2023
dc.description.abstract: Recent advancements in pre-trained language models (LMs) have led to many breakthroughs in Natural Language Processing (NLP). When deployed in downstream applications such as text classifiers or chatbots, LMs can leak information about the large text corpora they were trained on. In privacy-preserving machine learning, it is common to apply Differential Privacy (DP) mechanisms that mitigate such leakage. The traditional notion of DP, in which every record in the data is treated as sensitive, does not translate well to NLP tasks, since some token sequences, such as addresses and social security numbers, may be sensitive while others are not. We introduce the new notion of Selective Metric Differential Privacy (SMDP) and a concrete mechanism to realize SMDP. To this end, we draw upon the recently proposed notions of Selective DP, in which records are treated as either sensitive or not, and Metric DP, in which the notion of adjacent inputs is relaxed through the use of a metric. Our experiments show that GPT models trained on data privatized with our SMDP approach have higher utility than models trained with Metric DP, while preserving the same level of privacy protection. (An illustrative sketch of a selective metric-DP privatization step follows the metadata fields below.)
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Maratkhan_washington_0250O_26461.pdf
dc.identifier.uri: http://hdl.handle.net/1773/51067
dc.language.iso: en_US
dc.rights: none
dc.subject: Computer science
dc.title: Selective Metric Differential Privacy for Language Models
dc.type: Thesis
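
The abstract describes combining two relaxations of DP: Selective DP, in which only some tokens are treated as sensitive, and Metric DP, in which the guarantee degrades with the distance between inputs under a metric. The thesis's actual SMDP mechanism is not reproduced in this record; the Python sketch below is only an assumed illustration of what a word-level selective metric-DP privatization step can look like, using the common noisy-embedding construction (sample noise with density proportional to exp(-epsilon * ||z||), then snap to the nearest vocabulary word). The helpers embed, is_sensitive, vocab_vectors, and vocab_words are hypothetical placeholders, not artifacts of the thesis.

import numpy as np

def metric_dp_noise(dim, epsilon, rng):
    # Multivariate Laplace noise with density proportional to exp(-epsilon * ||z||):
    # uniform direction on the unit sphere, radius drawn from Gamma(dim, 1/epsilon).
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return radius * direction

def privatize(tokens, is_sensitive, embed, vocab_vectors, vocab_words, epsilon, seed=0):
    # Selective step: only tokens flagged as sensitive are perturbed;
    # all other tokens are copied through unchanged.
    rng = np.random.default_rng(seed)
    out = []
    for tok in tokens:
        if not is_sensitive(tok):
            out.append(tok)
            continue
        vec = embed(tok)
        noisy = vec + metric_dp_noise(vec.shape[0], epsilon, rng)
        # Metric-DP step: replace the sensitive token with the vocabulary
        # word whose embedding is nearest to the noised vector.
        nearest = int(np.argmin(np.linalg.norm(vocab_vectors - noisy, axis=1)))
        out.append(vocab_words[nearest])
    return out

Because non-sensitive tokens pass through untouched, a mechanism of this shape can retain more utility than applying Metric DP to every token, which is the trade-off the abstract highlights.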

Files

Original bundle

Name: Maratkhan_washington_0250O_26461.pdf
Size: 478.07 KB
Format: Adobe Portable Document Format