Selective Metric Differential Privacy for Language Models

dc.contributor.advisor: De Cock, Martine
dc.contributor.author: Maratkhan, Anuar
dc.date.accessioned: 2024-02-12T23:38:12Z
dc.date.available: 2024-02-12T23:38:12Z
dc.date.issued: 2024-02-12
dc.date.submitted: 2023
dc.description: Thesis (Master's)--University of Washington, 2023
dc.description.abstract: Recent advancements in pre-trained language models (LMs) have led to many breakthroughs in Natural Language Processing (NLP). When deployed in downstream applications such as text classifiers or chatbots, LMs can leak information about the large text corpora they were trained on. In privacy-preserving machine learning, it is common to apply Differential Privacy (DP) mechanisms that mitigate such leakage. The traditional notion of DP, in which every record in the data is treated as sensitive, does not translate well to NLP tasks, since some token sequences, such as addresses and social security numbers, may be sensitive while others are not. We introduce the new notion of Selective Metric Differential Privacy (SMDP) and a concrete mechanism to realize SMDP. To this end, we draw upon the recently proposed notions of Selective DP, in which records are treated as either sensitive or not, and Metric DP, in which the notion of adjacent inputs is relaxed through the use of a metric. Our experiments show that GPT models trained on data privatized with our SMDP approach have higher utility than models trained with Metric DP, while preserving the same level of privacy protection. (An illustrative sketch of a selective metric-DP privatization step follows the metadata fields below.)
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Maratkhan_washington_0250O_26461.pdf
dc.identifier.uri: http://hdl.handle.net/1773/51067
dc.language.iso: en_US
dc.rights: none
dc.subject: Computer science
dc.title: Selective Metric Differential Privacy for Language Models
dc.type: Thesis
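
The abstract describes combining two relaxations of DP: Selective DP, in which only some tokens are treated as sensitive, and Metric DP, in which the guarantee degrades with the distance between inputs under a metric. The thesis's actual SMDP mechanism is not reproduced in this record; the Python sketch below is only an assumed illustration of what a word-level selective metric-DP privatization step can look like, using the common noisy-embedding construction (sample noise with density proportional to exp(-epsilon * ||z||), then snap to the nearest vocabulary word). The helpers embed, is_sensitive, vocab_vectors, and vocab_words are hypothetical placeholders, not artifacts of the thesis.

import numpy as np

def metric_dp_noise(dim, epsilon, rng):
    # Multivariate Laplace noise with density proportional to exp(-epsilon * ||z||):
    # uniform direction on the unit sphere, radius drawn from Gamma(dim, 1/epsilon).
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return radius * direction

def privatize(tokens, is_sensitive, embed, vocab_vectors, vocab_words, epsilon, seed=0):
    # Selective step: only tokens flagged as sensitive are perturbed;
    # all other tokens are copied through unchanged.
    rng = np.random.default_rng(seed)
    out = []
    for tok in tokens:
        if not is_sensitive(tok):
            out.append(tok)
            continue
        vec = embed(tok)
        noisy = vec + metric_dp_noise(vec.shape[0], epsilon, rng)
        # Metric-DP step: replace the sensitive token with the vocabulary
        # word whose embedding is nearest to the noised vector.
        nearest = int(np.argmin(np.linalg.norm(vocab_vectors - noisy, axis=1)))
        out.append(vocab_words[nearest])
    return out

Because non-sensitive tokens pass through untouched, a mechanism of this shape can retain more utility than applying Metric DP to every token, which is the trade-off the abstract highlights.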

Files

Original bundle

Name: Maratkhan_washington_0250O_26461.pdf
Size: 478.07 KB
Format: Adobe Portable Document Format