Enhancing Privacy in AI: Differential Privacy in Multiparty Computation
Abstract
Artificial Intelligence (AI) based applications offer substantial convenience but often rely on sensitive personal data to work well. AI pipelines raise input privacy concerns when models must be trained on combined data from multiple data holders who may be unwilling, or legally unable, to disclose their data to each other. Similarly, output privacy concerns arise when trained AI models are deployed in production and inadvertently leak private information about the individuals in the training data. A popular approach to input privacy is Federated Learning (FL), a paradigm in which models are trained in a distributed manner so that raw personal data never leaves its source. State-of-the-art techniques to mitigate output privacy risks use Differential Privacy (DP), which obfuscates the presence of individual records in the training data by adding calibrated noise. Existing solutions that combine traditional FL and DP to provide input and output privacy simultaneously typically cater to a specific data partition (horizontal or vertical) and sacrifice substantial accuracy to achieve privacy. In this dissertation, we focus on providing both input and output privacy guarantees when data is distributed across multiple data holders, irrespective of how the data is partitioned. We provide solutions to preserve privacy when (a) training discriminative machine learning models for prediction; (b) mitigating biases in model predictions; and (c) training AI models for synthetic data generation. All our solutions are grounded in novel Secure Multi-Party Computation (MPC) protocols that provide input privacy for any data partition -- offering a single solution for horizontal, vertical, or mixed partitions.
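As background on the input-privacy building block, the additive secret sharing at the heart of many MPC protocols can be sketched in a few lines. This is a toy illustration only, not a protocol from the dissertation; the modulus, party count, and function names are all illustrative choices.

```python
import secrets

P = 2**61 - 1  # public modulus; a Mersenne prime chosen for illustration

def share(x, n_parties=3):
    """Split x into n_parties additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod P."""
    return sum(shares) % P

# Two data holders secret-share their inputs; each party adds its
# shares locally, so the sum is computed without any party learning
# either individual input.
a_shares = share(42)
b_shares = share(58)
sum_shares = [(sa + sb) % P for sa, sb in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 100
```

Because shares are uniformly random, any proper subset of them reveals nothing about the underlying value, which is what lets such protocols handle horizontal, vertical, or mixed partitions uniformly: all inputs enter the computation as shares, regardless of which holder contributed which rows or columns.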
To simultaneously provide output privacy while maintaining high utility, we leverage a "DP-in-MPC" paradigm: we develop MPC protocols that emulate centralized DP even when the data resides with multiple data holders in a distributed manner. Our research is characterized by the quest for solutions that (1) enable training of high-utility AI models, (2) are sufficiently efficient for use in practice, and (3) do not compromise individual privacy.
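The intuition behind emulating centralized DP without a trusted curator can be sketched using the infinite divisibility of the Laplace distribution: each party contributes a small noise share, and the shares sum to exactly one central-DP Laplace draw, so only the already-noised aggregate is ever revealed. This is a simplified stand-in, not the dissertation's construction (which samples noise inside the MPC protocol itself); the function names and parameters below are invented for the sketch.

```python
import random

def laplace_noise_share(n, scale, rng=random):
    """One party's noise share. Each share is a difference of two
    Gamma(1/n, scale) draws; summed over n parties, the total noise
    is distributed as Laplace(scale)."""
    return rng.gammavariate(1.0 / n, scale) - rng.gammavariate(1.0 / n, scale)

def dp_in_mpc_sum(party_values, epsilon, sensitivity=1.0):
    """Emulate a centrally-DP sum over n parties: each party perturbs
    its input with its noise share before (secure) aggregation, so
    the revealed total carries exactly Laplace(sensitivity/epsilon)
    noise -- the same guarantee a trusted curator would provide."""
    n = len(party_values)
    scale = sensitivity / epsilon
    noisy = [v + laplace_noise_share(n, scale) for v in party_values]
    return sum(noisy)
```

Note the utility argument this paradigm enables: because the noise shares add up to a single central-DP draw rather than one full draw per party, the total error matches the centralized setting instead of growing with the number of data holders, which is why DP-in-MPC can retain high accuracy.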
Description
Thesis (Ph.D.)--University of Washington, 2025
