Privacy meets Robustness: Unveiling the interplay between Differential Privacy and Robustness in Machine Learning

Liu, Xiyang

Files

Liu_washington_0250E_27566.pdf (2.2 MB)

Date

2025-01-23

Abstract

The rapid advancement of machine learning over the past decade has been driven by the increasing availability of large-scale datasets. This growth has raised critical concerns about the privacy of the individuals whose data is used, and about the robustness of algorithms to potentially malicious data corruption from unreliable sources. This thesis explores the fundamental interplay between differential privacy (DP) and outlier robustness in machine learning by investigating several canonical statistical problems. The first problem asks whether it is possible to develop algorithms that are both differentially private and robust to outliers without requiring additional data. We present the first efficient algorithm, albeit with sub-optimal sample complexity. We then introduce a unifying framework that achieves nearly optimal sample complexity, without regard to computational efficiency, across a range of problems including mean estimation, linear regression, covariance estimation, and principal component analysis (PCA). Finally, we propose two efficient algorithms that achieve near-optimal sample complexity for differentially private PCA and linear regression. These findings contribute to a deeper understanding of the interplay between privacy and robustness and offer new insights into the design of algorithms that are both statistically optimal and computationally efficient for practical applications. The results open avenues for further work on protecting data privacy, particularly in high-dimensional and adversarial settings.