Disrupting Machine Learning: Emerging Threats and Applications for Privacy and Dataset Ownership

Authors

Evtimov, Ivan

Abstract

Convolutional neural networks (CNNs) can be trained with machine learning techniques on large datasets of images to solve a multitude of useful computer vision tasks. However, CNNs also suffer from a set of vulnerabilities that allow maliciously crafted inputs to affect both their inference and their training. A central premise of this dissertation is that these vulnerabilities exhibit a duality with respect to security and privacy. On the one hand, when computer vision models are applied in safety-critical settings such as autonomous driving, it is important to identify failures that malicious parties can exploit early on, so that system designers can plan for novel threat models. On the other hand, when machine learning models themselves are used in a malicious or unauthorized manner, the same vulnerabilities can be leveraged to protect data creators from the harmful effects of those models (such as privacy degradation) and to enforce finer-grained “access” controls over the data.

This work studies security and privacy issues in three scenarios where machine learning is applied to visual tasks. The first contribution identifies a vulnerability in models likely to be deployed for road-sign recognition in autonomous vehicles: it demonstrates that an attacker with no digital access to a self-driving car’s computers can nevertheless cause dangerous behavior by modifying the appearance of physical objects. Next, the dissertation considers scenarios where machine learning models are applied in ways that degrade individual privacy. It proposes a scheme -- nicknamed FoggySight -- in which a community of users volunteers adversarially modified photos (“decoys”) that poison a facial search database and throw off searches against it. Finally, machine learning models may be trained on data without authorization. The dissertation considers scenarios where image owners wish to share their visual data widely for human consumption but do not wish to enable its use for machine learning, and it develops a protective mechanism that can be applied to datasets before release so that unauthorized parties cannot train models on them.
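The decoys described above rely on adversarial perturbations in a face-recognition model's embedding space. The following is a rough conceptual sketch of that idea (not the dissertation's actual code): a projected-gradient step nudges a volunteer's photo toward a protected user's embedding while keeping the pixel change small. The names `embed`, `source_img`, `target_emb`, and all hyperparameter values are hypothetical placeholders.

```python
# Conceptual sketch of an embedding-space "decoy": perturb source_img so a
# generic face-embedding network maps it close to target_emb, using a
# standard PGD-style loop. All names and values here are illustrative.
import torch
import torch.nn.functional as F

def make_decoy(embed, source_img, target_emb, eps=8 / 255, steps=40, lr=1 / 255):
    """Return a copy of source_img whose embedding approaches target_emb,
    with the perturbation confined to an L-infinity ball of radius eps."""
    delta = torch.zeros_like(source_img, requires_grad=True)
    for _ in range(steps):
        emb = embed((source_img + delta).clamp(0, 1))
        # Maximize cosine similarity to the protected user's embedding.
        loss = -F.cosine_similarity(emb, target_emb, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # signed-gradient step
            delta.clamp_(-eps, eps)          # project back into the eps-ball
        delta.grad.zero_()
    return (source_img + delta).clamp(0, 1).detach()
```

A facial search over a database seeded with such decoys would retrieve the volunteers' photos for queries resembling the protected user, which is the poisoning effect the abstract describes at a high level.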

Description

Thesis (Ph.D.)--University of Washington, 2021
