Machine Learning based Attacks and Defenses in Computer Security: Towards Privacy and Utility Balance in Emerging Technology Environments
The growth of smart devices and the Internet of Things (IoT) is driving data markets in which users exchange sensor streams for services. In most current data exchange models, service providers offer their functionality only to users who opt in to sharing the entirety of the available sensor data (i.e., maximal data harvesting). Motivated by the unexplored risks of emerging smart sensing technologies, this dissertation applies the lens of machine learning to understand (1) to what degree information leakage can be exploited for privacy attacks and (2) how we can mitigate privacy risks while still enabling utility and innovation in sensor contexts. In the first part of this work, we experimentally investigate the potential to amplify information leakage and unlock unwanted (potentially harmful) inferences in two emerging technologies: (i) smart homes, where we show that electromagnetic noise on the powerline can be used to determine what is being watched on TVs, and (ii) smart cars, where we show that messages between sensors and control units can be used to determine the unique identity of the driver. We use the insights gained from these investigations to develop a theoretical balancing framework (SensorSift) that provides algorithmic guarantees to mitigate the risks of information sharing while still enabling useful functionality to be extracted from the data. SensorSift acts as a data clearing house that applies transformations to the sensor stream such that the output simultaneously (1) minimizes the potential for accurate inference of attributes the user defines as private and (2) maximizes the inferences about application-requested attributes verified to be non-private (public).
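The abstract does not specify how SensorSift computes its transformation. Purely as an illustration of the "hide private, reveal public" idea, the sketch below fits a generic linear projection that keeps the direction separating a public label's class means while penalizing the direction separating a private label's class means. The objective (top eigenvectors of S_pub - lam * S_priv), the function names `fit_projection`, `sift`, `mean_gap_direction` and `demo`, the penalty weight `lam`, and the synthetic data are all hypothetical assumptions, not the dissertation's algorithm. This sketch only removes the linear class-mean signal along the retained directions, so it offers none of the formal guarantees described above.

```python
import numpy as np


def mean_gap_direction(X, y):
    """Difference between the class means of a binary label (one value per feature)."""
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)


def fit_projection(X, y_public, y_private, k=1, lam=10.0):
    """Return a d x k orthonormal projection favoring a public label over a private one.

    Hypothetical objective (not necessarily SensorSift's): maximize
    trace(W^T (S_pub - lam * S_priv) W), where each S is the rank-one
    between-class scatter built from a binary label's mean-difference vector.
    """
    d_pub = mean_gap_direction(X, y_public)
    d_priv = mean_gap_direction(X, y_private)
    M = np.outer(d_pub, d_pub) - lam * np.outer(d_priv, d_priv)
    # M is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(M)
    # Keep the k eigenvectors with the largest eigenvalues.
    return eigvecs[:, ::-1][:, :k]


def sift(X, W):
    """Apply the learned projection to a batch of sensor feature vectors."""
    return X @ W


def demo(seed=0, n=400):
    """Synthetic check: feature 0 carries the public label, feature 1 the private one."""
    rng = np.random.default_rng(seed)
    y_pub = rng.integers(0, 2, n)   # stand-in for a public attribute such as "smiling"
    y_priv = rng.integers(0, 2, n)  # stand-in for a private attribute such as gender
    X = rng.normal(size=(n, 3))
    X[:, 0] += 3.0 * y_pub
    X[:, 1] += 3.0 * y_priv
    W = fit_projection(X, y_pub, y_priv, k=1)
    Z = sift(X, W)
    # Class-mean gaps measured in the projected (released) space.
    pub_gap = abs(float(mean_gap_direction(Z, y_pub)[0]))
    priv_gap = abs(float(mean_gap_direction(Z, y_priv)[0]))
    return pub_gap, priv_gap


if __name__ == "__main__":
    print(demo())
```

On this synthetic data, the retained direction keeps most of the public class-mean gap while the private gap shrinks toward zero. Raising `lam` penalizes private leakage more heavily, which can come at some cost to the public signal when the two directions are correlated.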
We evaluate SensorSift in the context of automated face understanding and show that it is possible to create diverse policies that selectively hide and reveal visual attributes in a public dataset of celebrity face images (e.g., preventing inference of race and gender while enabling inference of smiling). Through this work, we hope to offer a more equitable balance between producer and consumer interests in sensor data markets by enabling quantitatively provable privacy contracts that (1) allow flexible user values to be expressed (and certifiably upheld) and (2) still allow service providers to innovate by supporting unforeseen inferences about non-private attributes. Stepping back, in this dissertation we have identified the potential for machine inference to amplify data leakage in sensor contexts and have provided one direction for mitigating information risks through a theoretical balance of utility and privacy.