Privacy-Preserving Machine Learning Applications
Todoki, Ariel Akemi
Machine learning has many applications in areas that involve large amounts of data, from which models learn to recognize, predict, and classify. One such area is text classification, where a private text message (e.g., an SMS message, tweet, or email) needs to be scored against a text classification model that contains proprietary information. Another is medical diagnosis, which involves patient data in the form of medical records, test results, images, and genome data. While machine learning can be very successful at prediction and classification, it relies on learning from user data that is private and confidential and could include personally identifiable information. In this thesis, we use privacy-preserving techniques to (i) train a logistic regression model on breast cancer tissue data and (ii) classify private texts using an AdaBoost model or a logistic regression model, such that patient and user data is kept private. These techniques are information-theoretically secure, meaning their security does not rely on computational hardness assumptions (which could be broken in the future), and they allow us to benefit from machine learning on data that would otherwise remain private or unshared. In general, privacy-preserving computations run slower than the same computations performed in the clear. To the best of our knowledge, our implementations achieve excellent runtimes compared to previous work on the respective tasks, with no loss of accuracy.
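The abstract does not name the specific protocols used. A common information-theoretically secure building block in this line of work is two-party additive secret sharing, with multiplications done via Beaver triples handed out by a trusted initializer. The sketch below assumes that setting, integer-valued features (e.g., word counts) and integer weights, and the ring Z_{2^64}; it scores a private feature vector against private linear-model weights, as in a logistic regression score before the sigmoid. The names `MOD`, `share`, `reconstruct`, `beaver_triple`, `mul_shared`, and `dot_shared` are hypothetical, and this is not the thesis's implementation.

```python
import secrets

MOD = 2**64  # assumed ring Z_{2^64}; negative values wrap around (two's complement style)


def share(x):
    """Split integer x into two additive shares mod MOD.
    Each share alone is uniformly random, so it reveals nothing about x."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s0, s1):
    """Recombine two shares and map the result back to a signed integer."""
    v = (s0 + s1) % MOD
    return v - MOD if v >= MOD // 2 else v


def beaver_triple():
    """Trusted initializer (assumed): random a, b and c = a*b, each secret-shared."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share((a * b) % MOD)


def mul_shared(x_sh, y_sh):
    """Multiply secret-shared x and y using one Beaver triple.
    Both parties are simulated in one process here; in a real protocol each
    party would broadcast only its own share of d and e."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # d = x - a and e = y - b are opened; they are masked by the random a and b.
    d = (x_sh[0] - a0 + x_sh[1] - a1) % MOD
    e = (y_sh[0] - b0 + y_sh[1] - b1) % MOD
    # x*y = c + d*b + e*a + d*e; only party 0 adds the public d*e term.
    z0 = (c0 + d * b0 + e * a0 + d * e) % MOD
    z1 = (c1 + d * b1 + e * a1) % MOD
    return z0, z1


def dot_shared(xs, ws):
    """Secret-shared inner product: additions are local, products use triples."""
    acc0, acc1 = 0, 0
    for x_sh, w_sh in zip(xs, ws):
        z0, z1 = mul_shared(x_sh, w_sh)
        acc0 = (acc0 + z0) % MOD
        acc1 = (acc1 + z1) % MOD
    return acc0, acc1


# Usage: the user holds private word counts, the model owner holds private weights.
counts = [3, 0, 1, 2]
weights = [2, -5, 7, -1]
x_sh = [share(c) for c in counts]
w_sh = [share(w) for w in weights]
score = reconstruct(*dot_shared(x_sh, w_sh))
assert score == sum(c * w for c, w in zip(counts, weights))
```

Correctness here holds exactly in the ring, independent of the random values drawn. For simplicity this sketch opens the final score in the clear; a complete protocol would typically keep the score shared and also perform the comparison or sigmoid approximation securely, and real-valued models would need a fixed-point encoding with secure truncation.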