Secure Training of Random Forest Classifiers over Continuous Data

dc.contributor.advisor: De Cock, Martine
dc.contributor.author: Shen, Jianwei
dc.date.accessioned: 2020-04-30T17:39:56Z
dc.date.issued: 2020-04-30
dc.date.submitted: 2020
dc.description: Thesis (Master's)--University of Washington, 2020
dc.description.abstract: Existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the attributes are categorical. In real-life applications, attributes are often numerical. The standard "in the clear" algorithm to grow decision trees on data with continuous values requires sorting of training examples for each attribute in each node in the quest for an optimal cut-point in the range of attribute values. Sorting is a prohibitively expensive operation in MPC, hence secure protocols that mimic the traditional decision tree training algorithm are very inefficient. In this thesis, we propose an alternative, more efficient strategy for secure training of decision tree-based models on data with continuous attributes, namely secure discretization of the data, followed by secure training of a random forest classifier over the discretized data. In addition to mathematically proving that the proposed approach is correct and secure, we experimentally evaluate it in terms of classification accuracy and runtime on a variety of benchmark data sets. To the best of our knowledge, our approach is the very first to privately train decision tree-based models with continuous attributes where the overall complexity depends only linearly on the size of the entire training data set -- contrary to existing sorting-based solutions.
dc.embargo.lift: 2025-04-04T17:39:56Z
dc.embargo.terms: Restrict to UW for 5 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Shen_washington_0250O_21297.pdf
dc.identifier.uri: http://hdl.handle.net/1773/45414
dc.language.iso: en_US
dc.rights: none
dc.subject: Machine Learning
dc.subject: MPC
dc.subject: Privacy-Preserving
dc.subject: Random Forest
dc.subject: SMC
dc.subject: Computer science
dc.subject.other: Computer science and systems
dc.title: Secure Training of Random Forest Classifiers over Continuous Data
dc.type: Thesis
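
The abstract's central idea, discretizing continuous attributes first and then training on the resulting categorical data, can be illustrated in the clear with a minimal sketch. This is not the thesis's protocol: it involves no MPC, and the abstract does not say which discretization scheme is used, so equal-width binning is an assumption made here for illustration only. The function names `equal_width_bins` and `gini_of_split` and the sample data are hypothetical. The point of the sketch is that binning needs only a minimum, a maximum, and one pass over the values, with no sorting.

```python
from collections import Counter


def equal_width_bins(values, num_bins):
    """Map each numeric value to a bin index in [0, num_bins - 1].

    Assumption: equal-width binning (not necessarily the thesis's scheme).
    Finding min/max and assigning bins are all linear passes; nothing is sorted.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum value would land in bin num_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def gini_of_split(bins, labels):
    """Weighted Gini impurity when splitting on a binned (categorical) attribute."""
    n = len(labels)
    per_bin = {}
    for b, y in zip(bins, labels):
        per_bin.setdefault(b, Counter())[y] += 1
    total = 0.0
    for counts in per_bin.values():
        size = sum(counts.values())
        impurity = 1.0 - sum((c / size) ** 2 for c in counts.values())
        total += (size / n) * impurity
    return total


# Hypothetical toy data: one continuous attribute and a binary class label.
ages = [22.0, 25.5, 31.0, 47.0, 52.5, 60.0]
labels = [0, 0, 0, 1, 1, 1]

binned = equal_width_bins(ages, num_bins=2)
split_quality = gini_of_split(binned, labels)
```

Once the attribute is binned, a tree learner only has to count class labels per bin to score a split, which is the kind of operation existing categorical-attribute protocols already handle. A secure version would have to compute these steps on secret-shared data, which this sketch does not attempt.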

Files

Original bundle

Name: Shen_washington_0250O_21297.pdf
Size: 650.67 KB
Format: Adobe Portable Document Format