Secure Training of Random Forest Classifiers over Continuous Data
| dc.contributor.advisor | De Cock, Martine |
| dc.contributor.author | Shen, Jianwei |
| dc.date.accessioned | 2020-04-30T17:39:56Z |
| dc.date.issued | 2020-04-30 |
| dc.date.submitted | 2020 |
| dc.description | Thesis (Master's)--University of Washington, 2020 |
| dc.description.abstract | Existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the attributes are categorical. In real-life applications, however, attributes are often numerical. The standard "in the clear" algorithm for growing decision trees on data with continuous values sorts the training examples on each attribute at each node in search of an optimal cut-point in the range of attribute values. Sorting is a prohibitively expensive operation in MPC, so secure protocols that mimic the traditional decision tree training algorithm are very inefficient. In this thesis, we propose an alternative, more efficient strategy for secure training of decision tree-based models on data with continuous attributes: secure discretization of the data, followed by secure training of a random forest classifier over the discretized data. In addition to mathematically proving that the proposed approach is correct and secure, we experimentally evaluate its classification accuracy and runtime on a variety of benchmark data sets. To the best of our knowledge, our approach is the first to privately train decision tree-based models with continuous attributes whose overall complexity depends only linearly on the size of the entire training data set, in contrast to existing sorting-based solutions. |
| dc.embargo.lift | 2025-04-04T17:39:56Z |
| dc.embargo.terms | Restrict to UW for 5 years -- then make Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Shen_washington_0250O_21297.pdf |
| dc.identifier.uri | http://hdl.handle.net/1773/45414 |
| dc.language.iso | en_US |
| dc.rights | none |
| dc.subject | Machine Learning |
| dc.subject | MPC |
| dc.subject | Privacy-Preserving |
| dc.subject | Random Forest |
| dc.subject | SMC |
| dc.subject | Computer science |
| dc.subject.other | Computer science and systems |
| dc.title | Secure Training of Random Forest Classifiers over Continuous Data |
| dc.type | Thesis |
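The abstract's core contrast, sorting on each attribute at each node versus working on discretized data, can be illustrated in the clear (no MPC) for a single node and a single attribute. This is a minimal sketch under stated assumptions, not the thesis's protocol: it uses Gini impurity as the split criterion and equal-width binning as the discretization, both chosen here only for illustration, and the function names `best_cut_sorted`, `equal_width_bins`, and `best_cut_binned` are hypothetical. The sorted variant costs O(n log n) per attribute per node; the binned variant builds a class histogram in one linear pass and then scans only the k - 1 bin boundaries.

```python
# In-the-clear illustration only; NOT a secure (MPC) protocol.
# Assumptions: Gini impurity as split criterion, equal-width bins as the
# discretization, integer class labels 0..n_classes-1. Names are hypothetical.


def gini(counts):
    """Gini impurity of a class-count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_impurity(left, right):
    """Size-weighted Gini impurity of a binary split (both sides non-empty)."""
    n_left, n_right = sum(left), sum(right)
    return (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)


def best_cut_sorted(x, y, n_classes):
    """Standard search: sort by x, try midpoints between distinct values.

    Returns (impurity, cut_point). Cost is dominated by the O(n log n) sort.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    left = [0] * n_classes
    right = [0] * n_classes
    for label in y:
        right[label] += 1
    best = (float("inf"), None)
    for pos in range(len(order) - 1):
        i, j = order[pos], order[pos + 1]
        left[y[i]] += 1   # move example i from the right side to the left side
        right[y[i]] -= 1
        if x[i] == x[j]:
            continue      # no valid cut between equal attribute values
        score = split_impurity(left, right)
        if score < best[0]:
            best = (score, (x[i] + x[j]) / 2)
    return best


def equal_width_bins(x, k):
    """Map each value to one of k equal-width bins over [min(x), max(x)]."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k or 1.0  # constant attribute: everything lands in bin 0
    return [min(int((v - lo) / width), k - 1) for v in x]


def best_cut_binned(bins, y, k, n_classes):
    """Search cuts between bins using a histogram built in one linear pass.

    Returns (impurity, b), meaning "bins 0..b go left". No sorting is needed.
    """
    hist = [[0] * n_classes for _ in range(k)]
    for b, label in zip(bins, y):
        hist[b][label] += 1
    total = [sum(hist[b][c] for b in range(k)) for c in range(n_classes)]
    left = [0] * n_classes
    best = (float("inf"), None)
    for b in range(k - 1):  # candidate cut after bin b
        left = [left[c] + hist[b][c] for c in range(n_classes)]
        right = [total[c] - left[c] for c in range(n_classes)]
        if sum(left) == 0 or sum(right) == 0:
            continue        # skip cuts that leave one side empty
        score = split_impurity(left, right)
        if score < best[0]:
            best = (score, b)
    return best


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]
print(best_cut_sorted(x, y, 2))                          # (0.0, 6.5)
print(best_cut_binned(equal_width_bins(x, 4), y, 4, 2))  # (0.0, 0)
```

Both searches find a perfect split on this toy data; the binned search finds it without sorting, which is the step the abstract identifies as prohibitively expensive in MPC. How the counting is carried out over secret-shared data, and how the random forest is built on top of it, is specific to the thesis and is not shown here.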
Files
Original bundle
- Name: Shen_washington_0250O_21297.pdf
- Size: 650.67 KB
- Format: Adobe Portable Document Format
