A Model-to-data Approach for Building Accurate Machine Learning Algorithms on EHR Data
| dc.contributor.advisor | Mooney, Sean D. | |
| dc.contributor.author | Yan, Yao | |
| dc.date.accessioned | 2022-09-23T20:42:01Z | |
| dc.date.available | 2022-09-23T20:42:01Z | |
| dc.date.issued | 2022-09-23 | |
| dc.date.submitted | 2022 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2022 | |
| dc.description.abstract | Over the past few decades, information about patients’ diagnoses, medication, and procedures has been collected and transformed into standardized and shareable electronic health records (EHRs). Machine learning algorithms have proven efficient for mining predictive clinical patterns from EHRs and thus can be used to guide next-generation personalized medicine and enable effective clinical decision support. However, privacy concerns often limit access to individual patient data, hampering researchers’ capability to develop machine learning models and conduct model generalizability assessments. Creating an infrastructure that enables secure utilization of patient data with adequate privacy control is the key to bridging researchers and data, thereby unlocking the full potential of the data. In addition, facilitating the contribution of data from multiple sites and enabling federated evaluation are essential for developing robust and generalizable models and overcoming the barrier to clinical implementation. A ‘model to data’ approach, in which researchers build and submit models to be evaluated by a trusted party without direct access to data, can reduce the risk posed by direct data sharing, lower the barrier to federated evaluation, and open up data for utilization by the broader data science community. In this dissertation, I focus on the implementation of a ‘model to data’ approach for enabling secure utilization of multi-modal patient data, and on synthetic EHR data generation as a complement to this approach. 
The four aims of my dissertation are: (1) piloting a ‘model to data’ approach to enable patient mortality prediction; (2) implementing the ‘model to data’ approach in a crowdsourced benchmarking challenge for COVID-19 outcome prediction; (3) enabling clinical notes sharing and de-identification through the NLP sandbox; and (4) benchmarking generative adversarial network (GAN)-related synthetic EHR generation on real-world patient data. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Yan_washington_0250E_24689.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/49241 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Data sharing | |
| dc.subject | Electronic health records | |
| dc.subject | Generative adversarial network | |
| dc.subject | Machine learning | |
| dc.subject | Model evaluation | |
| dc.subject | Natural language processing | |
| dc.subject | Information technology | |
| dc.subject.other | Molecular engineering | |
| dc.title | A Model-to-data Approach for Building Accurate Machine Learning Algorithms on EHR Data | |
| dc.type | Thesis |
Files
Original bundle
- Name: Yan_washington_0250E_24689.pdf
- Size: 5.12 MB
- Format: Adobe Portable Document Format
