A Model-to-data Approach for Building Accurate Machine Learning Algorithms on EHR Data

Yan, Yao

dc.contributor.advisor	Mooney, Sean D.
dc.contributor.author	Yan, Yao
dc.date.accessioned	2022-09-23T20:42:01Z
dc.date.available	2022-09-23T20:42:01Z
dc.date.issued	2022-09-23
dc.date.submitted	2022
dc.description	Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract	Over the past few decades, information about patients’ diagnoses, medications, and procedures has been collected and transformed into standardized and shareable electronic health records (EHRs). Machine learning algorithms have proven efficient for mining predictive clinical patterns from EHRs and thus can be used to guide next-generation personalized medicine and enable effective clinical decision support. However, privacy concerns often limit access to individual patient data, hampering researchers’ ability to develop machine learning models and assess model generalizability. Creating an infrastructure that enables secure utilization of patient data with adequate privacy control is the key to bridging researchers and data, thereby unlocking the full potential of the data. In addition, facilitating the contribution of data from multiple sites and enabling federated evaluation are essential for developing robust and generalizable models and overcoming the barrier to clinical implementation. A ‘model to data’ approach, in which researchers build and submit models to be evaluated by a trusted party without direct access to data, can reduce the risk posed by direct data sharing, lower the barrier to federated evaluation, and open up data for utilization by the broader data science community. In this dissertation, I focus on the implementation of a ‘model to data’ approach for enabling secure utilization of multi-modal patient data, and on synthetic EHR data generation as a complement to this approach. The four aims of my dissertation are: (1) piloting a ‘model to data’ approach to enable patient mortality prediction; (2) implementing the ‘model to data’ approach in a crowdsourced benchmarking challenge for COVID-19 outcome prediction; (3) enabling clinical note sharing and de-identification through the NLP sandbox; and (4) benchmarking generative adversarial network (GAN)-related synthetic EHR generation on real-world patient data.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Yan_washington_0250E_24689.pdf
dc.identifier.uri	http://hdl.handle.net/1773/49241
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Data sharing
dc.subject	Electronic health records
dc.subject	Generative adversarial network
dc.subject	Machine learning
dc.subject	Model evaluation
dc.subject	Natural language processing
dc.subject	Information technology
dc.subject.other	Molecular engineering
dc.title	A Model-to-data Approach for Building Accurate Machine Learning Algorithms on EHR Data
dc.type	Thesis

Files

Original bundle

Name: Yan_washington_0250E_24689.pdf
Size: 5.12 MB
Format: Adobe Portable Document Format

Collections

Molecular engineering