A Model-to-data Approach for Building Accurate Machine Learning Algorithms on EHR Data

dc.contributor.advisor: Mooney, Sean D.
dc.contributor.author: Yan, Yao
dc.date.accessioned: 2022-09-23T20:42:01Z
dc.date.available: 2022-09-23T20:42:01Z
dc.date.issued: 2022-09-23
dc.date.submitted: 2022
dc.description: Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract: Over the past few decades, information about patients’ diagnoses, medications, and procedures has been collected and transformed into standardized, shareable electronic health records (EHRs). Machine learning algorithms have proven successful at mining predictive clinical patterns from EHRs and can therefore guide next-generation personalized medicine and enable effective clinical decision support. However, privacy concerns often limit access to individual patient data, hampering researchers’ ability to develop machine learning models and assess their generalizability. Creating an infrastructure that enables secure use of patient data with adequate privacy controls is key to bridging researchers and data, thereby unlocking the data’s full potential. In addition, facilitating data contributions from multiple sites and enabling federated evaluation are essential for developing robust, generalizable models and overcoming barriers to clinical implementation. A ‘model to data’ approach, in which researchers build and submit models to be evaluated by a trusted party without direct access to the data, can reduce the risk posed by direct data sharing, lower the barrier to federated evaluation, and open up data for use by the broader data science community. In this dissertation, I focus on implementing a ‘model to data’ approach to enable secure use of multi-modal patient data, and on synthetic EHR data generation as a complement to this approach. The four aims of my dissertation are (1) piloting a ‘model to data’ approach for patient mortality prediction; (2) implementing the ‘model to data’ approach in a crowdsourced benchmarking challenge for COVID-19 outcome prediction; (3) enabling clinical note sharing and de-identification through the NLP sandbox; and (4) benchmarking generative adversarial network (GAN)-related synthetic EHR generation on real-world patient data.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Yan_washington_0250E_24689.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49241
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Data sharing
dc.subject: Electronic health records
dc.subject: Generative adversarial network
dc.subject: Machine learning
dc.subject: Model evaluation
dc.subject: Natural language processing
dc.subject: Information technology
dc.subject.other: Molecular engineering
dc.title: A Model-to-data Approach for Building Accurate Machine Learning Algorithms on EHR Data
dc.type: Thesis

Files

Original bundle

Name: Yan_washington_0250E_24689.pdf
Size: 5.12 MB
Format: Adobe Portable Document Format