The Use of Natural Language Processing and Machine Learning for Early Diagnosis of Lung and Ovarian Cancer
Authors
TURNER, GRACE
Abstract
Cancer is a serious diagnosis, and diagnostic delay is correlated with reduced survival rates following treatment. For many cancers, providers can rely only on symptoms and signs
to diagnose patients. These details are recorded primarily in free-text clinical notes. Natural
language processing (NLP) can be used to extract symptoms/signs from these notes for
population-level diagnostic screening. This creates an opportunity for machine learning to alert
providers earlier in the diagnostic process using existing but easily overlooked information. Thus, the focus of this thesis was to determine opportunities for reducing diagnostic delay in ovarian and lung cancer. A symptom extraction model trained on a primarily COVID-19
population was adapted to lung and ovarian cancer populations. The model then extracted
symptoms/signs from a retrospective case-control study (ovarian) developed as part of this
work as well as a leveraged study (lung). Symptom frequencies for ovarian cancer were then
explored across different routes to diagnosis. Finally, this thesis developed experiments using
machine learning models to predict lung and ovarian cancer prior to diagnosis. This work
showed that early prediction using symptoms was possible only in the lung cohort. Nevertheless,
both cohorts had significantly higher “next step” recommendations in cases as compared to
controls, even 6 months prior to diagnosis.
Description
Thesis (Master's)--University of Washington, 2022
