Making Health Knowledge Accessible Through Personalized Language Processing

dc.contributor.advisor: Cohen, Trevor T.C.
dc.contributor.author: Guo, Yue
dc.date.accessioned: 2024-09-09T23:02:00Z
dc.date.available: 2024-09-09T23:02:00Z
dc.date.issued: 2024-09-09
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: The COVID-19 pandemic exposed the difficulties the general public faces when attempting to use scientific information to guide health-related decisions. Though widely available in scientific papers, the information required to guide these decisions is often not accessible: medical jargon, scientific writing styles, and insufficient background explanations make it opaque to non-experts. Consequently, there is a pressing need to deliver scientific knowledge in lay language, which has motivated my research on automated plain language summary generation to make health information more accessible. The main challenges addressed in this thesis are limited data, generating background knowledge, lack of evaluation metrics, and the need for personalization. To tackle the limited data challenge, I introduce the task of automated generation of plain language summaries (PLSs) of biomedical scientific reviews and construct the Corpus for Enhancement of Lay Language Synthesis (CELLS), the largest and most diverse dataset for PLS in the medical domain. For generating background knowledge, I explore methods for Retrieval-Augmented Lay Language (RALL) generation, augmenting state-of-the-art text generation models with information retrieved from various sources. A key part of this process has been evaluating whether existing metrics effectively measure performance on this task and considering whether better options exist. To address the lack of evaluation metrics, I present APPLS, the first granular testbed for analyzing evaluation metric performance for PLS, and introduce POMME, a new metric that employs language model perplexity to assess text simplicity. Finally, I broaden the discussion beyond health information, exploring how to personalize and improve communication across different domains.
Grounded in the real-world setting of interdisciplinary reading, this research offers insights into features and methods for the novel task of integrating personal data into scientific jargon identification. In conclusion, my thesis provides a comprehensive approach to making biomedical literature more accessible and understandable for health consumers by addressing key challenges in developing automated PLS generation systems. The contributions span data collection, method development, evaluation metric design, and personalization, paving the way for more effective communication of health information to the general public.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Guo_washington_0250E_26962.pdf
dc.identifier.uri: https://hdl.handle.net/1773/51739
dc.language.iso: en_US
dc.rights: CC BY-NC
dc.subject: health communication
dc.subject: NLP
dc.subject: plain language summarization
dc.subject: Computer science
dc.subject.other: Biomedical and health informatics
dc.title: Making Health Knowledge Accessible Through Personalized Language Processing
dc.type: Thesis

Files

Original bundle

Name: Guo_washington_0250E_26962.pdf
Size: 8.08 MB
Format: Adobe Portable Document Format