Biomedical and health informatics
Permanent URI for this collection: https://digital.lib.washington.edu/handle/1773/4899
Recent Submissions

Cultural Adaptation and Evaluation of LLM-Driven Mental Health Conversational Agents (2025-10-02). Xie, Jinchen; Cohen, Trevor T.C.
Mental health disparities disproportionately affect underserved family caregivers, due in part to the limited availability of culturally responsive interventions. While large language model (LLM)-driven conversational agents hold promise for scalable mental health support, their responses often lack cultural responsiveness, undermining empathy, therapeutic alliance, and engagement among diverse populations. This dissertation develops and evaluates a novel approach for dynamic cultural adaptation of LLM-based mental health agents leveraging context engineering. The process began with stakeholder engagement to identify culturally salient caregiving challenges among Chinese American family caregivers (Aim 1). In collaboration with domain experts, a cultural context database was developed to capture these challenges alongside culturally responsive response examples, enabling real-time retrieval and integration of relevant context during agent interactions. To evaluate this approach, I conducted a contextualized, multiphase evaluation through a series of user studies with Chinese American and Latino American family caregivers. Findings show that the context-engineered system consistently generated responses rated as more culturally responsive and empathic than both a prompt-based adaptation strategy and a baseline non-adapted agent (Aim 2). In a randomized user study with Chinese American caregivers (Aim 3), the culturally adapted agent significantly improved near-term emotional well-being and received higher ratings for cultural competence and therapeutic alliance compared to the non-adapted agent. Notably, participants indicated greater willingness to recommend the adapted agent within their communities. Together, this work contributes: (1) a workflow for adapting LLM-based conversational agents to diverse populations, (2) empirical evidence that cultural adaptation enhances user outcomes in mental health agents, and (3) a human-centered evaluation procedure for assessing cultural responsiveness of AI in mental health contexts. While future research is needed to refine and validate this approach across additional cultural groups and care domains, the findings represent a significant step toward advancing equitable, culturally responsive digital mental health support and reducing persistent disparities in access and quality of care.

Evaluating Multi-Modal Data Fusion Approaches for Predictive Clinical Models Using Multiple Medical Data Domains (2025-08-01). Alipour, Ehsan; Tarczy-Hornoch, Peter; Hadlock, Jennifer
Disease outcome prediction is a central research focus in biomedical informatics, as it facilitates precision health interventions and scientific discovery by enabling digital clinical trials, among other benefits. Multimodal deep learning models have emerged as powerful tools in biomedical research, offering the ability to integrate diverse data sources such as clinical records, multi-omics data, imaging, survey responses, and wearable data to enhance predictive accuracy and deepen understanding of medical phenomena. Central to multimodal modeling is the process of data fusion, where information from different modalities is integrated into a unified model. Three primary fusion strategies exist in deep learning: early fusion (feature-level), intermediate fusion, and late fusion (decision-level).
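
To make the three strategies just named concrete, here is a minimal sketch in PyTorch. This is an editorial illustration rather than code from the dissertation; the modality dimensions, hidden sizes, and the simple averaging used for late fusion are assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate raw feature vectors from all modalities, then learn one joint model."""
    def __init__(self, dims, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sum(dims), hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, xs):
        return self.net(torch.cat(xs, dim=-1))

class IntermediateFusion(nn.Module):
    """Encode each modality separately, then fuse the learned representations."""
    def __init__(self, dims, hidden=64):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in dims])
        self.head = nn.Linear(hidden * len(dims), 1)

    def forward(self, xs):
        return self.head(torch.cat([enc(x) for enc, x in zip(self.encoders, xs)], dim=-1))

class LateFusion(nn.Module):
    """Train one predictor per modality and average their decisions."""
    def __init__(self, dims, hidden=64):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1)) for d in dims])

    def forward(self, xs):
        return torch.stack([h(x) for h, x in zip(self.heads, xs)], dim=0).mean(dim=0)

# Toy usage: three modalities (e.g., clinical, genomic, survey) with made-up dimensions.
ehr, omics, survey = torch.randn(8, 120), torch.randn(8, 300), torch.randn(8, 20)
for Model in (EarlyFusion, IntermediateFusion, LateFusion):
    logits = Model([120, 300, 20])([ehr, omics, survey])
    print(Model.__name__, logits.shape)  # each prints torch.Size([8, 1])
```
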
While widely adopted in other domains, their comparative performance and implementation considerations remain underexplored in biomedical applications, where data heterogeneity, missingness, and varying dimensionality present additional challenges. This dissertation aims to evaluate the implications of data fusion strategies for developing multimodal predictive models in medicine. Across three distinct aims, I assess the impact of early, intermediate, and late fusion techniques on predictive performance, implementation complexity, and generalizability using diverse combinations of data types, outcomes, and modeling strategies. These studies span multiple datasets and outcome types (binary categorical variables vs. continuous ratio variables), providing a broad view of fusion strategy utility in real-world biomedical settings. In Aim 1—Evaluation and comparison of early, intermediate, and late fusion techniques for combining exposures, clinical and genomics data for a disease risk prediction task using All of Us: Risk of CKD in patients with type 2 diabetes—I evaluated and compared early, intermediate, and late fusion strategies for integrating longitudinal EHR, genomic, and survey data to predict chronic kidney disease (CKD) progression in patients with type 2 diabetes using a novel transformer-based multimodal architecture. Using data from the NIH's All of Us initiative, I trained models on a cohort of approximately 40,000 patients. While the best performing unimodal model achieved a baseline performance with an AUROC of 0.73 (0.71-0.75), the inclusion of multimodal data offered only marginal improvement with an AUROC of 0.74 (0.72-0.76), with the benefit limited to the early fusion approach and lacking statistical significance. This aim highlighted the challenges of integrating multimodal data with different dimensions using transformer models and emphasized the role of modality-specific relative predictive strength. In Aim 2—Development and assessment of the incremental value of combining a deep convolutional neural network feature extractor on imaging data and clinical data on a binary prediction task: Predict post-surgical margin status in soft tissue sarcoma—I extended the fusion analysis to imaging data by combining a convolutional neural network (CNN) trained on longitudinal cross-sectional imaging with a shallow neural network trained on clinical and pathology variables to predict post-surgical margin status in patients with soft tissue sarcoma (n=202). Here, the intermediate fusion strategy significantly outperformed other approaches, achieving an AUROC of 0.80 (0.66-0.95), suggesting that cross-modal interactions between histologic features and imaging embeddings may be best captured through intermediate fusion. This result demonstrated the potential value of intermediate fusion when complementary signals exist across modalities. In Aim 3—Evaluation and comparison of early, intermediate, and late fusion techniques for combining imaging and clinical data on a regression prediction task: Estimation of CT-based body composition metrics from chest radiographs—I explored fusion strategies for estimating continuous CT-derived body composition metrics (e.g., visceral and subcutaneous fat volumes) using only chest radiographs and clinical variables in a dataset of 1,088 patients. A multitask multimodal model was developed and evaluated across early, intermediate, and late fusion strategies.
Late fusion consistently delivered the best performance across most body composition metrics, closely followed by intermediate fusion. These results suggest that when individual modalities offer high independent predictive power, decision-level integration may be optimal for regression tasks. Collectively, these aims provide a broad evaluation of data fusion strategies in multimodal biomedical modeling, highlighting their strengths, limitations, and practical considerations. Findings suggest that no single fusion strategy universally outperforms the others; rather, optimal fusion depends on data characteristics, model architecture, and task-specific objectives. This dissertation lays the groundwork for future research aimed at developing adaptive fusion strategies tailored to the complexities of real-world biomedical data.

Evaluating and Enhancing Large Language Models (LLMs) in the Clinical Domain (2025-08-01). Fu, Yujuan; Yetişgen, Meliha
Recent advancements in large language models (LLMs) have demonstrated human-level performance on many specialized medical tasks, even without annotated training data. However, three main challenges remain: (1) due to the sensitive and highly specialized nature of clinical narratives, as well as the high cost of human expert annotation, there is a lack of high-quality, well-structured, and clinically meaningful datasets for LLM training and evaluation; (2) current medical LLMs show limited generalization ability to interpret and extract complex clinical information on certain unseen natural language understanding (NLU) tasks; and (3) as LLMs are typically trained on vast amounts of data, there is a substantial risk of data contamination, where evaluation benchmarks unintentionally overlap with training data, leading to inflated test performance and potentially reduced performance on truly novel tasks. In this work, we address these limitations through three core aims: (1) develop benchmark datasets for clinical information extraction (IE), a key NLU subtask, across two critical medical domains, and evaluate the performance of multiple state-of-the-art (SOTA) transformer-based language models (LMs) under both fine-tuning and in-context learning settings; (2) develop a more generalizable medical NLU model via instruction tuning, demonstrating enhanced performance on previously unseen clinical NLU datasets; and (3) systematically review existing detection approaches for data contamination and evaluate those approaches on datasets used during pre-training and fine-tuning LLMs, with our own and three other widely used open-source LLMs. In summary, our work contributes to the development of both clinical benchmarks and robust LLMs, and highlights the ongoing challenges in benchmarking LLMs' generalizability.

Leveraging Multimodal Models to Detect Osteoporotic Compression Fractures (2025-05-12). Chang, Brian; Tarczy-Hornoch, Peter
Osteoporosis is a chronic disease of low bone mineral density that affects older patients, predisposing them to fractures. While osteoporosis screening is evidence-based, it remains grossly under-utilized. Osteoporotic compression fractures (OCFs) are an early biomarker for osteoporosis but are often misclassified and under-reported on review by radiologists. Opportunistic screening, which leverages pre-existing data to detect OCFs, could augment the current standard of osteoporosis screening and prompt appropriate diagnostic studies, treatment, and risk management.
Current fracture detection tools show promise but are limited by key factors, namely manual curation of data inputs and a lack of external validation and generalizability, which constrain their potential clinical utility. They are also based on unimodal models, that is, models that leverage a single data modality. Multimodal models that leverage more than one modality have shown improved performance in clinical tasks and also better reflect real-world clinical workflows. In this dissertation, I focus on developing and evaluating multimodal models to detect OCFs, leveraging unstructured clinical notes, radiographs, and structured electronic health record (EHR) data. To achieve this, a spine radiograph dataset from previous work in our group was used. Matching patient IDs from this dataset, I obtained clinical notes from a quaternary healthcare enterprise database to annotate fracture events. With these datasets, I implemented and evaluated unimodal models for each of the modalities above (images only and notes only) to produce outputs for the multimodal models, described in the following aims in this research: 1) Aim 1: Implement and evaluate transformer models to extract fracture events from clinical notes. An ensemble algorithm to consolidate fracture events at the note- and patient-level was also developed to produce both structured data representing a patient history of fractures and feature representations for downstream input separately to multimodal models (Aim 3). Evaluation metrics demonstrated that fine-tuned transformer models are able to extract fracture events from clinical notes with good performance, albeit limited by the small training corpus. 2) Aim 2: Develop and assess an imaging analysis pipeline for detecting OCFs. An imaging analysis pipeline was built by chaining independently developed machine learning models into a fully automated framework. Evaluation of the pipeline was performed with a dataset of radiographs acquired in various clinical settings to measure real-world performance. While we were able to develop a performant fully automated pipeline, the evaluation demonstrated subpar performance in detecting positive cases of OCFs at the image level. 3) Aim 3: Develop and assess whether multimodal models combining NLP, imaging analysis, and structured EHR data perform better than imaging analysis alone in detecting OCFs. With the structured data and feature representations from Aim 1, the imaging analysis pipeline predictions from Aim 2, and other structured EHR data, numerous multimodal model architectures were trained and evaluated in detecting OCFs at the patient level. The evaluation of these models demonstrated better performance than the unimodal models (images and notes only) in detecting OCFs even with a small training corpus, reaching an acceptable absolute performance.

Leveraging temporality, dose effect, and co-medication to improve drug safety surveillance (2025-05-12). Wu, YiFan; Cohen, Trevor A
Adverse drug reactions (ADRs) rank among the top causes of morbidity and mortality worldwide, yet current post-market drug surveillance systems often rely on spontaneous reporting, which suffers from under-reporting of ADRs and limited capture of clinical context. This dissertation addresses these gaps by leveraging electronic health record (EHR) data and transformer-based models to detect ADRs and drug-drug interactions (DDIs) more effectively.
First, we develop and evaluate a generative transformer architecture (GPT-2) trained from scratch on longitudinal EHR data from two distinct repositories (MIMIC-IV and a large university health system, UW). Unlike traditional disproportionality metrics that focus on cross-sectional drug-event co-occurrences, the proposed model captures temporal relationships and contextual dependencies among medications, diagnoses, and outcomes. Second, we introduce a "value-aware" embedding approach to incorporate continuous numeric data, such as drug dosages and lab measurements. Experimental results show that these value-aware embeddings further improve model performance, outperforming baseline transformer architectures that did not incorporate numeric data. Third, we extend the model's scope to evaluate DDIs under polypharmacy conditions, demonstrating that a transformer exceeded the predictive accuracy of simpler machine learning baselines. Across all EHR datasets tested, the transformer-based methods consistently surpass existing standard approaches in ADR detection, measured by improved area under the receiver operating characteristic curve (AUROC). By capturing time-dependent patterns, integrating numeric variables, and accounting for interacting drug exposures, this work broadens the capabilities of pharmacovigilance beyond conventional "signal detection" practices. Despite practical limitations, such as limited generalizability to other health systems and the need for rigorous validation, these findings demonstrate the promise of generative transformers as a scalable, data-driven framework for enhancing patient safety through pharmacovigilance.
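
The "value-aware" embedding idea in the abstract above can be pictured with a minimal sketch: each event token's embedding is combined with a projection of its numeric value (dose or lab result). This is an editorial illustration under assumed dimensions and scaling, not the dissertation's implementation.

```python
import torch
import torch.nn as nn

class ValueAwareEmbedding(nn.Module):
    """Embed a clinical event code and shift it by a projection of its numeric value."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.code_emb = nn.Embedding(vocab_size, dim)  # one vector per code (drug, lab, diagnosis)
        self.value_proj = nn.Linear(1, dim)            # maps the scalar value into the same space

    def forward(self, code_ids, values):
        # code_ids: (batch, seq_len) integer event tokens
        # values:   (batch, seq_len) standardized numeric values (dose, lab result); use 0.0
        #           as a neutral filler for events that carry no number
        return self.code_emb(code_ids) + self.value_proj(values.unsqueeze(-1))

# Toy usage: a batch of 2 "patient sentences" of 4 events each.
emb = ValueAwareEmbedding(vocab_size=1000, dim=32)
codes = torch.randint(0, 1000, (2, 4))
vals = torch.randn(2, 4)
print(emb(codes, vals).shape)  # torch.Size([2, 4, 32]), ready for a transformer encoder
```
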
Building Robust Text Classification Models under Provenance Shift: Methods of Adjustment and a Framework for Evaluation (2025-01-23). Ding, Xiruo; Cohen, Trevor
Machine learning and deep learning have consistently delivered groundbreaking contributions across a wide range of disciplines. Biomedical research also benefits from such methods at every scale, from the molecular level (such as in structural biology) to the population level. Many learning algorithms require an adequate amount of data to fully train a model, and also assume no difference between the training data and test data. This may be achievable for problems in the general domain. For example, large datasets exist for computer vision (CIFAR-10, CIFAR-100, etc.) and natural language processing (Amazon Reviews, Yelp Reviews, Wikipedia, etc.). However, in biomedical research, it is challenging to collect data on the order of millions when high quality patient-related data are needed. One feasible solution is to combine data from several sites. This approach can also increase the variety of data, thus helping to build robust models. However, models trained in such settings may learn spurious correlations between data provenance and the target of interest. Naturally, this can also happen when subpopulations exist, each of which has different characteristics. This effect can be detrimental when a model is deployed in a new setting where the provenance composition shifts. This thesis builds on such scenarios, where confounding by provenance and provenance shift are the main concerns. Formal definitions and a simulation framework are introduced first. Building upon these, the aim is to find useful ways to build models that are robust to such provenance shift while maintaining reasonable performance. This goal is attained through different means, from statistical adjustment through distribution adjustment to architecture adjustment. Two key contributions are: (1) a framework for experimentally simulating different degrees of provenance shift and evaluating model robustness and performance; (2) several effective adjustment methods to build more robust models. The framework and adjustment methods were tested on three datasets, two from the biomedical domain and one from the general domain, to validate their generalizability. Results indicate that the methods, focusing on different aspects of the modeling procedure, can help improve model robustness, and that model performance can also be improved when provenance shift is extreme. This work contributes to our understanding of how provenance shift impacts model performance, and provides methods to develop more robust models that can withstand the challenges posed by such shifts, ultimately leading to algorithms that are more reliable and trustworthy, and less biased.
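
The provenance-shift setting described above can be simulated, in a simplified way, by resampling how much of the training and test splits come from each data source. The sketch below is an editorial illustration with hypothetical column names and a single composition parameter; it is not the dissertation's simulation framework.

```python
import numpy as np
import pandas as pd

def simulate_provenance_shift(df, site_col="site", site="A",
                              train_rate=0.8, test_rate=0.2,
                              n_train=1000, n_test=1000, seed=0):
    """Draw train/test splits whose proportion of rows from `site` differs.

    Widening the gap between train_rate and test_rate simulates a stronger
    provenance shift between development and deployment settings.
    """
    def sample(n, rate, s):
        n_site = int(n * rate)
        a = df[df[site_col] == site].sample(n_site, replace=True, random_state=s)
        b = df[df[site_col] != site].sample(n - n_site, replace=True, random_state=s + 1)
        return pd.concat([a, b]).sample(frac=1, random_state=s + 2).reset_index(drop=True)
    return sample(n_train, train_rate, seed), sample(n_test, test_rate, seed + 10)

# Toy usage: score a classifier trained on train_df against test_df to see how much
# performance degrades as the site composition shifts.
toy = pd.DataFrame({"x": np.random.randn(5000),
                    "site": np.random.choice(["A", "B"], 5000),
                    "label": np.random.randint(0, 2, 5000)})
train_df, test_df = simulate_provenance_shift(toy, train_rate=0.9, test_rate=0.1)
print(train_df["site"].value_counts(normalize=True))
print(test_df["site"].value_counts(normalize=True))
```
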
Surgical Site Infection (SSI) Identification Across Multiple Facilities and Surgery Types Using Multimodal Data and Deep Learning (2025-01-23). Chakraborty, Arjun; Yetisgen, Meliha
Surgical site infections (SSIs), infections at the surgical site that occur after surgery, impact more than a hundred thousand patients a year in the United States. They increase the risk of death after surgery, lead to complications like cellulitis and sepsis, and incur significant healthcare costs. Surveillance of SSIs can guide interventions to reduce SSI rates. The current mainstay of SSI surveillance is manual chart review, which is expensive and time-consuming. Automated surveillance systems addressing these drawbacks typically rely on a limited number of data modalities from the electronic health record (EHR). They predominantly use rule-based approaches or conventional machine learning algorithms to retrospectively predict whether a surgical case resulted in an SSI. This limits the performance and domain adaptation capability of published gold-standard automated surveillance systems. In contrast to previous state-of-the-art automated surveillance approaches, we employed a data-driven deep learning framework that integrated structured data, clinical text data, and temporal information from the EHRs of surgical cases to develop an automated surveillance system. Our primary findings demonstrated several key points: a purely data-driven deep learning approach using multimodal data outperformed previously published gold-standard rule-based and conventional machine learning approaches for the task of surgical site infection (SSI) prediction; the data representation and modeling strategies we utilized enabled the construction of models capable of domain adaptation across a diverse set of domains; and large language models (LLMs), specifically generalist foundation models such as Llama 3, offered previously unrealized performance gains.

Use of the Electronic Health Records to facilitate phenotyping, comorbidity analysis, and genomics (2025-01-23). Xian, Su; Tarczy-Hornoch, Peter
Since the wide adoption of electronic health records (EHR) in 2010, many topics regarding the secondary use of the EHR have received attention. The secondary use of the EHR usually refers to repurposing EHR data for research, including information extraction, phenotyping, disease surveillance and forecasting, and policy making. Within this context, we ask how EHR data can be used to study diseases of interest and, in particular, to identify new knowledge. In this work, we explored the secondary use of the EHR with both unsupervised and supervised methods, examining the potential of EHR data to identify novel disease patterns and investigate disease etiology. In Aim 1, we present an unsupervised approach for embedding high-dimensional EHR data at the patient level to help characterize patients and identify new disease patterns. Inspired by the modern language model architecture of transformers, with their attention mechanism, we use patient diagnosis and procedure codes as the vocabulary and treat each patient as a sentence to perform patient embedding. Using 34,851 medical codes for 1,046,649 longitudinal patient events, we performed embedding for 102,739 patients in the electronic MEdical Records and GEnomics (eMERGE) Network. In Aim 2, we illustrated several downstream applications of the patient embedding, especially providing insights into comorbidity patterns and the progression trajectories of individual patients within certain diseases of interest. We demonstrated excellent performance in the prediction of future disease events (median AUROC = 0.87, one year into the future) and bulk phenotyping (median AUROC = 0.84). More importantly, we illustrated the use of patient vectors to reveal heterogeneous comorbidity patterns (disease subtypes) within a defined phenotype and captured their disease trajectories longitudinally. Our model was externally validated using the EHR dataset from the University of Washington, showing robustness and stable performance. These results paved the way for using representation learning in the EHR to characterize patients with certain diseases of interest and associated clinical outcomes, which can improve disease forecasting and facilitate personalized medicine. In Aim 3, we utilized an EHR-derived and validated rule-based phenotyping algorithm to establish a cohort for identifying genetic risk factors for depression. We illustrated the application of genomic study using this EHR-derived algorithm to facilitate the study of disease etiology using genetics. We took a complex psychiatric disease, depression, a leading cause of disability, as an example to study genetic predisposition using data from the EHR. Large-scale genomic studies have identified common variants associated with depression. However, the complexity of the depression phenotype has led to inconsistent cohort definitions and limited sample sizes. There is a need for a validated, automated EHR phenotyping algorithm that can accurately identify depression in the clinic. Here, we implemented a validated EHR phenotyping algorithm to construct a depression cohort (11,532 cases and 39,631 controls, total n = 51,163) and conducted a genome-wide association study (GWAS) using this cohort. Our study reproduced previously identified genetic associations (PHF5A, KCNG2) with depression susceptibility. We also identified novel SNPs falling into the HLA region and the IGVH region, indicating an association between immune function and the depression phenotype. In addition, we also demonstrated the robustness of our phenotyping algorithm through genetic correlation analysis, using a large meta-analysis of major depressive disorder as a standard.
Together, this work serves as a non-exhaustive but powerful demonstration of the use of EHR data, in both supervised and unsupervised settings, to facilitate many downstream clinical applications, including phenotyping, comorbidity analysis, and genomics.
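
The patient-embedding approach described in Aim 1 of the abstract above, treating each patient's diagnosis and procedure codes as a "sentence", can be sketched roughly as follows. Vocabulary size, sequence length, architecture details, and mean pooling are assumptions for illustration, not the model used in the dissertation.

```python
import torch
import torch.nn as nn

class PatientEncoder(nn.Module):
    """Embed a sequence of medical codes and pool transformer outputs into one patient vector."""
    def __init__(self, vocab_size=35000, dim=128, heads=4, layers=2, pad_id=0):
        super().__init__()
        self.pad_id = pad_id
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=pad_id)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, codes):
        # codes: (batch, seq_len) integer diagnosis/procedure tokens, padded with pad_id
        mask = codes == self.pad_id
        h = self.encoder(self.emb(codes), src_key_padding_mask=mask)
        h = h.masked_fill(mask.unsqueeze(-1), 0.0)
        return h.sum(dim=1) / (~mask).sum(dim=1, keepdim=True).clamp(min=1)  # mean over real tokens

# Toy usage: two patients, each represented by up to 16 codes.
enc = PatientEncoder()
codes = torch.randint(1, 35000, (2, 16))
print(enc(codes).shape)  # torch.Size([2, 128]), one vector per patient for clustering or prediction
```
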
Reporting Understandable, Useful, and Trustworthy Results of Clinical Prediction Model Studies: Insights from Biomedical Researchers (2024-10-16). Rahmatullah, Ivan; Hartzler, Andrea
Despite the increasing number of clinical prediction model (CPM) studies, the quality of reporting, especially for pre-impact analysis studies focusing on developing and validating CPMs in research papers, remains subpar. This poor reporting quality hinders the progression of CPM studies by impeding follow-up studies, such as external validation, impact analysis studies, and systematic reviews. While the reporting guideline for these studies, TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis), emphasizes transparency, biomedical researchers advocate for CPM study results to additionally embody three quality attributes: understandable, useful, and trustworthy. Yet the extent to which biomedical researchers perceive and ensure that CPM study results meet these quality attributes has not been explored. This dissertation aims to bridge these gaps by identifying challenges, needs, and visualization preferences among biomedical researchers to ensure CPM study results meet these three quality attributes. Each main chapter in this dissertation addresses a specific aim. Aim 1, presented in Chapter 4, uses a mixed-method survey to explore biomedical researchers' challenges in ensuring that CPM study results meet the three quality attributes as authors and reviewers. Aim 2, detailed in Chapter 5, involves interviews with biomedical researchers to characterize their needs to ensure the three quality attributes in CPM study results. Aim 3, outlined in Chapter 6, based on interviews with biomedical researchers, identifies visualization preferences that could enhance the quality of CPM study results. The concluding Chapter 7 summarizes these findings and their contributions to biomedical informatics, which highlight a novel approach to improve the quality of CPM study results by focusing on the three quality attributes and engaging biomedical researchers beyond traditional expert panels. Furthermore, the dissertation includes foundational chapters setting the research stage. Chapter 1 reviews relevant prior work and outlines my motivations for this study, rooted in my experiences as a primary care clinician and biomedical researcher. Chapter 2 reports on my preliminary work through a primary care provider survey about their use of clinical prediction rules. Chapter 3 describes recruitment strategies that enhanced biomedical researchers' participation in the Chapter 4 survey, utilizing PubMed records for expanded outreach.

Deep Learning Classification of Spinal Osteoporotic Compression Fractures on Radiographs (2024-10-16). Dong, Qifei; Luo, Gang
Although osteoporosis is a debilitating disease that affects 9% of individuals over 50 years of age in the US and 200 million women globally, osteoporosis screening is underutilized. A complementary approach to osteoporosis screening is opportunistic screening, using pre-existing images to detect spinal osteoporotic compression fractures (OCFs). Spinal OCFs are often incidental findings and under-reported. An automated opportunistic screening tool can ensure earlier diagnosis and treatment of spinal OCFs and osteoporosis. A crucial component of such a tool is an OCF classifier that detects OCFs on each vertebral body. In this research, we focus on building this OCF classifier. To do this, two spine radiograph datasets were obtained, whose radiographs are in the Digital Imaging and Communications in Medicine (DICOM) format. To annotate the data, we designed DicomAnnotator, a configurable open-source software program for efficient DICOM image annotation. With the annotated radiographs, we used five deep learning algorithms to build the OCF classifier. Training a deep learning model on a large dataset is often time-consuming. During deep learning model training, it is desirable to offer a non-trivial progress indicator that can continuously project the remaining model training time and the fraction of model training work completed. This makes the deep learning model training process more user-friendly. We designed the first set of techniques to support progress indication for deep learning model training that allows early stopping. In summary, we realized the following three aims in this research: 1) Aim 1: Design DicomAnnotator. Usability evaluation shows that DicomAnnotator is easy to learn, is efficient to use, and allows annotators to quickly make several types of annotations on a large set of DICOM images. 2) Aim 2: Build the OCF classifier. Model evaluation results show that our OCF classifier has some generalizability to clinical data and suitable performance for our future opportunistic osteoporosis screening. 3) Aim 3: Design progress indication methods for deep learning model training. Our experiments show that our progress indicator can offer useful information even if the run-time system load varies over time and can self-correct its initial estimation errors, if any, over time.
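
A very simple version of the progress-indication idea in Aim 3 of the abstract above projects remaining training time from a moving average of completed epoch durations. The sketch below is an editorial illustration, not the dissertation's techniques; it assumes a fixed maximum epoch budget, so early stopping only shortens the projected time.

```python
import time

class TrainingProgress:
    """Rough progress indicator: projects remaining time from recent epoch durations.

    Because early stopping may end training before max_epochs, the projection is an
    upper bound that self-corrects as more epochs complete.
    """
    def __init__(self, max_epochs, window=5):
        self.max_epochs, self.window, self.durations = max_epochs, window, []
        self._start = None

    def epoch_started(self):
        self._start = time.time()

    def epoch_finished(self):
        self.durations.append(time.time() - self._start)

    def report(self):
        done = len(self.durations)
        avg = sum(self.durations[-self.window:]) / min(done, self.window)
        remaining = avg * (self.max_epochs - done)
        return (f"{done}/{self.max_epochs} epochs "
                f"(~{100 * done / self.max_epochs:.0f}%), <= {remaining:.0f}s remaining")

# Toy usage inside a training loop:
progress = TrainingProgress(max_epochs=50)
for epoch in range(3):
    progress.epoch_started()
    time.sleep(0.01)          # stand-in for one epoch of training
    progress.epoch_finished()
    print(progress.report())
```
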
A Longitudinal Study Based on Secondary Usage of Electronic Health Record for Identification of Erectile Dysfunction (ED) Risk Factors and Identification of Patients (2024-10-16). Tianran, Li; Tarczy-Hornoch, Peter
ED affects one in five men in the United States, and its prevalence increases with age. Recognizing ED's chronic nature and the need for comprehensive, long-term healthcare documentation with multi-factor analysis, our study utilized integrated and enriched electronic health records (EHR) from the Electronic MEdical Records and Genomics (eMERGE) cohort 3 at the Kaiser Permanente/Washington University site. We developed a novel method for identifying ED cases from multiple sources to classify individuals with ED. We then conducted inferential analysis using logistic regression and Cox proportional hazard regression models, with longitudinal trajectory analysis for ED. Our study provides new insights into disease pathogenesis, enables better clinical management, and ultimately aims to improve the quality of life and healthcare outcomes for affected individuals. The utilization of integrated EHR-based informatics holds great promise for accelerating the understanding and management of complex chronic conditions like ED.

Robust Methods for Clinical Text Classification and Disease Understanding with NLP Extracted Symptoms from Clinical Notes (2024-10-16). Zhou, Weipeng; Yetisgen, Meliha
Electronic Health Records (EHR) contain comprehensive medical and treatment histories of patients and have the potential to be used to provide better healthcare. A significant portion of the EHR is in the form of clinical notes, and Natural Language Processing (NLP) methods can help extract hidden information from them. However, applying NLP in healthcare has challenges. Many clinical note datasets are scarce and imbalanced, making it difficult to develop generalizable and robust NLP methods. Additionally, effective use of NLP in healthcare requires close collaboration with medical experts to identify and understand meaningful clinical problems. This dissertation addresses these challenges and explores the application of NLP in healthcare. In Chapters 3 and 4, we develop generalizable and robust NLP methods for clinical note classification and female suicide report coding. In Chapters 5 and 6, we apply NLP to extract symptoms from clinical notes and study risk factors associated with out-of-hospital cardiac arrest (OHCA) and Long COVID.

Making Health Knowledge Accessible Through Personalized Language Processing (2024-09-09). Guo, Yue; Cohen, Trevor T.C.
The COVID-19 pandemic exposed the difficulties the general public faces when attempting to use scientific information to guide their health-related decisions. Though widely available in scientific papers, the information required to guide these decisions is often not accessible: medical jargon, scientific writing styles, and insufficient background explanations make this information opaque to non-experts. Consequently, there is a pressing need to deliver scientific knowledge in lay language, which has motivated my research on automated plain language summary generation to make health information more accessible. The main challenges addressed in this thesis are limited data, generating background knowledge, the lack of evaluation metrics, and the need for personalization. To tackle the limited data challenge, I introduce the task of automated generation of plain language summaries (PLSs) of biomedical scientific reviews and construct the Corpus for Enhancement of Lay Language Synthesis (CELLS), the largest and most diverse dataset for PLS in the medical domain. For generating background knowledge, I explore methods for Retrieval-Augmented Lay Language (RALL) generation, augmenting state-of-the-art text generation models with information retrieval from various sources. A key part of this process has been evaluating existing metrics to see if they effectively measure performance for this task, and considering whether there might be better options. To address the lack of evaluation metrics, I present APPLS, the first granular testbed for analyzing evaluation metric performance for PLS, and introduce POMME, a new metric that employs language model perplexity to assess text simplicity. Finally, I broaden the discussion beyond health information, exploring how we can personalize and improve communication across different domains.
Grounded in the real-world setting of interdisciplinary reading, this research offers insights into features and methods for the novel task of integrating personal data into scientific jargon identification. In conclusion, my thesis provides a comprehensive approach to making biomedical literature more accessible and understandable for health consumers by addressing key challenges in developing automated PLS generation systems. The contributions span data collection, method development, evaluation metric design, and personalization, paving the way for more effective communication of health information to the general public.

Relationship between Brain Metabolism and Injury in Post-Cardiac Arrest Comatose Patients (2024-09-09). Chen, Kevin; Whipple, Mark
Cardiac arrest is a leading cause of death in the United States, contributing to 5.6% of annual deaths. More than 80% of survivors are in permanent coma, and 50-80% of those will die. Despite advancements in cardiac care, the mechanisms underlying brain injury are complex and not well understood, which has limited advancement in brain-targeted therapies. Specifically, the relationship between regional brain metabolism and risk of brain injury is not known. This thesis investigated the relationship between brain injury in post-cardiac arrest comatose patients and metabolic characteristics in healthy normal brains: cerebral blood flow (CBF), cerebral blood volume (CBV), cerebral metabolic rate of oxygen (CMRO2), and cerebral metabolic rate of glucose (CMRglu). The study analyzed whole brains, brains clustered by injury percentages, and brain regions clustered by injury percentage. The resulting correlations showed that CMRO2 and CMRglu had stronger correlations with brain injury than CBF and CBV, indicating a closer link between oxygen and glucose utilization and brain damage. Patients with minimal injury exhibited weak correlations, while patients with moderate to severe injuries displayed stronger correlations, emphasizing the critical role of oxygen and glucose metabolism in brain damage progression.

Assessing Disparities Through Missing Race and Ethnicity Data: Results from a Juvenile Arthritis Registry (2024-09-09). Banschbach, Katelyn; Tarczy-Hornoch, Peter
Ensuring high-quality race and ethnicity data within the electronic health record (EHR) and across linked systems, such as patient registries, is necessary to achieve the goal of including racial and ethnic minorities in scientific research and to detect disparities associated with race and ethnicity. The project goal was to improve race and ethnicity data completion within the Pediatric Rheumatology Care Outcomes Improvement Network (PR-COIN) and assess the impact of improved data completion on conclusions drawn from the registry. The project consisted of 5 parts: (1) identifying baseline missing race and ethnicity data, (2) a REDCap survey of current collection and entry, (3) data completion through audit and feedback cycles, (4) assessment of impact on outcome measures, and (5) participant interviews and thematic analysis. The REDCap survey (Supplementary Materials A) and participant interviews (Supplementary Materials B) are available in the supplementary materials. Across 6 participating centers, 29% of patients were missing race and 31% were missing ethnicity, with most patients missing both. Rates of missingness varied by data entry method (electronic vs manual).
Recovered data had a higher percentage of patients with Other race or Hispanic/Latino ethnicity compared to patients with non-missing race and ethnicity at baseline. Black patients had a significantly higher odds ratio of having a clinical juvenile arthritis disease activity score (cJADAS10) of ≥5 at first follow-up compared to White patients. There was no significant change in the odds of cJADAS10 ≥5 for race and ethnicity after data completion. Patients missing race and ethnicity were more likely to be missing cJADAS values, which may affect the ability to detect changes in the odds of cJADAS10 ≥5 after completion. About one-third of patients in a pediatric rheumatology registry were missing race and ethnicity data. After three audit and feedback cycles, centers decreased missing data by 94%, primarily via data recovery from the EHR. In this sample, completion of missing data did not change the findings related to differential outcomes by race. Recovered data were not uniformly distributed compared to those with non-missing race and ethnicity at baseline, suggesting that differences in outcomes after completing race and ethnicity data may be seen with larger sample sizes.

Transformative Diagnostics: Applying Transformer Networks and Semantic Guidance to Whole Slide Images (2024-09-09). Wu, Wenjun; Shapiro, Linda
This dissertation advances the field of digital pathology by introducing innovative deep learning approaches to improve the analysis and diagnosis of skin and breast cancers from whole slide images (WSIs). Given the complexity and variability inherent in WSIs, traditional diagnostic methods often struggle with accuracy and efficiency. This work addresses these challenges through a series of projects leveraging advanced segmentation techniques, transformer-based models, and a novel Semantics-Aware Attention Guidance (SAG) framework. The initial focus of the research is on enhancing the detection and segmentation of diagnostically significant structures within WSIs. The introduction of VSGD-Net and a two-stage segmentation approach demonstrates significant improvements in identifying melanocytes and other critical features with minimal reliance on extensive annotated data. Building on this foundation, the dissertation explores the application of transformer networks, such as HATNet and ScATNet, utilizing self-attention mechanisms to effectively learn contextual relationships across different scales in WSIs. The culmination of this research is the development of the SAG framework, which integrates semantic information into the diagnostic process, guiding attention mechanisms to focus on areas of potential malignancy. This approach not only enhances the accuracy and precision of the models, but also improves their interpretability, a critical factor in clinical settings. Empirical evaluations across multiple cancer datasets demonstrate that the proposed methods outperform existing state-of-the-art models in terms of diagnostic accuracy, robustness, and efficiency. These advancements hold significant promise for transforming cancer diagnosis, providing pathologists with powerful tools to enhance decision-making and potentially improve patient outcomes.
By bridging the gap between computational models and clinical applications, this dissertation contributes to the broader goal of utilizing artificial intelligence in medicine to facilitate early detection, accurate diagnosis, and personalized treatment of cancer.

Explainable query generation for cohort discovery and biomedical reasoning using natural language (2023-09-27). Dobbins, Nicholas J; Yetisgen, Meliha
Clinical trials serve a critical role in the generation of medical evidence and enabling biomedical research. In order to identify potential participants, investigators publish eligibility criteria, such as past history of certain conditions, treatments, or laboratory tests. Patients meeting a trial's eligibility criteria are considered potential candidates for recruitment. Recruitment of participants remains, however, a major barrier to successful trial completion, and manual chart review of hundreds or thousands of patients to determine a candidate pool can be prohibitively labor- and time-intensive. At the same time, the amount and variety of data contained in Electronic Health Records (EHRs) is increasing dramatically, creating both challenges and opportunities for patient recruitment. While more granular and potentially useful data are captured and stored in EHRs now than in the past, the process of accessing and leveraging these data often requires technical expertise and extensive knowledge of biomedical terminologies and data models. This thesis focuses on the development of an integrated system for identifying patients in clinical databases using a natural language interface. Humans use natural language nearly effortlessly, and thus automated means of leveraging natural language to identify patients in databases hold great potential for time and cost savings. The primary contributions of this work include a novel database schema annotation and mapping method enabling data model agnostic query generation, a method for generating intermediate logical representations of eligibility criteria, exploration of dynamic reasoning upon non-specific criteria, and development of an integrated graph-based knowledge base of biomedical concepts. This work also introduces two new annotated corpora, the Leaf Clinical Trials (LCT) corpus and Leaf Logical Forms (LLF) corpus. The LCT corpus is unique in the granularity with which it represents complex eligibility criteria, while the LLF corpus is the most extensive annotated corpus of eligibility criteria logical representations at the time of this writing. Both corpora are valuable contributions to the biomedical informatics and natural language processing communities. To evaluate the viability of our methods, both our system and a human database programmer generated queries to identify patients eligible for 8 past clinical trials at our institution. We then compared actual participant enrollments to those found eligible. We demonstrate that our system rivals and sometimes surpasses an experienced human programmer in finding eligible patients. We finally developed a novel user interface for enabling real-time interactive cohort discovery.
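
The final step of a system like the one described above, turning an already-parsed eligibility criterion into an executable database query, can be sketched as follows. The schema, table names, and criterion structure here are hypothetical; the dissertation's actual system is data-model agnostic rather than tied to a fixed schema.

```python
# Render a parsed eligibility criterion as SQL against a hypothetical observation table.
# The logical form below is a stand-in for the intermediate representations discussed above.
criterion = {
    "concept_codes": ["E11.9", "E11.65"],  # e.g., type 2 diabetes ICD-10 codes (illustrative)
    "negated": False,                       # "no history of ..." would flip this
    "min_age": 18,
}

def criterion_to_sql(c, table="diagnoses", person_table="patients"):
    codes = ", ".join(f"'{code}'" for code in c["concept_codes"])
    exists = "NOT EXISTS" if c["negated"] else "EXISTS"
    return (
        f"SELECT p.person_id FROM {person_table} p\n"
        f"WHERE p.age >= {c['min_age']}\n"
        f"  AND {exists} (SELECT 1 FROM {table} d\n"
        f"               WHERE d.person_id = p.person_id AND d.code IN ({codes}))"
    )

print(criterion_to_sql(criterion))
```
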
Design and Development of an Intelligent Moderator Dashboard for an Online Support Community for Aging-Related Experiences (2023-09-27). Wong, Sharon Hsien-Lin; Chen, Annie T
Medical conditions and other experiences related to aging can be challenging to manage for older adults and their caregivers. Virtual Online Communities for Aging Life Experiences (VOCALE) is an online community-based digital health intervention that aims to encourage problem-solving skills amongst older adults and caregivers through participation in weekly discussions. The VOCALE intervention is overseen by trained members of the VOCALE research team, known as "moderators", whose responsibilities include monitoring the discussion platform and responding to the needs of participants. However, there are still unmet needs amongst VOCALE moderators, such as a desire to facilitate the intervention more effectively while also gaining a deeper understanding of how well participants are engaging with the study. This thesis project proposes a preliminary design for an intelligent moderator dashboard, a tool that can assist VOCALE moderators with their study-related duties and provide insights about participant engagement with the VOCALE intervention. To inform the design of this tool, a series of user-centered design activities, such as workshops and interviews, were conducted with current and former VOCALE moderators. Leveraging a combination of inductive coding and thematic analysis, the major themes extracted from these sessions were then used to inform the subsequent phase of the iterative design process. The final output of this project was a prototype for a proposed moderator dashboard design, alongside recommendations for further development. Beyond its primary purpose of providing a framework for a dashboard that can be used for future iterations of the VOCALE intervention, this work illuminates key insights about the operation of VOCALE itself, identifying the needs of various stakeholders while also highlighting areas in which the VOCALE intervention could be improved. Furthermore, the work from this project contributes to a broader understanding of how to design tools to aid with the management of online health-related discussion-based communities, with consideration placed on how individual user needs interact with overarching objectives to inform the experience of those who moderate online digital health interventions.

Using user-centered design to unburden genetic analyses for novice genomic researchers (2023-08-14). Patel, Harsh Vijaykumar; Crosslin, David R
Increasingly large genomic databases have allowed for more robust genetic analyses, leading to advances in bioinformatics, translational medicine, and, ultimately, improved patient care. However, the current landscape of genetic analysis software is riddled with unintuitive and inaccessible tools and software packages. These tools often lack proper documentation, need extensive setup, fail to communicate with each other, and require painstaking debugging for even simple exploratory analyses. This creates large barriers to entry for novice genomic researchers (NGRs), individuals who are interested in conducting genetic experiments but either lack the computational experience or biological background or do not have access to extensive technological resources, such as local computational clusters. Historically, very little work has been done to address the needs of NGRs, leaving an overlooked but keystone user base without the foundational support needed to best begin their informatics journey. User-centered design (UCD) is one solution to this problem that has been under-utilized in bioinformatics software development.
In this work, we sought to better characterize the NGR user base and to apply the UCD framework during the development of a more usable bioinformatics software tool. To achieve this, we first explored the existing landscape of bioinformatics software tools via a literature review and sought to create a rubric that can be utilized to evaluate the usability of those tools within the context of NGRs. To further inform the creation of this rubric, we also performed a needs assessment of NGRs utilizing semi-structured interviews. From these two sources of knowledge, we found that the key attributes that resulted in poor adoption and sustained use of most bioinformatics tools included poor documentation, lack of context-specific instructional content, difficulty in installation and setup, and uninformative error messages (Aim 1). We then created user personas to help better characterize specific types of users and utilized those personas to help design a cloud-agnostic, user-friendly GWAS analysis tool (UF-GWAS). UF-GWAS utilized a Docker container to neatly package a JupyterLab instance, which allowed users to run GWAS analyses quickly and easily (Aim 2). Next, we evaluated the usability of UF-GWAS by recruiting NGRs who performed task-based evaluations. We also tested the efficiency, accuracy, and cost of UF-GWAS against industry-standard software. NGRs reported UF-GWAS as highly usable and appreciated the following key components: clarity of the documentation, quick access to relevant background knowledge, ease of onboarding, and the shareability and reproducibility of results (Aim 3). Finally, we combined the many knowledge sources throughout this study to create a set of guidelines that future researchers can follow in order to create more usable informatics software. As NGRs and other researchers begin to enter the informatics landscape, it will become increasingly important for informaticians to create more usable analysis software. By doing so, we can encourage robust experiments from a more diverse workforce, hopefully leading to an improvement in quality of care.

A novel translational bioinformatics pipeline to improve precision medicine research (2023-08-14). Green, Richard; Tarczy-Hornoch, Peter; Gale, Michael
Diverse mouse models can serve as precursors to precision medicine in clinical practice (Li & Auwerx, 2020) but require the integration, analysis, and cross-species interpretation of multi-omics data sets. We present a multi-omics pipeline designed to identify biomarkers with translational applicability using the Collaborative Cross (CC) mouse model. The CC project is a mouse genetic reference panel (GRP) that seeks to determine genetic markers driving outcomes. The CC was designed to introduce genetic diversity (like that in a human population) into mouse models. Our approach comprises three overarching aims: (Aim 1) construct networks and linear models in mice; (Aim 2) detect genetic drivers and candidate genes; and (Aim 3) verify clinical correlations and biomarker detection in humans. We applied our pipeline to our driving biological project (DBP): identifying markers of neuroinvasion during West Nile virus (WNV) infection. Aim 1 produced three novel immune networks (A-C) in the CC mouse model of West Nile virus infection. Network A was enriched in pattern recognition, innate immunity, and cell differentiation. Network B contained interferon and inflammation pathways, and Network C was enriched for interferon signaling and neutrophil degranulation.
Regression modeling and pathway analysis were also performed and identified unique immune regulators of disease outcomes across different CC strains. Using public data sets, we correlated novel gene-to-gene connections using an innovative approach, Integrated Transcriptomics Analysis (ITA). In Aim 2, using the CC mouse model of WNV infection, genetic regions were correlated to the DBP through quantitative trait locus (QTL) analysis, a statistical approach that uses genotype data (genetic markers) and phenotype data (viral detection, IFITM1 expression). The purpose of QTL analysis is to determine whether genetic variation underlies the complex traits of our phenotype. QTL analysis identified three regions: 59-80 Mb on chromosome 4, 107-110.5 Mb on chromosome 12, and 57.1-94.5 Mb on the X chromosome. Using viral load as a phenotype identified the regions on chromosomes 4 and 12; using IFITM1 as a phenotypic marker identified the QTL on the X chromosome. Transcriptional analysis from Aim 1, paired with Aim 2's QTLs, identified Toll-like receptor 4 (TLR4) on chromosome 4, tryptophanyl-tRNA synthetase (WARS) on chromosome 12, and membrane palmitoylated protein 1 (MPP1) on the X chromosome. In Aim 3, translating findings from the CC model of WNV infection into human correlates, genetic regions from Aim 2 were converted to human genomic coordinates, and a phenome-wide association study (PheWAS) was performed using the Electronic Medical Records and Genomics (eMERGE) network (25k and 109k genotyped human participants). A PheWAS is a statistical test that takes genetic loci (or variants) and queries them across a curated dataset of phenotypes defined by clinical codes; the result is a set of genetic regions enriched for clinical phenotypes. The PheWAS identified various clinical associations with the genetic regions identified in the CC mouse model and mapped to human genomic coordinates, including essential tremor, type 2 diabetes with neurological manifestations, chronic kidney disease, intestinal infection due to Clostridium difficile, end-stage renal failure, and other similar clinical phenotypes. Other clinical associations were identified for the genes TLR4 and TRIM32, including codes for circulatory system, dermatologic, endocrine, hematopoietic, infectious disease, and neoplasm phenotypes. To augment the PheWAS, bulk RNA-seq was also performed on four human brains (two WNV-infected, two mock). Several target genes (Tnfsf8, PTBP3, Akna, and TLR4) identified on chromosome 4 were also significant in WNV-infected human brains. The WARS gene on chromosome 12 and MPP1 on the X chromosome were also identified. The transcriptional analysis also revealed which brain sections contained the activated QTL-derived genes. TLR4 was significant in the basal ganglia; Akna was significant in the cortex; PTBP3 was significant in the basal ganglia, cortex, and thalamus. On chromosome 12, the Wars gene was significant in the basal ganglia, cortex, and thalamus. MPP1 and MCF2 appeared statistically significant on the X chromosome in the basal ganglia. Our pipeline leveraged a diverse mouse model to identify genetic and transcriptional markers associated with disease phenotypes. Results from Aim 1 validated previously identified transcription factors (ATF4, SMAD) in mice from WNV studies. Novel immune networks containing various target genes were identified: CCL5, NFATC1, CD53, MSN, PTPN6, and RAC2. Integrated Transcriptomics Analysis (ITA) found new correlations using public data, including Ptpn6 correlated with Cd35, Cd37, and Irf8.
ArchS4 results determined that Dtx3l was correlated with Ifih1, Samd9, and Oas3. These known antiviral genes suggest Dtx3l's association with neuroinvasion and immune response genes. Functional analysis revealed that IL-17 and CD40 are uniquely activated regulators in asymptomatic strains. SIRT1 and STAT1 transcriptional regulators were unique to protected strains. Aim 1 results provided a range of candidate genes for further validation. QTL analysis from Aim 2 identified genomic regions correlated with neuroinvasion (WNV detection by qPCR) and IFITM1. The QTLs showed specific founder effects, and further experimentation is required to determine specific allelic effects in key genes. Based on the transcriptional results, significant genes within the QTLs included Susd1, Slc31a2, Tlr4, Ptbp3, Akna, Wars, Yy1, and Trim32. Genes varied in significance by strain and time point. QTL analysis provided sufficient candidate regions for downstream analysis using human data. PheWAS results from Aim 3 identified variant rs111245230, associated with SVEP1, an inflammatory gene. Focusing on key genes (TLR4 and TRIM32) and their variants identified ICD codes for hypertensive heart and renal diseases. Renal diseases are a common outcome of chronic WNV infection. Transcriptional analysis of WNV-infected human brains validated the target genes Tnfsf8, PTBP3, Akna, and TLR4 on chromosome 4; WARS on chromosome 12; and MPP1 and MCF2 on the X chromosome. TLR4 was significant in the basal ganglia, Akna in the cortex, PTBP3 in the basal ganglia, cortex, and thalamus, and WARS in the basal ganglia, cortex, and thalamus. MPP1 and MCF2 were significant in the basal ganglia. Connecting the results and our findings across our aims revealed distinct connections and biomarkers to be used in precision medicine applications.
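
At its core, the QTL analysis described in Aim 2 above tests each genetic marker for association with a phenotype such as viral load. The sketch below runs a simple single-marker regression scan on simulated data; it is an editorial illustration and does not reproduce the pipeline's actual QTL method, which accounts for Collaborative Cross founder haplotype structure.

```python
import numpy as np
from scipy import stats

def single_marker_scan(genotypes, phenotype):
    """Regress the phenotype on each marker (coded 0/1/2) and return -log10 p-values.

    genotypes: (n_mice, n_markers) array; phenotype: (n_mice,) array, e.g., viral load.
    """
    pvals = np.empty(genotypes.shape[1])
    for j in range(genotypes.shape[1]):
        slope, intercept, r, p, se = stats.linregress(genotypes[:, j], phenotype)
        pvals[j] = p
    return -np.log10(pvals)

# Toy usage with simulated data: 200 mice, 500 markers, marker 42 truly affects the phenotype.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(200, 500)).astype(float)
y = 0.8 * G[:, 42] + rng.normal(size=200)
scores = single_marker_scan(G, y)
print("top marker:", scores.argmax(), "with -log10(p) =", round(scores.max(), 1))  # expect marker 42
```
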
