Development of an LLM Framework for Clinical Hypothesis Testing using Multimodal Data
Authors
Gim, Nayoon
Abstract
Electronic Health Records (EHRs) contain rapidly expanding volumes of structured clinical data with the potential to accelerate evidence generation. However, translating clinical hypotheses into reproducible
research remains a slow and resource-intensive process requiring manual cohort definition, data
harmonization, and statistical coding. These labor-intensive steps limit scalability and transparency,
contributing to challenges in reproducibility and auditability. This thesis investigates how large language
model (LLM)-assisted workflows, combined with data standardization and privacy-preserving system
design, can transform clinical research into a more scalable and transparent process. Chapter 1 introduces
the motivation for LLM-assisted scientific workflows, reviews standards and interoperability challenges
in health data, and defines the scope of the thesis.
Chapter 2 establishes the data foundations that enable automated LLM-assisted clinical research by
addressing two complementary requirements: data standardization and secure LLM interaction with
health records. Clinical datasets are often fragmented and inconsistently structured, leading to dataset-
specific analytic code that limits scalable automation. We address this by standardizing retinal imaging
data in the AI-READI (Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights) cohort
using structured DICOM (Digital Imaging and Communications in Medicine) representations. Building
on this standardized structure, we develop a metadata preparation workflow that enables LLM-assisted
analysis without exposing patient-level data: by aggregating schema information and natural-language
descriptions of data elements, the workflow provides the contextual information LLMs require to
generate executable analytical code without accessing individual records. These approaches are
demonstrated using two datasets: AI-READI and NHANES (National Health and Nutrition Examination
Survey).
To understand what aspects of clinical research can be effectively automated, it is first necessary to
analyze existing manual workflows. Chapter 3 begins with a case study investigating the relationship
between post-intraocular pressure elevation and the development of primary open-angle glaucoma using
the IRIS Registry (Intelligent Research in Sight). This study was carried out using standard manual
research workflows and serves as a representative example of real-world clinical research practice.
Section 3.2 then builds on this work by shifting the focus from clinical outcomes to process analysis.
Using the completed study from Section 3.1 as a reference, we examine the underlying research workflow to
identify repetition inherent in manual pipelines, scalability bottlenecks, and recurring analytical steps.
This secondary analysis highlights concrete opportunities for automation and directly motivates the
development of an LLM-assisted framework to streamline hypothesis testing.
Chapter 4 introduces LATCH (Large Language Model-Assisted Testing of Clinical Hypotheses), a
framework that automates the translation of natural language research questions into executable statistical
analyses. LATCH combines an LLM-driven semantic component that maps hypotheses to explicit cohort
definitions and data extraction logic with a deterministic statistical engine that ensures reproducibility and
auditability. We describe the system architecture and validate the framework by reproducing a set of
published studies on diabetes using NHANES data, demonstrating that LATCH can generate end-to-end
analytical pipelines from natural language prompts without manual coding. We further characterize the
system’s operational limits through targeted stress testing and evaluation of its behavior under edge-case conditions.
Finally, Chapter 5 illustrates the application of LATCH in advancing biomedical knowledge. Beyond
reproduction, LATCH enables extended analyses of existing studies, including cross-dataset
generalizability testing between NHANES and AI-READI, temporal consistency evaluation, stratified
analyses, and more granular exploration of prior findings. LATCH is also used to conduct exploratory,
hypothesis-generating analyses of previously unexplored questions, including the identification of a
nationwide vision-related trend in the diabetes population and associations between disease severity and
retinal biomarkers using the AI-READI cohort.
This thesis presents a framework that combines data standardization, privacy-aware infrastructure, and
LLM-assisted analytics to improve the efficiency and reproducibility of clinical research. The work
demonstrates that carefully designed AI-assisted systems can accelerate hypothesis testing, reduce
repetitive manual effort, and support transparent real-world evidence generation while preserving human
expert verification.
Description
Thesis (Ph.D.)--University of Washington, 2026
