Development of an LLM Framework for Clinical Hypothesis Testing using Multimodal Data
Authors
Gim, Nayoon
Abstract
Electronic Health Records (EHRs) contain rapidly expanding volumes of structured clinical data with the potential to accelerate evidence generation. However, translating clinical hypotheses into reproducible
research remains a slow and resource-intensive process requiring manual cohort definition, data
harmonization, and statistical coding. These labor-intensive steps limit scalability and transparency,
contributing to challenges in reproducibility and auditability. This thesis investigates how large language
model (LLM)-assisted workflows, combined with data standardization and privacy-preserving system
design, can transform clinical research into a more scalable and transparent process. Chapter 1 introduces
the motivation for LLM-assisted scientific workflows, reviews standards and interoperability challenges
in health data, and defines the scope of the thesis.
Chapter 2 establishes the data foundations that enable automated LLM-assisted clinical research by
addressing two complementary requirements: data standardization and secure LLM interaction with
health records. Clinical datasets are often fragmented and inconsistently structured, leading to dataset-
specific analytic code that limits scalable automation. We address this by standardizing retinal imaging
data in the AI-READI (Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights) cohort
using structured DICOM (Digital Imaging and Communications in Medicine) representations. Building
on this standardized structure, we develop a metadata preparation workflow that enables LLM-assisted
analysis without exposing patient-level data: by aggregating schema information and natural-language
descriptions of data elements, the workflow provides the contextual information LLMs require to
generate executable analytical code without accessing individual records. These approaches are
demonstrated using two datasets: AI-READI and NHANES (National Health and Nutrition Examination
Survey).
To understand what aspects of clinical research can be effectively automated, it is first necessary to
analyze existing manual workflows. Chapter 3 begins with a case study investigating the relationship
between post-intraocular pressure elevation and the development of primary open-angle glaucoma using
the IRIS Registry (Intelligent Research in Sight). This study was carried out using standard manual
research workflows and serves as a representative example of real-world clinical research practice.
Section 3.2 then builds on this work by shifting the focus from clinical outcomes to process analysis.
Using the completed study from Section 3.1 as a reference, we examine the underlying research workflow to
identify repetition inherent in manual pipelines, scalability bottlenecks, and recurring analytical steps.
This secondary analysis highlights concrete opportunities for automation and directly motivates the
development of an LLM-assisted framework to streamline hypothesis testing.
Chapter 4 introduces LATCH (Large Language Model-Assisted Testing of Clinical Hypotheses), a
framework that automates the translation of natural language research questions into executable statistical
analyses. LATCH combines an LLM-driven semantic component that maps hypotheses to explicit cohort
definitions and data extraction logic with a deterministic statistical engine that ensures reproducibility and
auditability. We describe the system architecture and validate the framework by reproducing a set of
published studies on diabetes using NHANES data, demonstrating that LATCH can generate end-to-end
analytical pipelines from natural language prompts without manual coding. We further characterize the
system’s operational limits through targeted stress testing and evaluation of its behavior under edge-case conditions.
Finally, Chapter 5 illustrates the application of LATCH in advancing biomedical knowledge. Beyond
reproduction, LATCH enables extended analyses of existing studies, including cross-dataset
generalizability testing between NHANES and AI-READI, temporal consistency evaluation, stratified
analyses, and more granular exploration of prior findings. LATCH is also used to conduct exploratory,
hypothesis-generating analyses of previously unexplored questions, including the identification of a
nationwide vision-related trend in the diabetes population and associations between disease severity and
retinal biomarkers using the AI-READI cohort.
This thesis presents a framework that combines data standardization, privacy-aware infrastructure, and
LLM-assisted analytics to improve the efficiency and reproducibility of clinical research. The work
demonstrates that carefully designed AI-assisted systems can accelerate hypothesis testing, reduce
repetitive manual effort, and support transparent real-world evidence generation while preserving human
expert verification.
Description
Thesis (Ph.D.)--University of Washington, 2026
