Shapiro, Linda
Ghezloo, Fatemeh
2024-10-16
2024
Ghezloo_washington_0250E_27514.pdf
https://hdl.handle.net/1773/52464
Thesis (Ph.D.)--University of Washington, 2024

Whole slide imaging (WSI) has revolutionized digital pathology, yet final diagnosis still relies heavily on pathologists’ visual examination, which is often challenging due to the volume of data. Cancer mortality is significantly impacted by diagnostic errors and discordance among pathologists on the same case, which highlights the need for computer-aided diagnosis (CAD) systems to support pathologists in their clinical practice. One essential initial step in designing these systems is to understand the viewing behavior of pathologists. Prior research suggests that accurately identifying and interpreting regions of interest (ROIs) is essential for effective diagnosis. In this dissertation, we investigate the correlation between pathologists’ viewing behaviors and diagnostic accuracy using viewport-tracking data for melanocytic skin lesions. Our analysis reveals a significant correlation between time spent viewing ROIs and diagnostic accuracy. Based on these findings, we propose a novel ROI detection method that integrates pathologists’ viewing behaviors with deep learning, using an encoder-decoder architecture to predict pixel-level heatmaps of diagnostically relevant areas on WSIs. However, the scarcity and uni-modality of datasets in digital pathology limit the performance and interpretability of deep learning models. To address these issues, we introduce Quilt-1M, the largest vision-language dataset in histopathology, and Quilt-Instruct, an instruction-tuning dataset, both curated from educational YouTube videos.
These resources enable the development of advanced multi-modal models: (1) QuiltNet, a CLIP-based model that excels in zero-shot and few-shot image classification and image-text retrieval, surpassing state-of-the-art performance; (2) Quilt-LLaVA, a multi-modal model with enhanced spatial localization of medical concepts and complex reasoning; and (3) a multi-modal multi-agent diagnosis system that leverages the ROI detection model and Quilt-LLaVA to navigate large WSIs, gather evidence, and make diagnoses in a manner similar to pathologists. This dissertation advances digital pathology by linking viewing behaviors to diagnostic accuracy, presenting a novel ROI detection framework, and providing large-scale datasets and multi-modal models for improved CAD systems and diagnostic workflows.

application/pdf
en-US
CC BY
Artificial Intelligence; Computer Vision; Digital Pathology; Medical Imaging; Multi-Modal Learning; Natural Language Processing
Artificial intelligence; Computer science; Medical imaging; Computer science and engineering
Integrating Human Expertise and Multi-Modal AI in Digital Pathology
Thesis
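The abstract describes an encoder-decoder model that maps a WSI region to a pixel-level heatmap of diagnostic relevance. As a minimal illustrative sketch only (not the dissertation's actual architecture), the idea can be caricatured in NumPy: average pooling stands in for a learned convolutional encoder, nearest-neighbor upsampling for the decoder, and a single scalar weight and bias for a 1x1 prediction head. All function names, shapes, and parameters here are hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, pool=4):
    # "Encoder" stand-in: downsample the patch into a coarse feature map
    # (in a real model this would be stacked, learned conv layers).
    h, w = x.shape
    return x.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))

def decode(feat, pool=4):
    # "Decoder" stand-in: nearest-neighbor upsample back to pixel resolution.
    return np.repeat(np.repeat(feat, pool, axis=0), pool, axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_heatmap(patch, w, b):
    # One relevance score per pixel in (0, 1); higher = more
    # diagnostically relevant. w and b play the role of a 1x1 conv head.
    feat = encode(patch)
    logits = w * feat + b
    return sigmoid(decode(logits))

patch = rng.random((64, 64))          # stand-in for a grayscale WSI patch
heatmap = predict_heatmap(patch, w=2.0, b=-1.0)
print(heatmap.shape)                  # same spatial size as the input patch
```

In the actual pipeline, the supervision signal for such a head would come from pathologists' viewport-tracking data rather than from random weights as in this toy forward pass.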