Developing Informatics Frameworks for Evaluating Deep Learning Algorithms for Mammography
Abstract
Deep learning algorithms have played a major role in advancing AI for mammography-based breast cancer screening. Studies have shown that AI tools can match, and in some cases exceed, the performance of breast imaging radiologists. Integrating deep learning algorithms into clinical workflows has the potential to streamline mammography interpretation, aid early cancer detection, and enhance risk prediction. Nevertheless, despite promising initial performance across a broad range of tasks in mammography interpretation and breast cancer screening, the adoption of these algorithms in clinical settings remains limited. A major contributing factor is the lack of methods to thoroughly evaluate the safety, reliability, clinical utility, and trustworthiness of AI models in mammography, creating a critical gap between algorithm development and real-world implementation. We therefore developed informatics frameworks that support a comprehensive and systematic evaluation of AI algorithms prior to their clinical adoption, enabling stakeholders to critically assess model generalizability and to interpret the inference mechanisms that drive model predictions.

AI models may generalize poorly in new clinical settings or within specific demographic subpopulations. We developed an open-source framework, ClinValAI (Clinical Validation of AI), to help health systems implement a cloud-based infrastructure for rigorous external validation of AI algorithms before clinical adoption. ClinValAI enables secure, privacy-preserving external validation by protecting both patient imaging data and developers' intellectual property, while offering scalable, customizable workflows that accommodate the diverse computational demands of multiple AI algorithms. We demonstrate ClinValAI's utility by performing a large-scale external validation of multiple FDA-cleared commercial AI algorithms for breast cancer detection using mammography exams from seven U.S. regional breast cancer registries. By comparing these algorithms and evaluating their performance against radiologists' assessments, our study highlights the benefits and risks of adopting AI tools in clinical workflows. ClinValAI provides a holistic framework for validating medical imaging models and has the potential to advance the adoption of accurate, generalizable AI models in mammography-based breast cancer screening.

Even when AI algorithms demonstrate strong generalizability, their "black box" nature obscures meaningful insight into their inference mechanisms, undermining trust and transparency. We address this challenge specifically for Mirai, a model for mammography-based breast cancer risk prediction. We developed a method, FOCUS (Feature-space OCclusion for Understanding Saliency), to assess the contribution of different mammogram regions to Mirai's predictions, and created interpretable visualizations, FOCUS Maps, that identify the mammogram patch most strongly influencing Mirai's risk scores. We observe that Mirai's risk estimates are primarily driven by localized imaging features and are significantly influenced by sites where cancer was subsequently detected. FOCUS Maps may assist radiologists in localizing suspicious regions for intensive imaging evaluation, such as a focused diagnostic ultrasound, bolstering the model's clinical utility in potentially guiding personalized screening and early intervention strategies. Although Mirai may be detecting early signs of malignancy, we also identified other localized, non-lesion imaging patterns that drive its predictions. Our explainability technique can help radiologists assess whether clinically meaningful features are associated with AI model predictions of breast cancer risk.
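The occlusion idea behind FOCUS can be illustrated with a generic occlusion-sensitivity sketch: mask one image patch at a time and record how much the model's score changes; patches whose removal most alters the score are the ones driving the prediction. This is only a loose, image-space illustration of the general technique, not the thesis's feature-space implementation, and the `occlusion_map` function and toy scoring function below are hypothetical:

```python
import numpy as np

def occlusion_map(image, score_fn, patch=8, stride=8):
    """Occlusion-sensitivity sketch (hypothetical helper): slide a patch
    over the image, mask it out, and record the drop in model score.
    High values mark regions that drive the prediction."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0   # mask this patch
            heat[i, j] = base - score_fn(occluded)     # score attributable to it
    return heat

# Toy example: a stand-in "model" that scores an image by its brightest
# pixel; the occlusion map should peak over the bright region.
img = np.zeros((32, 32))
img[8:16, 16:24] = 1.0                  # bright "lesion-like" patch
score = lambda x: float(x.max())
heat = occlusion_map(img, score)
iy, jx = np.unravel_index(np.argmax(heat), heat.shape)
```

In this toy setup the heat map is zero everywhere except the cell covering the bright patch, mirroring how a saliency map localizes the region most responsible for a risk score.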
Overall, by developing informatics frameworks for external validation and model explainability, our work supports a comprehensive evaluation of the generalizability and trustworthiness of AI tools, potentially enabling their clinical adoption to improve healthcare outcomes and promote health equity.
Description
Thesis (Ph.D.)--University of Washington, 2025
