Digital Pathology: Diagnostic Errors, Viewing Behavior and Image Characteristics
Whole slide imaging technologies provide a unique opportunity to collect and analyze large amounts of data on pathologists' interactions with digital slides. In this work, we study the underlying causes of diagnostic errors in histopathology. Instead of focusing on the detection of invasive cancer, we consider the full spectrum of diagnoses that a pathologist encounters in clinical practice and study the misidentification and misinterpretation errors that may cause overdiagnosis or underdiagnosis. To this end, we use the digiPATH dataset, which consists of 240 breast biopsies with diagnoses ranging from benign to invasive cancer, the actions of pathologists recorded during their interpretations of the slides, and the diagnostic regions associated with the final diagnoses they assigned. Our work consists of three parts: region of interest localization, diagnostic classification, and viewing behavior analysis. The first part introduces a novel methodology to extract the diagnostically relevant regions of interest from pathologists' viewing behavior, and a computer vision model to detect these regions automatically on unseen images. Region of interest (ROI) localization provides us with a set of regions on the whole slide that either lead to the correct diagnosis or distract the pathologists. The largest portion of this thesis is devoted to the diagnostic classification problem. Starting with tissue labeling, we develop features that describe the tissue composition of the image and its structural changes. We first introduce two models for the semantic segmentation of the regions of interest into tissue labels. Then, we define two feature sets constructed from the tissue label images. The first feature set consists of superpixel-label frequency and co-occurrence histograms, which are common image features.
The second feature set is a sequence of histograms that together comprise the structure feature, a new kind of image feature defined for the first time in this work. Instead of attempting a four-class classification (benign, atypia, DCIS and invasive), we classify images one diagnosis at a time, starting with invasive versus benign and ending with atypia versus DCIS. We show that the superpixel-label frequency and co-occurrence histograms work best for the classification of the invasive cases, while the structure feature is more suitable for the benign, atypia and DCIS cases. The final part is an analysis of the pathologists' behavior on the whole slide images. We first analyze the relationship between the identification of the correct ROI and the diagnosis. We show that a higher overlap with the consensus ROI is correlated with higher diagnostic accuracy. Then, we introduce novel measurements of interpretative patterns and identify two strategies used by the pathologists: scanning and drilling. We demonstrate that the interpretation strategy does not change the diagnostic accuracy, but drilling is the more efficient option. Although it does not affect the diagnostic outcome, the interpretation strategy is correlated with pathologists' characteristics such as gender, age, experience and nervousness.
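As a rough illustration of the label frequency and co-occurrence histograms mentioned above, the following is a minimal sketch and not the implementation used in the thesis. It assumes a tissue label image stored as a small grid of integer labels, and it counts co-occurrences between horizontally and vertically adjacent grid cells; the thesis works with superpixels, whose adjacency would be defined differently. The function names, the toy grid and the three-label setup are all hypothetical.

```python
from collections import Counter


def label_frequency_histogram(labels, num_labels):
    """Fraction of grid cells carrying each tissue label."""
    counts = Counter(label for row in labels for label in row)
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in range(num_labels)]


def label_cooccurrence_histogram(labels, num_labels):
    """Normalized, symmetric counts of label pairs on adjacent cells.

    Assumption: adjacency means 4-connectivity on a regular grid.
    """
    cooc = [[0.0] * num_labels for _ in range(num_labels)]
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            # Look only right and down so each adjacent pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    # Record the pair in both orders to keep the matrix symmetric.
                    cooc[a][b] += 1.0
                    cooc[b][a] += 1.0
    total = sum(map(sum, cooc))
    if total == 0:
        return cooc
    return [[value / total for value in row] for row in cooc]


# Hypothetical 3x3 label image with three tissue labels (0, 1, 2).
toy_labels = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
num_labels = 3

freq = label_frequency_histogram(toy_labels, num_labels)
cooc = label_cooccurrence_histogram(toy_labels, num_labels)

# Concatenate the frequencies with the upper triangle of the symmetric
# co-occurrence matrix to form one feature vector per image.
features = freq + [cooc[i][j] for i in range(num_labels) for j in range(i, num_labels)]
```

A vector like `features` could then be passed to any standard binary classifier for one of the pairwise decisions described above, such as invasive versus benign; the choice of classifier is left open here.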