Robust Prediction and Biomarker Discovery in Rare Cancers Using Interpretable Machine Learning
Abstract
Rare cancers such as glioblastoma multiforme (GBM, a rare brain cancer) pose persistent challenges in computational oncology because of limited data, biological noise, and the difficulty of isolating disease-specific molecular signatures. Given these constraints, this work began with the expectation that rare-cancer models would perform poorly. However, machine learning models trained on genomic data achieved unexpectedly strong accuracy, motivating an investigation into whether this separability reflected genuine biology or artifactual signal. This thesis develops an interpretable machine learning framework that evaluates predictive robustness and isolates biologically meaningful biomarkers under extreme class imbalance. Cascade Learning systematically removes broad cancer pathways and reveals biomarkers uniquely associated with the rare cancer, while SHAP-based interpretability aligns these genes with experimentally reported glioma biology. Complementary Tab2Image visualizations provide spatial confirmation of class separability, strengthening biological trust in the learned signal. Overall, this work provides a robust, biologically grounded, and ethically aligned pathway for rare-cancer biomarker discovery that emphasizes transparency, fairness, and accountability in scarce-data environments.
Description
Thesis (Master's)--University of Washington, 2025
