Cancer survival prediction of six cancer types with germline whole-exome sequencing data from the UK Biobank

dc.contributor.advisor: Lindström, Sara
dc.contributor.author: Jia, Tongqiu
dc.date.accessioned: 2022-07-14T22:11:13Z
dc.date.issued: 2022-07-14
dc.date.submitted: 2022
dc.description: Thesis (Master's)--University of Washington, 2022
dc.description.abstract: Background: One of the most important tasks in cancer genomics is to predict cancer prognosis based on genetic signatures. We aim to apply machine learning approaches to germline whole-exome sequencing (WES) data to predict survival for breast cancer, colorectal cancer, kidney cancer, lung cancer, lymphoma, and prostate cancer. Methods: We analyzed the UK Biobank exome sequencing data and survival status of 10,721 incident cancer cases. We annotated WES variants with Combined Annotation Dependent Depletion (CADD) and generated gene-level CADD scores, which indicate the deleterious impact of mutations. Using the gene-level CADD scores, we performed unsupervised feature selection using principal component analysis (PCA), independent component analysis (ICA), random projection (RP), autoencoder (AE), denoising autoencoder (DAE), and variational autoencoder (VAE). For each cancer type, we trained logistic regression models to predict cancer survival status using the selected features. Results: Across all cancer types, all models had a prediction accuracy of around 0.5. Overall, the accuracies of the deep learning models were comparable to those of the linear models. When comparing VAE models with varying latent space sizes, we did not observe an increase in accuracy as the size of the latent space increased. Conclusion: Across all six cancer types, the survival prediction accuracy was similar for all models, indicating that more complex deep learning methods did not improve prediction performance. The features or embeddings derived from the six tested dimension reduction methods had limited predictive ability for cancer survival.
dc.embargo.lift: 2024-07-03T22:11:13Z
dc.embargo.terms: Restrict to UW for 2 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Jia_washington_0250O_24536.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49001
dc.language.iso: en_US
dc.rights: CC BY-NC-SA
dc.subject: Cancer epidemiology
dc.subject: Epidemiology
dc.subject.other: Epidemiology
dc.title: Cancer survival prediction of six cancer types with germline whole-exome sequencing data from the UK Biobank
dc.type: Thesis
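
The abstract describes a two-stage pipeline: reduce gene-level scores to a small set of features, then fit a logistic regression on those features to predict survival status. The following is a minimal, dependency-free Python sketch of that general idea, using random projection as the reduction step. It is not the thesis code. The data are synthetic (random scores and random labels, so there is no real signal to learn), and all names (`random_projection`, `fit_logistic`, `run_sketch`) and parameter values are hypothetical, chosen only for illustration.

```python
import math
import random


def random_projection(rows, n_components, rng):
    """Project each row onto n_components random Gaussian directions."""
    n_features = len(rows[0])
    scale = 1.0 / math.sqrt(n_components)
    # One Gaussian weight vector per output component.
    weights = [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
               for _ in range(n_components)]
    return [[sum(w * x for w, x in zip(wrow, row)) for wrow in weights]
            for row in rows]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    coef, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(c * v for c, v in zip(coef, xi)) + bias) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        coef = [c - lr * g / n for c, g in zip(coef, grad_w)]
        bias -= lr * grad_b / n
    return coef, bias


def accuracy(X, y, coef, bias):
    """Fraction of rows whose thresholded prediction matches the label."""
    preds = [1 if sigmoid(sum(c * v for c, v in zip(coef, xi)) + bias) >= 0.5 else 0
             for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def run_sketch(n_samples=200, n_genes=50, n_components=8, seed=0):
    rng = random.Random(seed)
    # Synthetic stand-ins (assumption): random "gene-level scores" and labels.
    scores = [[rng.random() for _ in range(n_genes)] for _ in range(n_samples)]
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    embedded = random_projection(scores, n_components, rng)
    split = int(0.8 * n_samples)  # simple 80/20 train/test split
    coef, bias = fit_logistic(embedded[:split], labels[:split])
    return embedded, accuracy(embedded[split:], labels[split:], coef, bias)


if __name__ == "__main__":
    embedded, acc = run_sketch()
    print(f"held-out accuracy: {acc:.2f}")
```

In practice a library such as scikit-learn would replace the hand-written projection and optimizer. The sketch only shows how a reduced representation feeds a classifier and how held-out accuracy is computed.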

Files

Original bundle

Name: Jia_washington_0250O_24536.pdf
Size: 769.93 KB
Format: Adobe Portable Document Format
