Understanding the differences in cognitively defined subgroups in Alzheimer's disease: A data science approach

dc.contributor.advisor: Gennari, John H
dc.contributor.author: Sohi, Harkirat Kaur
dc.date.accessioned: 2023-01-21T05:00:52Z
dc.date.issued: 2023-01-21
dc.date.submitted: 2022
dc.description: Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract: My work connects two types of data in Alzheimer’s disease (AD): structural MRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and cognition data in the form of AD subgroups. The subgroups (AD-Executive, AD-Language, AD-Memory and AD-Visuospatial), defined by Crane et al. (2017), are based on cognitive test scores from the time of AD diagnosis. Each subgroup is characterized by marked impairment in the specified cognitive domain relative to the other domains. My dissertation focuses on data science and mathematical methods for understanding how the volumes of 70 brain regions of interest (ROIs) might differ across pairs of AD subgroups, both in cross-sectional data from the time of AD diagnosis (Aim 1) and in longitudinal data (Aim 2). My work demonstrates a careful assessment and implementation of methods to make the best use of the available data, which are small in sample size, noisy, and imbalanced across AD subgroups. In both aims, the following pairs of AD subgroups were compared: (a) AD-Language vs. AD-Memory, (b) AD-Memory vs. AD-Visuospatial and (c) AD-Language vs. AD-Visuospatial. The AD-Executive group was excluded from the current analyses due to its small sample size. In Aim 1, I explored supervised machine learning classification methods that provide variable importance measures, in order to identify the brain ROIs most important for distinguishing between pairs of AD subgroups. Given the characteristics of the data, I determined random forest to be the most appropriate method for this task. Before building classification models, I addressed two specific challenges in the cross-sectional data: potential noise due to non-ROI variables and imbalanced AD subgroup sizes. A further challenge in applying classification models to AD subgroups is that there is no gold standard for how separable the subgroups are based on ROI volumes.
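The abstract does not say how class imbalance was handled, which variable-importance measure was used, or how accuracy was estimated. The sketch below therefore rests on stated assumptions. It uses scikit-learn's RandomForestClassifier with class_weight="balanced", stratified 5-fold cross-validation scored by balanced accuracy, and impurity-based feature_importances_. The function rank_rois and the roi_k names are hypothetical, and the demo data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def rank_rois(X, y, roi_names, n_trees=300, seed=0):
    """Hypothetical helper: separate two AD subgroups and rank ROIs by importance.

    X: (n_subjects, n_rois) array of ROI volumes; y: binary subgroup labels.
    class_weight="balanced" is an assumed way to offset unequal subgroup sizes.
    """
    rf = RandomForestClassifier(
        n_estimators=n_trees, class_weight="balanced", random_state=seed
    )
    # Balanced accuracy averages per-class recall, so the larger subgroup
    # cannot dominate the score.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(rf, X, y, cv=cv, scoring="balanced_accuracy")
    # Refit on all subjects to obtain importances for ranking.
    rf.fit(X, y)
    importances = rf.feature_importances_
    order = np.argsort(importances)[::-1]  # most important first
    ranking = [(roi_names[i], float(importances[i])) for i in order]
    return float(scores.mean()), ranking


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_a, n_b, n_rois = 30, 90, 70  # synthetic, deliberately imbalanced sizes
    names = [f"roi_{k}" for k in range(n_rois)]  # hypothetical ROI labels
    X = rng.normal(size=(n_a + n_b, n_rois))
    y = np.array([0] * n_a + [1] * n_b)
    X[y == 0, 3] -= 2.0  # only roi_3 carries signal in this toy example
    acc, ranking = rank_rois(X, y, names)
    print(f"CV balanced accuracy: {acc:.2f}")
    print("Top ROIs:", ranking[:5])
```

Impurity-based importances can be biased, so scikit-learn's `sklearn.inspection.permutation_importance` is a common alternative for the ranking step.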
The work presented here may be the first to establish a baseline benchmark for classification accuracy in distinguishing between pairs of AD subgroups based on ROI volumes. These models are not intended for prediction in a clinical setting, but rather for understanding which brain regions are most important for distinguishing the AD subgroups. In Aim 2, I used linear mixed effects (LME) modeling on longitudinal data to determine which of the 70 ROIs’ volume trajectories differ the most across pairs of AD subgroups, in terms of both volume and rate of change of volume over time. First, rather than using the default longitudinal dataset, I laid out criteria for selecting data from specific MRI scans in an effort to reduce noise. Given the small AD subgroup sizes and irregular data, I fitted an LME model for each ROI on the original dataset containing all time points. I also fitted these models on a series of subsets, obtained by restricting each AD subgroup’s data to time points with a specified minimum number of subjects. An important finding of my work is that the top ROIs for distinguishing between pairs of AD subgroups partially overlapped between the cross-sectional and longitudinal analyses. Results from my Ph.D. work may inform decisions about which brain regions are relevant for future neuropathological studies of AD subgroups.
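A minimal per-ROI sketch of the Aim 2 workflow follows, and it rests on several assumptions. The data are taken to be a long-format table with hypothetical columns subject, group, time (for example, years since diagnosis) and one column per ROI. The model is a random-intercept LME fitted with statsmodels' `mixedlm`, and a time-by-group interaction serves as the test for differing rates of change. The abstract does not give the thesis's actual random-effects structure or covariates. The helpers restrict_time_points, fit_roi_lme and scan_thresholds are illustrative only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def restrict_time_points(df, min_subjects):
    """Hypothetical helper: within each subgroup, keep only visit times
    observed for at least min_subjects distinct subjects."""
    n_at_time = df.groupby(["group", "time"])["subject"].transform("nunique")
    return df[n_at_time >= min_subjects].reset_index(drop=True)


def fit_roi_lme(df, roi):
    """Hypothetical helper: random-intercept LME for one ROI, volume ~ time * group.

    The group coefficient estimates a volume difference at time 0; the
    time:group coefficient estimates a difference in rate of change.
    """
    data = df.reset_index(drop=True)
    model = smf.mixedlm(f"{roi} ~ time * group", data, groups=data["subject"])
    result = model.fit(reml=True)
    # The only fixed-effect name containing ":" is the time-by-group interaction.
    slope_term = next(name for name in result.params.index if ":" in name)
    return {
        "roi": roi,
        "slope_diff": float(result.params[slope_term]),
        "p_slope": float(result.pvalues[slope_term]),
    }


def scan_thresholds(df, rois, thresholds):
    """Hypothetical helper: refit every ROI on progressively stricter subsets."""
    rows = []
    for k in thresholds:
        subset = restrict_time_points(df, k)
        for roi in rois:
            rows.append({"min_subjects": k, **fit_roi_lme(subset, roi)})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    # Synthetic data only: two subgroups with different, made-up atrophy rates.
    rng = np.random.default_rng(1)
    rows = []
    for group, slope in [("AD-Language", -0.5), ("AD-Memory", -1.0)]:
        for s in range(40):
            base = rng.normal(10.0, 1.0)
            for t in range(int(rng.integers(1, 5))):  # 1 to 4 visits per subject
                rows.append({"subject": f"{group}_{s}", "group": group,
                             "time": float(t),
                             "roi_0": base + slope * t + rng.normal(0.0, 0.2)})
    df = pd.DataFrame(rows)
    print(scan_thresholds(df, ["roi_0"], thresholds=[1, 10, 20]))
```

Fitting each ROI separately keeps every model small enough for the limited subgroup sizes. Comparing results across thresholds shows whether a finding depends on sparsely populated late visits.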
dc.embargo.lift: 2025-01-10T05:00:52Z
dc.embargo.terms: Restrict to UW for 2 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Sohi_washington_0250E_24914.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49583
dc.language.iso: en_US
dc.relation.haspart: SupplementaryFiles.zip; pdf; Supplemental plots for Chapters 2, 3 and 5.
dc.rights: none
dc.subject: Alzheimer's disease
dc.subject: brain volume data
dc.subject: Computational biology
dc.subject: Data Science
dc.subject: machine learning application
dc.subject: mathematical methods
dc.subject: Bioinformatics
dc.subject.other:
dc.title: Understanding the differences in cognitively defined subgroups in Alzheimer's disease: A data science approach
dc.type: Thesis

Files (Original bundle)

Name: Sohi_washington_0250E_24914.pdf; Size: 5.8 MB; Format: Adobe Portable Document Format
Name: SupplementaryFiles.zip; Size: 42 MB; Format: Unknown data format