Learning Rule-based Decision-Making Systems from Heterogeneous Longitudinal Data

Author: Shakur, Ameer Hamza
Contributor: Huang, Shuai
Degree: Thesis (Ph.D.)--University of Washington, 2023
Subject: Industrial engineering
Date: 2023-04-17
Language: en-US
Type: Thesis
License: CC BY
File: Shakur_washington_0250E_25221.pdf (application/pdf)
URI: http://hdl.handle.net/1773/49924

Recent advances in sensing technology have greatly expanded our capacity to collect data from a diverse pool of patients at unprecedented spatial-temporal resolution and from a variety of sources. These advances greatly increase the complexity of modern patient data sets and create both challenges and opportunities for analysis, because classic models designed to capture average effects turn out to be oversimplified. The increasing size and dimensionality of modern data sets also make the development of sparse models imperative. There has been a recent push towards interpretable models that not only provide accurate predictions but also explain why a prediction was made. Interpretability and explainability are key to the success of such models in complex applications such as healthcare, where model decisions must be communicated to, and evaluated by, medical professionals, and where accountability matters. A faithful understanding of the uncertainty in predictions is becoming critical, because decision-making in such applications can be dangerous and expensive, especially as more and more systems become automated. Larger data sets often bring increasing heterogeneity, so personalized decision models that can recognize heterogeneity between observations and subgroups are important in medical applications. Additionally, complex data structures such as survival data are often incomplete, which poses its own challenges and must be modeled carefully.
Further, these data are often multimodal and may come in various representations such as text, audio-visual data, or time series. This dissertation focuses on developing rule-based interpretable machine learning models that can address these new challenges in modern data sets. First, we introduce SURVFIT, a “doubly sparse” rule extraction formulation for survival data. This method induces sparsity both in the number of rules and in the number of variables involved in the rules. In addition, we develop a systematic rule evaluation framework comprising statistical testing, decomposition analysis, and sensitivity analysis to aid interpretation of the extracted rules. Our next contribution, GPSRL, is a Bayesian semi-parametric ordered rule-list methodology that addresses heterogeneity and quantifies uncertainty. Ordered rule lists let us model heterogeneity while keeping model complexity in check. We apply these methodologies to a real-world sepsis survival data set. Finally, we explore the application of the rule-based approach to the discovery of multimodal biomarkers in ADHD. We identify interesting interactions between two modalities of data: eye movement patterns and EEG signals. Detecting these interactions should help us better understand the condition and develop better prediction models and intervention strategies.