Development of a machine learning pipeline to analyze biological multiple particle tracking datasets
Authors
SCHIMEK, NELS
Abstract
Multiple Particle Tracking (MPT) has proven to be an important tool for understanding changes to biological environments. MPT studies can generate gigabytes of data across hundreds to thousands of trajectories, making MPT datasets promising candidates for machine learning applications. To begin understanding the scope of biological questions that can be answered by coupling MPT datasets with machine learning techniques, an end-to-end data science pipeline is developed, building on recent work in the Nance Lab, and applied to three distinct datasets. First, Principal Component Analysis is applied to visualize the spread and distribution of the high-dimensional MPT data. Next, a boosted decision tree model, XGBoost, is applied to determine the predictive capability of each dataset, and SHAP values are used to interpret model predictions and identify the statistical features driving accurate predictions. Finally, XGBoost models are trained on trajectories from specific diffusion modes to determine whether accuracy improves. Overall, the pipeline presented demonstrates the capability to provide information across multiple biological questions.
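The first stage of the pipeline, the PCA visualization step, can be illustrated with a minimal sketch. This is not the thesis code: the function name `pca_project`, the synthetic feature table, and the choice to standardize features before projecting are all assumptions made for illustration, and PCA is computed here directly with a NumPy singular value decomposition rather than with whatever library the author used. The XGBoost and SHAP stages are not sketched.

```python
import numpy as np


def pca_project(features, n_components=2):
    """Project rows of `features` onto the leading principal components.

    Hypothetical helper for illustration; returns the projected scores and
    the fraction of variance explained by each retained component.
    """
    X = np.asarray(features, dtype=float)

    # Standardize each column so features on large scales do not dominate.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = (X - mean) / std

    # Z is centered, so the rows of Vt are the principal axes and the squared
    # singular values are proportional to the variance along each axis.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Synthetic stand-in for a per-trajectory feature table (assumed shape:
# one row per trajectory, one column per statistical feature).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 6))
# Make one feature strongly correlated with another so PCA has structure to find.
features[:, 1] = 2.0 * features[:, 0] + 0.1 * rng.normal(size=200)

scores, explained = pca_project(features, n_components=2)
```

Plotting the two columns of `scores` against each other, colored by experimental condition, would give the kind of low-dimensional view of spread and distribution that the abstract describes.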
Description
Thesis (Master's)--University of Washington, 2022
