Development of a machine learning pipeline to analyze biological multiple particle tracking datasets

Authors

SCHIMEK, NELS

Abstract

Multiple Particle Tracking (MPT) has been demonstrated to be an important tool for understanding changes in biological environments. MPT studies can generate gigabytes of data across hundreds to thousands of trajectories, making MPT datasets an interesting candidate for machine learning applications. To begin understanding the scope of biological questions that can be answered by coupling MPT datasets with machine learning techniques, an end-to-end data science pipeline is developed, building on recent work in the Nance Lab, and applied to three distinct datasets. First, Principal Component Analysis (PCA) is applied to visualize the spread and distribution of the high-dimensional MPT data. Next, a boosted decision tree model, XGBoost, is applied to determine the predictive capability of each dataset, and SHAP values are used to interpret model predictions and identify the statistical features driving accurate predictions. Finally, XGBoost models are trained on trajectories from specific diffusion modes to determine whether accuracy increases. Overall, the pipeline presented demonstrates the capability to provide information across multiple biological questions.
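
The first stage of the pipeline, PCA on a table of per-trajectory features, can be illustrated with a minimal sketch. This is not the thesis code: the feature table is synthetic, the three columns are hypothetical stand-ins for trajectory features, and the helper names (`center`, `covariance`, `top_eigenpair`, `pca`) are assumptions made for illustration. It uses power iteration with deflation in plain Python so that it runs without dependencies; a real analysis would more likely use a library implementation.

```python
import math
import random


def center(rows):
    """Subtract the column mean from every feature."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, iters=500, seed=0):
    """Largest eigenvalue and unit eigenvector via power iteration."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca(rows, n_components=2):
    """Return (components, variances, scores) for the leading components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        eigval, v = top_eigenpair(cov)
        components.append(v)
        variances.append(eigval)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return components, variances, scores


# Synthetic stand-in for a per-trajectory feature table (NOT real MPT data):
# two correlated features plus one low-variance, independent feature.
rng = random.Random(42)
rows = []
for _ in range(200):
    a = rng.gauss(0.0, 3.0)
    rows.append([a, 0.5 * a + rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.2)])

components, variances, scores = pca(rows, n_components=2)
```

Plotting the two columns of `scores` against each other would give the kind of two-dimensional view of high-dimensional data that the abstract describes; the later XGBoost and SHAP stages are not covered by this sketch.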

Description

Thesis (Master's)--University of Washington, 2022
