Root Cause Analysis of Failure in Molecular Biology Workflows

Author

Li, Shuowei

Abstract

Root Cause Analysis of Failure in Molecular Biology Workflows: This dissertation describes a framework for accurate and efficient root cause analysis of repeatable biological workflows. The framework makes three significant contributions. First, we introduce Open operational protocol semantics (OOPS), a framework that enables the production of physics-based synthetic datasets for biological workflows. OOPS allows users to model the possible failure modes of a workflow. We demonstrate how OOPS generates diverse, physics-based synthetic datasets for workflows including cell-free protein synthesis and polymerase chain reaction. Second, given the experimental outcomes and records of the items used in each workflow, our framework generates a probabilistic model that represents the variability of the workflows. The details of the experimental records, such as the technicians and reagents involved in each operation of a trial, are encoded into a low-dimensional embedding using a neural network. With the embeddings as input, a logistic regression-based model is trained to predict whether a particular trial will succeed, given the reagents and technicians involved. With this formulation, we can perform root cause analysis to identify the reason for failed trials quantitatively. Furthermore, because biological datasets are small, this hybrid approach allows users to train a compact neural network for feature extraction that facilitates prediction based on logistic regression. We anticipate that our results will help identify sources of variability and accelerate research progress. We identified the source of variability with 86.75%, 98.9%, 97.71%, 99.7%, and 88.31% accuracy in synthetic cell-free protein synthesis, synthetic polymerase chain reaction, real polymerase chain reaction, real Gibson assembly, and real yeast strain construction, respectively. Lastly, we use statistical methods to rank reagents and technicians by their accuracy. The relative quality or accuracy is conditioned on the specific step of a workflow. We use synthetic and real datasets to demonstrate that the overall success rate improves when degraded reagents are replaced or technicians with low accuracy are retrained.

Effective Using Remote Lab in Promoting Simulation and Verification Tools: With the transition to remote instruction during the pandemic, new modalities for conducting hands-on labs are needed. In particular, courses with major hardware components face challenges in making the hardware available to students reliably and sustainably. Furthermore, industry partners expect students to be well trained in simulation and verification tools. This dissertation presents our experience using a virtual breadboard feature interfaced with real Field Programmable Gate Array (FPGA) boards located on the University of Washington's campus, which students can access remotely to complete their lab assignments. We evaluated this approach through anonymous surveys of students and industry partners. The findings demonstrate that we effectively transformed a lab assignment, typically carried out in person, into an online modality. Using remotely accessible hardware also showed promise in promoting students' skills in using verification tools, which are desirable in the industry.
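
As a rough illustration of the logistic-regression step described in the first part of the abstract, the sketch below fits a plain logistic regression to one-hot records of which technician and reagent lot took part in each trial, then ranks the items by learned weight. It is not the dissertation's implementation: it omits the neural-network embedding entirely, and the item names (tech_A, lot_2, and so on), the toy outcomes, and the helper functions are all hypothetical, chosen only to show how a low weight can point to a suspect item.

```python
import math
from itertools import product


def one_hot(trial, items):
    """Encode which items (technicians, reagent lots) took part in a trial."""
    return [1.0 if item in trial else 0.0 for item in items]


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent for L2-regularised logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted success probability
            err = p - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical toy records: every technician/lot pairing, repeated four times.
technicians = ["tech_A", "tech_B"]
reagents = ["lot_1", "lot_2", "lot_3"]
items = technicians + reagents

trials, outcomes = [], []
for rep in range(4):
    for tech, lot in product(technicians, reagents):
        trials.append({tech, lot})
        # Assumption for illustration: lot_2 is degraded and succeeds in only
        # one of four repeats; every other trial succeeds.
        if lot == "lot_2":
            outcomes.append(1.0 if rep == 0 else 0.0)
        else:
            outcomes.append(1.0)

X = [one_hot(t, items) for t in trials]
w, b = fit_logistic(X, outcomes)

# Rank items from most to least associated with failure (lowest weight first).
ranking = sorted(zip(items, w), key=lambda pair: pair[1])
suspect = ranking[0][0]
print("most likely root cause:", suspect)
```

In this toy setting the item with the most negative weight is the one most associated with failed trials. Real records would need the per-step conditioning and the embedding stage that the abstract describes.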

Description

Thesis (Ph.D.)--University of Washington, 2022