Scalable and cloud-enabled analysis of long read sequencing data

Authors

Reddy, Shishir

Abstract

Long-read sequencing holds great promise for enabling portable, rapid molecular-assisted diagnoses. Applications include improved prognosis of critically ill patients through variant detection, along with rapid genetic diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process the raw data and support graphical output and interactive visualizations for interpreting results. Another obstacle is that high-performance software tools for long-read sequencing data analysis often rely on graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud. Possible solutions include adding graphical bioinformatics software tools, hardware acceleration with GPUs, and optimization with Tensor Processing Units (TPUs). Long-read sequencing workflows for diagnosis involve several steps that can be hardware-accelerated and optimized using various processing methods. Optimizing these workflows through hardware acceleration can reduce turnaround times for diagnoses from days to hours. Our goal is to create and optimize long-read sequencing workflows to build rapid, cost-effective solutions for cancer detection and diagnosis on the cloud. This thesis introduces two containerized, hardware-accelerated long-read sequencing analysis workflows, one for fusion analysis and one for variant calling. The fusion analysis workflow introduces a fusion-finding tool, the Biodepot Fusion Finder (BFF), capable of rapidly detecting fusions and calculating sample enrichment. This fusion workflow is benchmarked for accuracy and compared to the fusion-finding software LongGF on nanopore data from cell-line and patient samples.
The variant-calling workflow uses PEPPER-Margin-DeepVariant to call structural variants in a cloud-based, GPU-enabled environment. This workflow is benchmarked for accuracy between the GPU and CPU versions of the variant-calling software to make clear which specific stages of the pipeline benefit from hardware acceleration.

Description

Thesis (Master's)--University of Washington, 2022
