Scalable and cloud-enabled analysis of long read sequencing data

Reddy, Shishir

dc.contributor.advisor	Yeung, Ka Yee
dc.contributor.author	Reddy, Shishir
dc.date.accessioned	2023-01-21T05:00:48Z
dc.date.available	2023-01-21T05:00:48Z
dc.date.issued	2023-01-21
dc.date.submitted	2022
dc.description	Thesis (Master's)--University of Washington, 2022
dc.description.abstract	Long-read sequencing holds great promise for enabling portable, rapid, molecularly assisted diagnoses. Its applications include rapid genetic diagnoses and improved prognosis of critically ill patients through variant detection. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process the raw data and support graphical output and interactive visualizations for interpreting results. Another obstacle is that high-performance software tools for long-read sequencing data analysis often leverage graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud. Possible solutions include developing graphical bioinformatics software tools, hardware acceleration with GPUs, and optimization with Tensor Processing Units (TPUs). Long-read sequencing workflows for diagnosis involve several steps that can be hardware-accelerated and optimized using various processing methods, and such optimization can reduce diagnostic turnaround times from days to hours. Our goal is to create and optimize long-read sequencing workflows that provide rapid, cost-effective solutions for cancer detection and diagnosis on the cloud. This thesis introduces two containerized, hardware-accelerated long-read sequencing analysis workflows, one for fusion analysis and one for variant calling. The fusion analysis workflow introduces a fusion-finding tool, the Biodepot Fusion Finder (BFF), which rapidly detects fusions and calculates sample enrichment. This fusion workflow is benchmarked for accuracy and compared with the fusion-finding software LongGF on nanopore data from cell-line and patient samples.
The variant-calling workflow uses PEPPER-Margin-DeepVariant to call structural variants in a cloud-based, GPU-enabled environment. This workflow is benchmarked by comparing the accuracy of the GPU and CPU versions of the variant-calling software, giving better visibility into which specific stages of the pipeline benefit from hardware acceleration.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Reddy_washington_0250O_25074.pdf
dc.identifier.uri	http://hdl.handle.net/1773/49574
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Biodepot
dc.subject	Cancer
dc.subject	Cloud
dc.subject	GPU
dc.subject	Nanopore
dc.subject	Sequencing
dc.subject	Computer science
dc.title	Scalable and cloud-enabled analysis of long read sequencing data
dc.type	Thesis

Files

Original bundle

Name: Reddy_washington_0250O_25074.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format

Collections

Computer science and systems (Tacoma)