Scalable and cloud-enabled analysis of long read sequencing data

dc.contributor.advisor: Yeung, Ka Yee
dc.contributor.author: Reddy, Shishir
dc.date.accessioned: 2023-01-21T05:00:48Z
dc.date.available: 2023-01-21T05:00:48Z
dc.date.issued: 2023-01-21
dc.date.submitted: 2022
dc.description: Thesis (Master's)--University of Washington, 2022
dc.description.abstract: Long-read sequencing holds great promise for enabling portable, rapid molecular-assisted diagnoses. Its applications include rapid genetic diagnoses and improved prognosis of critically ill patients through variant detection. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process the raw data and support graphical output and interactive visualizations for interpreting results. Another obstacle is that high-performance software tools for long-read sequencing data analysis often leverage graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud. Possible solutions include adding graphical bioinformatics software tools, hardware acceleration with GPUs, and optimization with Tensor Processing Units (TPUs). Long-read sequencing workflows for diagnosis involve several steps that can be hardware-accelerated and optimized using various processing methods. Optimizing these workflows through hardware acceleration can reduce the turnaround time of diagnoses from days to hours. Our goal is to create and optimize long-read sequencing workflows to build rapid, cost-effective solutions for cancer detection and diagnosis on the cloud. This thesis introduces two containerized, hardware-accelerated long-read sequencing analysis workflows, one for fusion analysis and one for variant calling. The fusion analysis workflow introduces a fusion-finding tool, the Biodepot Fusion Finder (BFF), capable of rapidly detecting fusions and calculating sample enrichment. This fusion workflow is benchmarked for accuracy and compared to the fusion-finding software LongGF on cell-line and patient samples of nanopore data. The variant-calling workflow uses PEPPER-Margin-DeepVariant to call structural variants in a cloud-based, GPU-enabled environment. This workflow is benchmarked for accuracy between the GPU and CPU versions of the variant-calling software, to show which specific stages of the pipeline benefit from hardware acceleration.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Reddy_washington_0250O_25074.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49574
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Biodepot
dc.subject: Cancer
dc.subject: Cloud
dc.subject: GPU
dc.subject: Nanopore
dc.subject: Sequencing
dc.subject: Computer science
dc.title: Scalable and cloud-enabled analysis of long read sequencing data
dc.type: Thesis

Files

Original bundle

Name: Reddy_washington_0250O_25074.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format