FPGA Accelerated Bioinformatics: Alignment, Classification, Homology and Counting

dc.contributor.advisor: Hauck, Scott
dc.contributor.author: McVicar, Nathaniel S
dc.date.accessioned: 2019-02-22T17:01:43Z
dc.date.available: 2019-02-22T17:01:43Z
dc.date.submitted: 2018
dc.description: Thesis (Ph.D.)--University of Washington, 2018
dc.description.abstract: Advances in next-generation sequencing technology have led to increases in genomic data production by sequencing machines that outpace Moore’s law. This trend has reached the point where the time and money spent processing human and other genome sequence data could be greater than that spent producing it. Field-Programmable Gate Arrays (FPGAs) can provide a solution to this problem. Bioinformatics accelerators running on FPGAs achieve order-of-magnitude speedups across a variety of genomics applications important in both biological research and clinical medicine. This dissertation presents three accelerators. The first addresses the short read alignment problem, where millions of short DNA or RNA reads, with lengths on the order of 100 base pairs, are aligned to an index built from a reference genome. Our aligner combines an FPGA accelerator with the greater memory bandwidth of our host system to produce a fast and flexible short read aligner. Using this aligner, we developed a classifier to determine which of two possible species each read originated from. In a case study with RNA-Seq reads from mouse and human retinal cultures our aligner produced more accurate classification results and better performance than software-based aligners. Our second accelerator tackles the problem of non-coding RNA (ncRNA) homology search. The biologically important functions these ncRNAs perform are determined by their two- or three-dimensional structure, and ncRNAs with different sequences can perform the same functions if they share a similar structure. Homology search scores sequences using models of ncRNA families in an effort to find previously unknown members. Our accelerator greatly improves the speed of filters that identify candidate sequences using the Viterbi and CYK algorithms. The final accelerator uses the Hybrid Memory Cube (HMC), a stacked DRAM, for K-mer counting. In many areas of bioinformatics, including de novo assembly, K-mer counting is a filter with important roles including removing read errors. Our approach stores K-mer counts in a Bloom filter on the HMC, leveraging the greater random access rate for increased performance over both the host and FPGA-attached DRAM. Throughout this dissertation we demonstrate that FPGA accelerators can achieve excellent speedups across a variety of genomics applications.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: McVicar_washington_0250E_19511.pdf
dc.identifier.uri: http://hdl.handle.net/1773/43267
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: alignment
dc.subject: bioinformatics
dc.subject: FPGA
dc.subject: homology
dc.subject: k-mer counting
dc.subject: short reads
dc.subject: Computer engineering
dc.subject: Bioinformatics
dc.subject.other: Electrical engineering
dc.title: FPGA Accelerated Bioinformatics: Alignment, Classification, Homology and Counting
dc.type: Thesis

Files

Original bundle

Name: McVicar_washington_0250E_19511.pdf
Size: 3.48 MB
Format: Adobe Portable Document Format