FPGA Accelerated Bioinformatics: Alignment, Classification, Homology and Counting

McVicar, Nathaniel S

dc.contributor.advisor	Hauck, Scott
dc.contributor.author	McVicar, Nathaniel S
dc.date.accessioned	2019-02-22T17:01:43Z
dc.date.available	2019-02-22T17:01:43Z
dc.date.submitted	2018
dc.description	Thesis (Ph.D.)--University of Washington, 2018
dc.description.abstract	Advances in next-generation sequencing technology have led to increases in genomic data production by sequencing machines that outpace Moore’s law. This trend has reached the point where the time and money spent processing human and other genome sequence data could be greater than that spent producing it. Field-Programmable Gate Arrays (FPGAs) can provide a solution to this problem. Bioinformatics accelerators running on FPGAs achieve order-of-magnitude speedups across a variety of genomics applications important in both biological research and clinical medicine. This dissertation presents three accelerators. The first addresses the short read alignment problem, where millions of short DNA or RNA reads, with lengths on the order of 100 base pairs, are aligned to an index built from a reference genome. Our aligner combines an FPGA accelerator with the greater memory bandwidth of our host system to produce a fast and flexible short read aligner. Using this aligner, we developed a classifier to determine which of two possible species each read originated from. In a case study with RNA-Seq reads from mouse and human retinal cultures, our aligner produced more accurate classification results and better performance than software-based aligners. Our second accelerator tackles the problem of non-coding RNA (ncRNA) homology search. The biologically important functions these ncRNAs perform are determined by their two- or three-dimensional structure, and ncRNAs with different sequences can perform the same functions if they share a similar structure. Homology search scores sequences using models of ncRNA families in an effort to find previously unknown members. Our accelerator greatly improves the speed of filters that identify candidate sequences using the Viterbi and CYK algorithms. The final accelerator uses the Hybrid Memory Cube (HMC), a stacked DRAM, for K-mer counting.
In many areas of bioinformatics, including de novo assembly, K-mer counting is a filter with important roles, including removing read errors. Our approach stores K-mer counts in a Bloom filter on the HMC, leveraging its greater random access rate for increased performance over both the host and FPGA-attached DRAM. Throughout this dissertation we demonstrate that FPGA accelerators can achieve excellent speedups across a variety of genomics applications.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	McVicar_washington_0250E_19511.pdf
dc.identifier.uri	http://hdl.handle.net/1773/43267
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	alignment
dc.subject	bioinformatics
dc.subject	FPGA
dc.subject	homology
dc.subject	k-mer counting
dc.subject	short reads
dc.subject	Computer engineering
dc.subject	Bioinformatics
dc.subject.other	Electrical engineering
dc.title	FPGA Accelerated Bioinformatics: Alignment, Classification, Homology and Counting
dc.type	Thesis

Files

Original bundle

Name: McVicar_washington_0250E_19511.pdf
Size: 3.48 MB
Format: Adobe Portable Document Format

Collections

Electrical engineering