New methods for haplotyping and de novo assembly of genomes and metagenomes
Burton, Joshua N.
The study of genomics is made possible by the creation of genome assemblies: strings of sequences that represent the DNA content of a species, or an individual within a species. However, genome assemblies do not spring fully formed from DNA sequencing machines. Sequencing produces small fragments of DNA, and these fragments must be combined into a prediction of an organism's genome by a process called de novo assembly. The advent of "next-generation" DNA sequencing technologies over the past decade has vastly increased our capacity to sequence new genomes, but it has exacerbated the difficulty of de novo assembly, turning it into one of the foremost challenges in computational biology today. Especially problematic is the short length of many next-generation reads, which deprives genome assemblies of crucial information about sequence contiguity. Here I describe new methods for creating high-contiguity genome assemblies from short next-generation reads. I demonstrate that proper library preparation can create short reads that retain long-range contiguity information, and I develop novel algorithms to exploit this information for de novo genome assembly. First, I introduce the concept of using Hi-C for de novo genome assembly, and I demonstrate that Hi-C produces signals of genomic contiguity that can be used for chromosome-scale scaffolding of de novo genome assemblies. Second, I show that Hi-C can also be used for metagenomic deconvolution. Finally, I use fosmid clone pools and copy number analysis to perform haplotype resolution on the genome of the famous HeLa cancer cell line. These approaches allow us to make productive use of the continual advances in next-generation sequencing and will improve standards for genome assemblies.
- Genetics