Data Driven Methods for Scaffolding Genomes with Hi-C
Authors
Sur, Aakash
Abstract
High-quality reference genomes are once again in vogue following the publication of the telomere-to-telomere human genome and several challenging plant and animal genomes. Recent efforts in genome assembly have coalesced around two key technologies: ultra-long reads and genome-wide chromosome conformation capture (Hi-C). Here, we used both to complete the genomes of the protists Leishmania donovani, Leishmania tarentolae, Crithidia fasciculata, and Euglena gracilis, shedding light on their genomic organization and evolutionary history. To navigate the many Hi-C genome scaffolding methods, we benchmarked the most popular ones against a set of high-quality reference genomes. We found that while most perform well under ideal circumstances, many struggle with modern high-quality assemblies that contain near-chromosome-length contigs. Finally, we attempted to overcome these limitations with a machine learning approach that leverages the recent bounty of genomes published with Hi-C data. Using a novel convolutional neural network, we demonstrated a proof of concept for a data-driven approach to scaffolding genomes.
Description
Thesis (Ph.D.)--University of Washington, 2022
