Data Driven Methods for Scaffolding Genomes with Hi-C

Sur, Aakash

dc.contributor.advisor	Myler, Peter J
dc.contributor.author	Sur, Aakash
dc.date.accessioned	2022-09-23T20:41:48Z
dc.date.available	2022-09-23T20:41:48Z
dc.date.issued	2022-09-23
dc.date.submitted	2022
dc.description	Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract	High-quality reference genomes are once again in vogue with the publication of the telomere-to-telomere human genome and several challenging plant and animal genomes. Recent efforts in genome assembly have coalesced around two key technologies: ultra-long reads and genome-wide chromatin conformation capture (Hi-C). Here, we used both to complete the protist genomes of Leishmania donovani, Leishmania tarentolae, Crithidia fasciculata, and Euglena gracilis, shedding light on their genomic organization and evolutionary history. To navigate the many Hi-C genome scaffolding methods, we benchmarked the most popular methods against a set of high-quality reference genomes. We found that while most operate well under ideal circumstances, many struggle with modern high-quality assemblies that contain near-chromosome-length contigs. Finally, we attempted to overcome these limitations with a machine learning approach that leverages the recent bounty of genomes published with Hi-C data. Using an innovative convolutional neural network, we demonstrated a proof of concept for a data-driven approach to scaffolding genomes.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Sur_washington_0250E_24692.pdf
dc.identifier.uri	http://hdl.handle.net/1773/49235
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Bioinformatics
dc.subject	Genetics
dc.title	Data Driven Methods for Scaffolding Genomes with Hi-C
dc.type	Thesis

Files

Original bundle

Name: Sur_washington_0250E_24692.pdf
Size: 5.57 MB
Format: Adobe Portable Document Format

Collections

Biomedical and health informatics