Data Driven Methods for Scaffolding Genomes with Hi-C

dc.contributor.advisor: Myler, Peter J
dc.contributor.author: Sur, Aakash
dc.date.accessioned: 2022-09-23T20:41:48Z
dc.date.available: 2022-09-23T20:41:48Z
dc.date.issued: 2022-09-23
dc.date.submitted: 2022
dc.description: Thesis (Ph.D.)--University of Washington, 2022
dc.description.abstract: High-quality reference genomes are once again in vogue following the publication of the telomere-to-telomere human genome and several challenging plant and animal genomes. Recent efforts in genome assembly have coalesced around two key technologies: ultra-long reads and genome-wide chromosome conformation capture (Hi-C). Here, we used both to complete the genomes of the protists Leishmania donovani, Leishmania tarentolae, Crithidia fasciculata, and Euglena gracilis, shedding light on their genomic organization and evolutionary history. To navigate the many Hi-C genome scaffolding methods, we benchmarked the most popular ones against a set of high-quality reference genomes. We found that while most perform well under ideal circumstances, many struggle with modern high-quality assemblies that contain near-chromosome-length contigs. Finally, we attempted to overcome these limitations with a machine learning approach that leverages the recent abundance of genomes published with Hi-C data. Using a novel convolutional neural network, we demonstrated a proof of concept for a data-driven approach to scaffolding genomes.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Sur_washington_0250E_24692.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49235
dc.language.iso: en_US
dc.rights: CC BY
dc.subject:
dc.subject: Bioinformatics
dc.subject: Genetics
dc.subject.other:
dc.title: Data Driven Methods for Scaffolding Genomes with Hi-C
dc.type: Thesis

Files

Original bundle

Name: Sur_washington_0250E_24692.pdf
Size: 5.57 MB
Format: Adobe Portable Document Format