NemoCluster: Graph Clustering Algorithm for Structural Variant Detection

dc.contributor.advisor: Kim, Wooyoung
dc.contributor.author: Rohde, Nicola
dc.date.accessioned: 2020-08-14T03:22:19Z
dc.date.available: 2020-08-14T03:22:19Z
dc.date.issued: 2020-08-14
dc.date.submitted: 2020
dc.description: Thesis (Master's)--University of Washington, 2020
dc.description.abstract: Structural variant detection is a problem of significant interest in the biomedical field due to the strong link between these variants and genetic and degenerative diseases. A large body of programs and approaches exists to detect these variants, and they perform well on the human genome. However, benchmarks presented in this thesis show that these tools perform poorly on microbial genomes. One approach that has been shown to be effective in structural variant discovery is the use of clustering to detect anomalous regions in the genome. Well-known tools such as DELLY use this approach to achieve high accuracy; however, no tools use a network-motif-based clustering algorithm. Detecting anomalous genomic regions can be likened to community detection in social networks, which can be achieved by using triangle subgraphs, or cliques of size three, to calculate a triangle conductance for each edge in the network. However, using only cliques ignores a large amount of structural information within the network. This is acceptable in social networks, where cliques represent tightly knit groups and therefore have more significance than other structures. It does not, however, extend well to other areas such as bioinformatics, where it may be of interest to cluster networks based on network motifs in order to capture more of the structural information contained within the graph than cliques can convey. This thesis introduces NemoCluster, an algorithm that generalizes triangle conductance clustering to network-motif conductance clustering. Accompanying this program are benchmarks that show it performing better than similar tools in social networking applications, in biological applications such as protein-protein interaction networks, and in synthetic networks.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: rohde_washington_0250O_21698.pdf
dc.identifier.uri: http://hdl.handle.net/1773/45706
dc.language.iso: en_US
dc.rights: none
dc.subject: Graph Clustering
dc.subject: Network Motifs
dc.subject: Structural Variant Calling
dc.subject: Bioinformatics
dc.subject.other: Computing and software systems
dc.title: NemoCluster: Graph Clustering Algorithm for Structural Variant Detection
dc.type: Thesis

Files

Original bundle

Name: rohde_washington_0250O_21698.pdf
Size: 2.61 MB
Format: Adobe Portable Document Format