NemoCluster: Graph Clustering Algorithm for Structural Variant Detection
Authors
Rohde, Nicola
Abstract
Structural variant detection is a problem of significant interest in the biomedical field because of the strong link between these variants and genetic and degenerative diseases. A large body of programs and approaches exists to detect these variants, and they perform well on the human genome. However, benchmarks presented in this thesis show that these tools perform poorly on microbial genomes. One approach that has been shown to be effective in structural variant discovery is the use of clustering to detect anomalous regions in the genome. Well-known tools such as DELLY use this approach to achieve high accuracy; however, no tools use a network-motif-based clustering algorithm. Detecting anomalous genomic regions can be likened to community detection in social networks, which can be achieved by using triangle subgraphs, or size-three cliques, to calculate a triangle conductance for each edge in the network. However, using only cliques ignores a large amount of structural information within the network. This is acceptable in social networks, where cliques represent tightly knit groups and therefore carry more significance than other structures. It does not extend well to other areas such as bioinformatics, where it may be of interest to cluster networks based on network motifs in order to capture more of the structural information contained in the graph than cliques can convey. This thesis introduces NemoCluster, an algorithm that generalizes triangle conductance clustering to network-motif conductance clustering. Accompanying this program are benchmarks that show it performing better than similar tools in social networking applications, in biological applications such as protein-protein interaction networks, and on synthetic networks.
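The triangle-based idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the NemoCluster implementation; it assumes one common formulation in which each edge is weighted by the number of triangles containing it, and the conductance of a node set is then computed on that weighted graph. The function names `triangle_edge_weights` and `triangle_conductance` are hypothetical and chosen only for this illustration.

```python
def triangle_edge_weights(edges):
    """For each undirected edge (u, v) with u < v, count the triangles containing it."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Each common neighbour of u and v closes one triangle on this edge.
                weights[(u, v)] = len(adj[u] & adj[v])
    return weights


def triangle_conductance(edges, cluster):
    """Conductance of `cluster` on the triangle-weighted graph (assumed definition).

    Computed as cut weight divided by the smaller of the two weighted volumes.
    """
    weights = triangle_edge_weights(edges)
    cluster = set(cluster)
    cut = vol_in = vol_out = 0
    for (u, v), w in weights.items():
        for node in (u, v):
            if node in cluster:
                vol_in += w
            else:
                vol_out += w
        if (u in cluster) != (v in cluster):
            cut += w  # edge crosses the cluster boundary
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(triangle_edge_weights(edges)[(2, 3)])   # the bridge lies in no triangle
print(triangle_conductance(edges, {0, 1, 2}))  # clean split between the triangles
print(triangle_conductance(edges, {0, 1}))     # split that cuts through a triangle
```

In this toy graph the bridge edge receives weight zero, so separating the two triangles costs nothing, while a cut through a triangle is penalized. A motif-based generalization, as the abstract describes, would replace the triangle count with a count of occurrences of a chosen network motif.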
Description
Thesis (Master's)--University of Washington, 2020
