Topics in Graph Clustering
This thesis studies two problems in social networks. The first part focuses on community recovery. There have been many recent theoretical advances in model-based community recovery for network data, centered on the Stochastic Block Model (SBM) and its extension, the Degree-Corrected Stochastic Block Model (DC-SBM). Under assumptions on the balance and separation of clusters, existing theoretical guarantees ensure that the true clusters are recovered with high probability. We first benchmark the current recovery theorems for the DC-SBM experimentally. The experiments suggest that many cases are recoverable in practice yet are not covered by the current recovery theorems. We then introduce a wider class of network models, called the Preference Frame Model, and show that under weaker assumptions the communities (clusters) can be recovered by a spectral clustering algorithm with essentially the same guarantees. Model-based results, despite their importance, are limited by a strong and difficult-to-verify assumption: that the observed data are generated from the model. We therefore present a model-free community recovery framework, in which we make no assumptions about the data-generating process and provide theoretical guarantees for the performance of model-based clustering algorithms. In the second part of the thesis, we propose a perturbation framework to measure the robustness of graph properties. Perturbation methods have already been proposed for this problem, but they are limited because the strength of the perturbation cannot be well controlled. We first define a perturbation framework on graphs by introducing weights on the nodes, so that the magnitude of the perturbation is easily controlled through the variance of the weights. The topology of the graph is preserved, which avoids perturbations of uncontrollable strength. We then extend the measures of robustness from the robust statistics literature to graph properties.
- Statistics