Gene Network Inference using Machine Learning and Graph Algorithms on Big Biomedical Data
Gene networks capture the interactions between different biological entities and have many applications in modern biology. In particular, gene networks can help shed light on the mechanisms underlying diseases. Advances in biotechnology have led to the generation of many types of genome-wide data that profile activity levels across the entire genome. Many algorithms have been proposed in the literature to infer gene networks from such data. However, it is non-trivial to use these data to distinguish a direct edge between two nodes from an indirect relationship represented by a path connecting the two nodes through other nodes. In this thesis, I generated informative and accurate gene networks by integrating multiple types of big biomedical data. To construct compact and accurate networks, I used an improved gene network inference algorithm based on Bayesian Model Averaging (BMA), which includes a post-processing step that removes redundant indirect edges. I applied this method to synthetic data, for which the ground truth is known, and to real data, for which external data sources were used to help assess and analyze the resulting networks. The assessment results are presented in two forms: graphs and tables. In general, the results show that the improved algorithm produces more accurate networks and that its implementation is more efficient.
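The abstract does not state the exact rule used to decide that an edge is indirect. Purely as an illustration, the following minimal Python sketch assumes one common heuristic, similar in spirit to the data processing inequality used by ARACNE: a direct edge a→c is treated as redundant when some two-step path a→b→c has both of its edges stronger than a→c. The function name `remove_indirect_edges`, the `tolerance` parameter, and the use of posterior edge probabilities as weights are hypothetical and are not taken from the thesis.

```python
def remove_indirect_edges(edges, tolerance=0.0):
    """Hypothetical post-processing sketch (not the thesis's method).

    edges: dict mapping (regulator, target) -> edge weight, e.g. a posterior
    probability from a BMA-style inference step (assumed).
    An edge (a, c) is dropped if some intermediate node b gives a two-step
    path a -> b -> c whose weakest edge exceeds w(a, c) by more than
    `tolerance`. Every decision is made against the ORIGINAL edge set,
    so the result does not depend on iteration order.
    """
    kept = {}
    for (a, c), w_ac in edges.items():
        redundant = False
        # Scan all edges leaving a to find candidate intermediates b.
        for (x, b), w_ab in edges.items():
            if x != a or b == c:
                continue
            w_bc = edges.get((b, c))
            # Only paths of length two are checked in this sketch.
            if w_bc is not None and min(w_ab, w_bc) > w_ac + tolerance:
                redundant = True
                break
        if not redundant:
            kept[(a, c)] = w_ac
    return kept


if __name__ == "__main__":
    toy = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5, ("C", "D"): 0.7}
    # A -> C is weaker than both edges of A -> B -> C, so it is removed.
    print(remove_indirect_edges(toy))
```

In this toy example the weak edge A→C is explained by the stronger path A→B→C and is removed, while the other three edges are kept. Checking only two-step paths keeps the sketch simple; a real pipeline might consider longer paths or use a statistical test instead of a fixed margin.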