Ontology-driven pathway data integration
Wang, Lucy Lu
Biological pathways are useful tools for understanding human physiology and disease pathogenesis. Pathway analysis can be used to detect genes and functions associated with complex disease phenotypes. When performing pathway analysis, researchers often combine pathways from multiple pathway databases. Pathways from different databases do not easily interoperate, and the resulting combined pathway dataset can suffer from redundancy or reduced interpretability. Ontologies have been used to organize pathway data and eliminate redundancy. I generated clusters of semantically similar pathways by mapping pathways from seven databases to classes of one such ontology, the Pathway Ontology (PW). I then produced a typology of differences between pathways by summarizing the differences in content and knowledge representation between databases. Using the typology, I optimized an entity- and graph-based network alignment algorithm for aligning pathways between databases. The algorithm was applied to clusters of semantically similar pathways to generate normalized pathways for each PW class. These normalized pathways were used to produce normalized gene sets for gene set enrichment analysis (GSEA). I evaluated these normalized gene sets against baseline gene sets in GSEA using four public gene expression datasets. Results suggest that normalized pathways can help to reduce redundancy in enrichment outputs. The normalized pathways also retain the hierarchical structure of the PW, which can be used to visualize enrichment results and provide hints for interpretation. Ontology-based organization of biological pathways can play a vital role in improving data quality and interoperability, and the resulting normalized pathways may have broad applications in genomic analysis.
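As a rough illustration of the clustering and gene-set steps described above, the following Python sketch groups pathway records by the PW class they were mapped to and derives one gene set per class. It is not the alignment algorithm from this work: a plain set union stands in for the entity- and graph-based alignment, and the record layout, database names, PW class identifiers, and gene lists are all hypothetical placeholders.

```python
from collections import defaultdict


def group_by_pw_class(pathways):
    """Cluster pathway records by the PW class each one was mapped to."""
    clusters = defaultdict(list)
    for pathway in pathways:
        clusters[pathway["pw_class"]].append(pathway)
    return dict(clusters)


def merged_gene_sets(clusters):
    """Build one gene set per PW class.

    Assumption: a simple union of member gene sets is used here as a
    stand-in for the network alignment step, which is not reproduced.
    """
    return {
        pw_class: set().union(*(p["genes"] for p in members))
        for pw_class, members in clusters.items()
    }


def jaccard(a, b):
    """Jaccard similarity of two gene sets, a simple redundancy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy records: database names, class IDs, and gene lists are hypothetical.
pathways = [
    {"db": "db_a", "name": "Glycolysis", "pw_class": "PW:EXAMPLE_1",
     "genes": {"HK1", "PFKM", "PKM"}},
    {"db": "db_b", "name": "Glycolysis / Gluconeogenesis", "pw_class": "PW:EXAMPLE_1",
     "genes": {"HK1", "PKM", "GAPDH"}},
    {"db": "db_c", "name": "Apoptosis", "pw_class": "PW:EXAMPLE_2",
     "genes": {"CASP3", "BAX"}},
]

clusters = group_by_pw_class(pathways)
gene_sets = merged_gene_sets(clusters)
overlap = jaccard(pathways[0]["genes"], pathways[1]["genes"])
```

In this toy input, the two glycolysis records from different databases share half of their combined genes, so passing both to an enrichment tool would tend to yield two near-duplicate hits; collapsing them under a single PW class yields one gene set instead.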