VizioMetrics: Mining the Scientific Visual Literature
Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. In this thesis, we develop a figure-processing pipeline that classifies more than 8 million figures from PubMed Central by figure type, and we study the resulting patterns of visual information as they relate to scholarly impact. We find a significant correlation between scientific impact and the use of visual information. Moreover, we find that citations within the same field tend to correlate with tables, while citations from other fields tend to correlate with diagrams, suggesting that visual representations aid interdisciplinary communication. These results suggest that encoding results visually improves communicability, but the visual elements remain embedded in the surrounding paper and are difficult to use directly for information discovery or longitudinal analytics. Very few applications in information retrieval, academic search, or bibliometrics make direct use of figures, and none attempt to recognize and exploit the figure type, which can be used to augment interactions with a large corpus of scholarly literature. We use these results to articulate a new research agenda, "viziometrics", for studying the organization and presentation of visual information in the scientific literature. We present VizioMetrics.org, a platform that extracts visual information from the scientific literature and makes it available both for new information retrieval applications and for studies of visual patterns across millions of papers. The VizioMetrics.org platform processes a corpus of documents, classifies the figures, organizes the results into a cloud-hosted database, and drives three distinct applications that support bibliometric analysis and information retrieval. The first application supports information retrieval tasks by allowing rapid browsing of classified figures.
The second application supports longitudinal analysis of visual patterns in the literature and facilitates data mining of these figures. The third application supports crowdsourced tagging of figures to improve classification, augment search, and facilitate new kinds of analyses. In addition, we propose PhyloParser, an end-to-end framework that automatically extracts species relationships from phylogenetic trees, using a multi-modal approach to handle diverse tree styles. PhyloParser enables the extraction of phylogenies from large collections of dendrograms. As an extended application of VizioMetrics.org, we aim to build a public database of phylogenetic information that covers the historical literature as well as current data, and then to use it to identify areas of disagreement and poor coverage in the biological literature.
- Electrical engineering