VizioMetrics: Mining the Scientific Visual Literature

dc.contributor.advisor: Shapiro, Linda G
dc.contributor.advisor: Howe, Bill
dc.contributor.author: Lee, Po-shen
dc.date.accessioned: 2017-08-11T22:53:52Z
dc.date.issued: 2017-08-11
dc.date.submitted: 2017-06
dc.description: Thesis (Ph.D.)--University of Washington, 2017-06
dc.description.abstract: Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. In this thesis, we develop a figure processing pipeline to classify more than 8 million figures from PubMed Central into different figure types and study the resulting patterns of visual information as they relate to scholarly impact. We find a significant correlation between scientific impact and the use of visual information. Moreover, we find that citations within the same field tend to correlate with tables while citations from other fields tend to correlate with diagrams, suggesting that visual representations aid interdisciplinary communication. These results suggest that encoding results visually improves communicability, but these visual elements remain ensconced in the surrounding paper and are difficult to use directly for information discovery tasks or longitudinal analytics. Very few applications in information retrieval, academic search, or bibliometrics make direct use of the figures, and none attempt to recognize and exploit the type of figure, which can be used to augment interactions with a large corpus of scholarly literature. We use these results to articulate a new research agenda, "viziometrics," to study the organization and presentation of visual information in the scientific literature. We present VizioMetrics.org, a platform that extracts visual information from the scientific literature and makes it available for use in new information retrieval applications and for studies that look at patterns of visual information across millions of papers. VizioMetrics.org processes a corpus of documents, classifies the figures, organizes the results into a cloud-hosted database, and drives three distinct applications to support bibliometric analysis and information retrieval. The first application supports information retrieval tasks by allowing rapid browsing of classified figures. The second application supports longitudinal analysis of visual patterns in the literature and facilitates data mining of these figures. The third application supports crowdsourced tagging of figures to improve classification, augment search, and facilitate new kinds of analyses. In addition, we propose PhyloParser, an end-to-end framework for automatically extracting species relationships from phylogenetic trees, using a multi-modal approach to handle diverse tree styles. PhyloParser enables large-scale extraction of phylogenies from dendrograms. As an extended application of VizioMetrics.org, we aim to build a public database of phylogenetic information that covers the historical literature as well as current data, and then use it to identify areas of disagreement and poor coverage in the biological literature.
dc.embargo.lift: 2018-08-11T22:53:52Z
dc.embargo.terms: Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Lee_washington_0250E_17154.pdf
dc.identifier.uri: http://hdl.handle.net/1773/40039
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: bibliometrics
dc.subject: computer vision
dc.subject: image processing
dc.subject: information retrieval
dc.subject: scientometrics
dc.subject: viziometrics
dc.subject: Information science
dc.subject: Computer science
dc.subject.other: Electrical engineering
dc.title: VizioMetrics: Mining the Scientific Visual Literature
dc.type: Thesis

Files

Original bundle

Name: Lee_washington_0250E_17154.pdf
Size: 30.03 MB
Format: Adobe Portable Document Format