Application of Natural Language Processing Toward Scientific Text Understanding
Authors
Amini, Aida
Abstract
Natural language processing over scientific text is useful for many downstream tasks, such as mathematical question answering, understanding chemical processes, and extracting knowledge base relations from biology-related articles. Scientific text, which consists of continuous formal statements, technical lexicon, and implicit relations, differs significantly from general narratives. Its analysis and labeling require laborious effort from domain experts with sufficient background knowledge. Therefore, labeled data in scientific domains can be scarcer and noisier. Further, answering technical questions related to these domains (e.g., algebra or physics queries) requires reasoning capabilities beyond those available in conventional natural language processing.
The emergence of pre-trained language models has improved the quality of predictions in many tasks, but due to the use of technical terms and formal statements, these models do not perform well on scientific tasks. Therefore, additional methods are needed both for collecting data and for navigating textual implications in order to achieve better outcomes. Furthermore, scientific texts usually contain implicit references to background knowledge within the same or a different scientific domain, and as a result of sparsity in training sets, models lack the ability to encode the necessary background knowledge at training time.
This dissertation presents applications of natural language processing (NLP) toward better understanding and grounding the scientific literature in terms of (a) data curation, (b) modeling approaches, (c) large-scale application construction, and (d) evaluation. We present the results of three research endeavors on scientific text analysis. Specifically, we tackle (1) answering questions in the mathematical domain, (2) understanding and following the elements of a scientific process (e.g., natural or chemical processes), and (3) knowledge base (KB) construction of functional relations over diverse and interdisciplinary scientific domains. Each task poses challenges due to the scarcity of annotated data and the need for a higher level of inference. NLP techniques still fall short of fully comprehending scientific text, but our proposed approaches push the boundaries of scientific text understanding further.
Description
Thesis (Ph.D.)--University of Washington, 2021
