Multi-task graph-based information extraction with global context

Luan, Yi

dc.contributor.advisor	Ostendorf, Mari
dc.contributor.advisor	Hajishirzi, Hannaneh
dc.contributor.author	Luan, Yi
dc.date.accessioned	2019-08-14T22:26:38Z
dc.date.available	2019-08-14T22:26:38Z
dc.date.submitted	2019
dc.description	Thesis (Ph.D.)--University of Washington, 2019
dc.description.abstract	As the number of written documents in the world grows, it is crucial to leverage automatic language processing so that people can make better use of the information they contain. The main challenge is that information in written text is not as easily used as information in a structured database. It is therefore important to understand and automatically extract structured information from large amounts of unstructured text. Information Extraction (IE), the widely studied task of retrieving structured information from text, addresses this problem. In this thesis, our goal is to develop a general, high-performance IE system that works across many different domains and tasks, particularly the less well-studied domain of scientific literature. Toward this goal, we propose a series of general IE frameworks that address the tasks of entity recognition, relation extraction, and coreference resolution. This research addresses two challenges common to all such IE systems: 1) how to leverage large amounts of unannotated data when annotated training data are limited; and 2) how to model the interactions between different tasks so that the tasks can best benefit each other. We first develop an efficient way of improving the performance of supervised neural systems through semi-supervised learning. We introduce a method that integrates a graph-based semi-supervised algorithm with a confidence-based self-training scheme to leverage unannotated articles. We also introduce two general IE frameworks, Span-based IE (SPANIE) and Dynamic Graph IE (DYGIE), for coupling multiple information extraction tasks through shared span representations. Our frameworks are effective for all three tasks, demonstrating a benefit from incorporating broader context learned from relation and coreference annotations. The DYGIE model achieves state-of-the-art results on five datasets covering a range of domains, from news and scientific literature to biomedical and wet lab reports. We further apply the approach to construct a knowledge graph for scientific papers. We create a dataset, SciERC, for scientific information extraction, which includes expert annotations of scientific terms, relation categories, and coreference links. The resulting knowledge graph is used for paper abstract generation and academic trend analysis.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Luan_washington_0250E_20112.pdf
dc.identifier.uri	http://hdl.handle.net/1773/43958
dc.language.iso	en_US
dc.rights	none
dc.subject	Computer science
dc.subject.other	Electrical engineering
dc.title	Multi-task graph-based information extraction with global context
dc.type	Thesis

Files

Original bundle

Name: Luan_washington_0250E_20112.pdf
Size: 2.12 MB
Format: Adobe Portable Document Format

Collections

Electrical engineering