Multi-task graph-based information extraction with global context
| dc.contributor.advisor | Ostendorf, Mari | |
| dc.contributor.advisor | Hajishirzi, Hannaneh | |
| dc.contributor.author | Luan, Yi | |
| dc.date.accessioned | 2019-08-14T22:26:38Z | |
| dc.date.available | 2019-08-14T22:26:38Z | |
| dc.date.submitted | 2019 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2019 | |
| dc.description.abstract | With the growing number of written documents in the world, it is crucial to leverage automatic language processing so that people can make better use of the information they contain. The main challenge is that information in written text is not as easily used as information in a structured database. It is therefore important to understand and automatically extract structured information from large amounts of unstructured text. Information Extraction (IE) is the widely studied task of retrieving structured information from text. In this thesis, our goal is to develop a general, high-performance IE system that can work across many different domains and tasks, particularly the less well-studied domain of scientific literature. Toward this goal, we propose a series of general IE frameworks that address the tasks of entity recognition, relation extraction, and coreference resolution. This research addresses challenges common to all such IE systems: 1) how to leverage large amounts of unannotated data when annotated training data are limited; and 2) how to model the interactions between different tasks so that the tasks can best benefit each other. We first develop an efficient way of improving the performance of supervised neural systems through semi-supervised learning. We introduce a method that integrates a graph-based semi-supervised algorithm with a confidence-based self-training scheme to leverage unannotated articles. We also introduce two general IE frameworks, Span-based IE (SPANIE) and Dynamic Graph IE (DYGIE), for coupling multiple information extraction tasks through shared span representations. Our frameworks are effective for all three tasks, demonstrating a benefit from incorporating broader context learned from relation and coreference annotations. The DYGIE model achieves state-of-the-art results on five datasets covering a range of domains, including news, scientific literature, biomedical text, and wet lab reports. We further apply the approach to construct a knowledge graph for scientific papers. We create a dataset, SciERC, for scientific information extraction, which includes expert annotations of scientific terms, relation categories, and coreference links. The resulting knowledge graph is used for paper abstract generation and academic trend analysis. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Luan_washington_0250E_20112.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/43958 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Computer science | |
| dc.subject.other | Electrical engineering | |
| dc.title | Multi-task graph-based information extraction with global context | |
| dc.type | Thesis |
Files
Original bundle
- Name: Luan_washington_0250E_20112.pdf
- Size: 2.12 MB
- Format: Adobe Portable Document Format
