Issues in Named Entity Recognition on Early Modern English Letters
Authors
Woldenga-Racine, Vanessa
Abstract
The influx of digitized historical documents into online collections has made these documents much more accessible to researchers and the general public. This data, however, is frequently raw, sometimes obtained through automated methods such as optical character recognition. Without rich metadata, the content of these documents is difficult to search and organize. Tasks commonly undertaken in the field of computational linguistics can aid in this endeavour. Historical documents often present challenges for modern systems, however, because their text differs in many ways from the present-day newswire on which these systems are most often trained. In this thesis I explore the task of Named Entity Recognition on texts written in Early Modern English. I investigate three methodologies for bootstrapping training data to train a character-based neural network model. The results show substantial improvements over all baselines, with the best F-measure at 60.31%.
Description
Thesis (Master's)--University of Washington, 2019
