Named Entity Resolution for Historical Texts

dc.contributor.advisor: Ketchley, Sarah
dc.contributor.author: Holmes, Audrey
dc.date.accessioned: 2019-10-15T22:59:16Z
dc.date.issued: 2019-10-15
dc.date.submitted: 2019
dc.description: Thesis (Master's)--University of Washington, 2019
dc.description.abstract: The field of digital humanities has spurred an increase in applications of computational linguistics to historical documents, but the field remains underdeveloped. Standard natural language processing (NLP) techniques developed using contemporary texts tend to perform poorly when applied to historical documents due to challenges such as spelling variation, semantic shifts, and lack of standard orthography. In this thesis, we compare performance of common Named Entity Recognition (NER) libraries including Stanford CoreNLP, spaCy, and Flair on historical texts. We also present a method for named entity resolution designed specifically for historical texts, which combines domain-adapted word embeddings with phonetic and lexical similarities. This has the potential to increase the speed of digitization of historical documents and improve search capabilities across historical corpora. The algorithm is one of the first trained on historical documents and improves upon common approaches to spelling normalization for historical documents using only lexical and/or phonetic similarity. Additionally, we provide a user interface so that scholars without programming expertise can easily use the tools developed in this thesis. Future work will include linking historical named entities to contemporary references and constructing knowledge graphs for historical corpora.
dc.embargo.lift: 2020-10-14T22:59:16Z
dc.embargo.terms: Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Holmes_washington_0250O_20778.pdf
dc.identifier.uri: http://hdl.handle.net/1773/44844
dc.language.iso: en_US
dc.rights: none
dc.subject: Linguistics
dc.subject: Computer science
dc.subject.other: Linguistics
dc.title: Named Entity Resolution for Historical Texts
dc.type: Thesis

Files

Original bundle

Name: Holmes_washington_0250O_20778.pdf
Size: 992.92 KB
Format: Adobe Portable Document Format
