Dependency Parsing for Tweets

dc.contributor.advisor: Smith, Noah A
dc.contributor.author: Zhu, Yi
dc.date.accessioned: 2017-10-26T20:51:20Z
dc.date.available: 2017-10-26T20:51:20Z
dc.date.issued: 2017-10-26
dc.date.submitted: 2017-08
dc.description: Thesis (Master's)--University of Washington, 2017-08
dc.description.abstract: This thesis concentrates on the problem of dependency parsing for Twitter texts. Twitter texts, also called tweets, are a typical kind of web-domain language with many informal and Twitter-specific linguistic phenomena (Eisenstein, 2013), and they are drawing increasing attention in NLP research. Although parsing algorithms have made substantial progress on newswire text in recent years, parsers trained directly on newswire data struggle to achieve comparable results on tweets (Foster et al., 2011a). We therefore tackle the problem from two directions: data and model. On the data side, we discuss the Twitter-specific linguistic phenomena that pose challenges for annotating tweet dependencies and take them into account in our annotation formalisms. We create a new development set of 210 tweets for the first tweet dependency treebank, Tweebank (Kong et al., 2014). On the model side, we propose the neural tweet parser, a novel neural dependency parser for tweets. We extend the stack LSTM parser (Dyer et al., 2015) and incorporate character embeddings (Ballesteros et al., 2015) into our word representations. To increase the scale of the training data, we further explore two kinds of external data: out-of-domain data, used through a cascading model with pre-training, and unannotated in-domain data, used through tri-training. Experimental results show that our neural tweet parser is over 15 times faster than Tweeboparser (Kong et al., 2014), the previous state-of-the-art parser for tweets. Our parser also benefits from both types of external data, and with tri-training data it outperforms Tweeboparser.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Zhu_washington_0250O_17794.pdf
dc.identifier.uri: http://hdl.handle.net/1773/40620
dc.language.iso: en_US
dc.rights: none
dc.subject: Deep Learning
dc.subject: Dependency Parsing
dc.subject: Tweets
dc.subject: Twitter
dc.subject: Artificial intelligence
dc.subject: Linguistics
dc.subject: Computer science
dc.subject.other: Linguistics
dc.title: Dependency Parsing for Tweets
dc.type: Thesis

Files

Original bundle

Name: Zhu_washington_0250O_17794.pdf
Size: 1.27 MB
Format: Adobe Portable Document Format

Collections