Dependency Parsing for Tweets

Zhu, Yi

dc.contributor.advisor	Smith, Noah A
dc.contributor.author	Zhu, Yi
dc.date.accessioned	2017-10-26T20:51:20Z
dc.date.available	2017-10-26T20:51:20Z
dc.date.issued	2017-10-26
dc.date.submitted	2017-08
dc.description	Thesis (Master's)--University of Washington, 2017-08
dc.description.abstract	This thesis addresses dependency parsing for Twitter text. Tweets are a typical kind of web-domain language with many informal and Twitter-specific linguistic phenomena (Eisenstein, 2013), and they are drawing increasing attention in NLP research. Although parsing algorithms have made great progress on newswire text in recent years, parsers trained directly on newswire struggle to achieve comparable results on tweets (Foster et al., 2011a). We therefore tackle the problem from two angles: data and model. On the data side, we discuss the Twitter-specific linguistic phenomena that make it challenging to create tweet dependencies and take them into account in our annotation formalisms. We create a new development set of 210 tweets for the first tweet dependency treebank, Tweebank (Kong et al., 2014). On the model side, we propose the neural tweet parser, a novel neural dependency parser for tweets. We extend the stack LSTM parser (Dyer et al., 2015) and incorporate character embeddings (Ballesteros et al., 2015) into our word representations. To increase the scale of the training data, we further explore out-of-domain data, through a cascading model that uses pre-training, and unannotated in-domain data, through tri-training. Experimental results show that our neural tweet parser is over 15 times faster than Tweeboparser (Kong et al., 2014), the previous state-of-the-art parser for tweets. Our parser also benefits from both types of external data, and with tri-training data it outperforms Tweeboparser.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Zhu_washington_0250O_17794.pdf
dc.identifier.uri	http://hdl.handle.net/1773/40620
dc.language.iso	en_US
dc.rights	none
dc.subject	Deep Learning
dc.subject	Dependency Parsing
dc.subject	Tweets
dc.subject	Twitter
dc.subject	Artificial intelligence
dc.subject	Linguistics
dc.subject	Computer science
dc.subject.other	Linguistics
dc.title	Dependency Parsing for Tweets
dc.type	Thesis

Files

Original bundle

Name: Zhu_washington_0250O_17794.pdf
Size: 1.27 MB
Format: Adobe Portable Document Format

Collections

Linguistics