Extracting Topically Related Synonyms from Twitter using Syntactic and Paraphrase Data

DC Field | Value | Language
dc.contributor.advisor | Xia, Fei | en_US
dc.contributor.author | Antoniak, Maria Alexandra | en_US
dc.date.accessioned | 2015-02-24T17:36:29Z |
dc.date.available | 2015-02-24T17:36:29Z |
dc.date.issued | 2015-02-24 |
dc.date.submitted | 2014 | en_US
dc.description | Thesis (Master's)--University of Washington, 2014 | en_US
dc.description.abstract | The goal of synonym extraction is to automatically gather synsets (groups of synonyms) from a corpus. This task is related to the tasks of normalization and paraphrase detection. We present a series of approaches for synonym extraction on Twitter, which contains unique synonyms (e.g. slang, acronyms, and colloquialisms) for which no traditional resources exist. Because Twitter contains so much variation, we focus our extraction on certain topics. We show that this focus on topics yields significantly higher coverage on a corpus of paraphrases than previous, topic-insensitive work. We demonstrate improvement on the task of paraphrase detection when we substitute our extracted synonyms into the paraphrase training set. The synonyms are learned by using chunks from a shallow parse to create candidate synonyms and their context windows, and the synonyms are incorporated into a paraphrase detection system that uses machine translation metrics as features for a classifier. When we train and test on the paraphrase training set, using synonyms extracted from that same training set, we find a 2.29% improvement in F1 and demonstrate better coverage than previous systems. This shows the potential of synonyms that are representative of a specific topic. We also find a 0.81-point improvement in F1 score when we train on the paraphrase training set, test on the test set, and use synonyms extracted with an unsupervised method from a corpus whose topics match those of the paraphrase test set. Finally, we demonstrate an approach that uses distant supervision to create a silver-standard training and test set, which we use both to evaluate our synonyms and to demonstrate a supervised approach to synonym extraction. | en_US
dc.embargo.terms | Open Access | en_US
dc.format.mimetype | application/pdf | en_US
dc.identifier.other | Antoniak_washington_0250O_13899.pdf | en_US
dc.identifier.uri | http://hdl.handle.net/1773/27513 |
dc.language.iso | en_US | en_US
dc.rights | Copyright is held by the individual authors. | en_US
dc.subject | natural language processing; svm; synonyms; twitter | en_US
dc.subject.other | Linguistics | en_US
dc.subject.other | Computer science | en_US
dc.subject.other | linguistics | en_US
dc.title | Extracting Topically Related Synonyms from Twitter using Syntactic and Paraphrase Data | en_US
dc.type | Thesis | en_US
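The abstract says candidate synonyms and their context windows are built from shallow-parse chunks. The sketch below illustrates one simple way such a step could work; it is not the thesis's implementation. Assumptions: tweets arrive already chunked (the shallow parser itself is omitted), a context window is the k chunks on each side matched exactly, and two chunks become a candidate pair when they share at least min_shared windows. The function names context_windows and candidate_synonyms, and the example tweets, are hypothetical. Running the function separately on each topic's tweets would loosely mirror the abstract's topic focus.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch of context-window candidate generation; not the thesis's code.


def context_windows(chunks, i, k=1):
    """Return the (left, right) context of chunk i: up to k chunks on each side."""
    left = tuple(chunks[max(0, i - k):i])
    right = tuple(chunks[i + 1:i + 1 + k])
    return left, right


def candidate_synonyms(chunked_tweets, k=1, min_shared=2):
    """Pair chunks that fill at least `min_shared` identical context windows.

    `chunked_tweets` is assumed to be a list of tweets, each already split
    into chunk strings by some shallow parser (not shown here).
    """
    contexts = defaultdict(set)  # context window -> chunks observed in it
    for chunks in chunked_tweets:
        for i, chunk in enumerate(chunks):
            contexts[context_windows(chunks, i, k)].add(chunk)

    shared = defaultdict(int)  # (chunk_a, chunk_b) -> number of shared windows
    for fillers in contexts.values():
        for a, b in combinations(sorted(fillers), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    # Hypothetical, pre-chunked tweets from a single topic.
    tweets = [
        ["lebron", "is", "the goat"],
        ["lebron", "is", "the best"],
        ["kobe", "was", "the goat"],
        ["kobe", "was", "the best"],
    ]
    print(candidate_synonyms(tweets))  # {('the best', 'the goat'): 2}
```

Here "the goat" and "the best" both appear after "is" and after "was" at the end of a tweet, so they share two windows and become a candidate pair. Exact-match windows keep the sketch short; a real system would likely need fuzzier context matching and a scoring or classification step on top of these raw counts.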

Files

Name: Antoniak_washington_0250O_13899.pdf
Size: 248.76 KB
Format: Adobe Portable Document Format