Extracting Topically Related Synonyms from Twitter using Syntactic and Paraphrase Data

Antoniak, Maria Alexandra

Files

Antoniak_washington_0250O_13899.pdf (248.76 KB)

Date

2015-02-24

Author

Antoniak, Maria Alexandra

Abstract

The goal of synonym extraction is to automatically gather synsets (groups of synonyms) from a corpus. This task is related to normalization and paraphrase detection. We present a series of approaches for synonym extraction on Twitter, which contains unique synonyms (e.g., slang, acronyms, and colloquialisms) for which no traditional resources exist. Because Twitter contains so much variation, we focus our extraction on specific topics. We show that this topical focus yields significantly higher coverage on a corpus of paraphrases than previous, topic-insensitive work. We also demonstrate improvement on paraphrase detection when we substitute our extracted synonyms into the paraphrase training set. The synonyms are learned by using chunks from a shallow parse to create candidate synonyms and their context windows, and they are then incorporated into a paraphrase detection system that uses machine translation metrics as features for a classifier. When we train and test on the paraphrase training set and use synonyms extracted from that same set, we find a 2.29% improvement in F1 and better coverage than previous systems, which shows the potential of synonyms that are representative of a specific topic. When we instead train on the paraphrase training set, test on the test set, and use synonyms extracted with an unsupervised method from a corpus whose topics match those of the paraphrase test set, F1 improves by 0.81 points. Finally, we demonstrate an approach that uses distant supervision to create a silver-standard training and test set, which we use both to evaluate our synonyms and to demonstrate a supervised approach to synonym extraction.

Description

Thesis (Master's)--University of Washington, 2014

Keywords

natural language processing; svm; synonyms; twitter

URI

http://hdl.handle.net/1773/27513

Collections

Linguistics
