Polyglot Text Classification with Neural Document Models
Authors
Gururangan, Suchin
Abstract
Annotating data for text classification can be expensive, so in low-resource settings one must rely on techniques such as parameter sharing and semi-supervised learning to improve classification performance. In this thesis, I combine a generative neural document model (Card et al., 2018) with multilingual word vectors (Ammar et al., 2016) to classify documents in eight languages. The proposed model trains jointly on labeled and unlabeled data from multiple languages and incorporates additional document-level metadata, such as language ID, in its generative story. Through a series of experiments, I show that the model significantly outperforms monolingual baselines in low-resource environments.
Description
Thesis (Master's)--University of Washington, 2018
