Polyglot Text Classification with Neural Document Models

dc.contributor.advisor: Smith, Noah A
dc.contributor.author: Gururangan, Suchin
dc.date.accessioned: 2018-11-28T03:19:23Z
dc.date.available: 2018-11-28T03:19:23Z
dc.date.issued: 2018-11-28
dc.date.submitted: 2018
dc.description: Thesis (Master's)--University of Washington, 2018
dc.description.abstract: Annotating data for text classification can be expensive, so in low-resource settings one must rely on techniques such as parameter sharing and semi-supervised learning to improve classification performance. In this thesis, I combine a generative neural document model (Card et al., 2018) with multilingual word vectors (Ammar et al., 2016) to classify documents in eight languages. The proposed model trains jointly on labeled and unlabeled data from multiple languages and incorporates additional document-level metadata, such as language ID, into its generative story. Through a series of experiments, I show that the model significantly outperforms monolingual baselines in low-resource settings.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Gururangan_washington_0250O_19274.pdf
dc.identifier.uri: http://hdl.handle.net/1773/43081
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Generative models
dc.subject: Multilingual NLP
dc.subject: NLP
dc.subject: Text Classification
dc.subject: Variational Inference
dc.subject: Artificial intelligence
dc.subject: Linguistics
dc.subject.other: Linguistics
dc.title: Polyglot Text Classification with Neural Document Models
dc.type: Thesis

Files

Original bundle

Name: Gururangan_washington_0250O_19274.pdf
Size: 635.98 KB
Format: Adobe Portable Document Format