Polyglot Text Classification with Neural Document Models
| dc.contributor.advisor | Smith, Noah A | |
| dc.contributor.author | Gururangan, Suchin | |
| dc.date.accessioned | 2018-11-28T03:19:23Z | |
| dc.date.available | 2018-11-28T03:19:23Z | |
| dc.date.issued | 2018-11-28 | |
| dc.date.submitted | 2018 | |
| dc.description | Thesis (Master's)--University of Washington, 2018 | |
| dc.description.abstract | Sometimes, annotating data for text classification is expensive, so one must rely on techniques like parameter sharing and semi-supervised learning to improve classification performance in low-resource environments. In this thesis, I combine a generative, neural document model (Card et al., 2018) and multilingual word vectors (Ammar et al., 2016) to perform text classification on documents in eight languages. The model I propose jointly trains on labeled and unlabeled data from multiple languages, and incorporates additional document-level metadata, such as language ID, in its generative story. Through a series of experiments, I show that the model significantly outperforms monolingual baselines in low-resource environments. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Gururangan_washington_0250O_19274.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/43081 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Generative models | |
| dc.subject | Multilingual NLP | |
| dc.subject | NLP | |
| dc.subject | Text Classification | |
| dc.subject | Variational Inference | |
| dc.subject | Artificial intelligence | |
| dc.subject | Linguistics | |
| dc.subject.other | Linguistics | |
| dc.title | Polyglot Text Classification with Neural Document Models | |
| dc.type | Thesis |
Files
Original bundle
- Name: Gururangan_washington_0250O_19274.pdf
- Size: 635.98 KB
- Format: Adobe Portable Document Format
