Using External Knowledge to Improve Brown Clustering
Authors
Miljanic, Veljko
Abstract
In recent years, semi-supervised learning methods that rely on low-dimensional word representations have gained interest in NLP because they can exploit abundant unlabeled data and reduce dependence on large labeled datasets. In this thesis we propose two methods for integrating domain-specific knowledge into Brown clustering, one of the most popular methods for inducing low-dimensional word representations. First, we propose changing the order in which words are clustered so that words more relevant to the task are given priority. Second, we propose modifying the clustering objective so that the induced clusters are relevant to the downstream supervised task. Experiments show that both methods improve the performance of a named entity recognition (NER) system that uses cluster features.
Description
Thesis (Master's)--University of Washington, 2020
