Title: Using External Knowledge to Improve Brown Clustering
Author: Miljanic, Veljko
Advisor: Levow, Gina-Anne
Date: 2021-03-19
Year: 2020
File: Miljanic_washington_0250O_22373.pdf
URI: http://hdl.handle.net/1773/46828
Description: Thesis (Master's)--University of Washington, 2020
Type: Thesis
Format: application/pdf
Language: en-US
Rights: CC BY
Keywords: Brown clustering; constrained clustering; NER; semi-supervised learning; word representations
Subjects: Linguistics; Computer science; Artificial intelligence

Abstract:
In recent years, semi-supervised learning methods that rely on low-dimensional word representations have gained interest in NLP because they can take advantage of widely available unlabeled data and reduce dependence on large labeled datasets. In this thesis we propose two methods that integrate domain-specific knowledge into one of the most popular methods for low-dimensional word representations, Brown clustering. First, we propose changing the order in which words are clustered so that words more relevant to the task are given priority. Second, we propose modifying the clustering objective so that the induced clusters are relevant to the downstream supervised task. Experiments show that both methods improve the performance of an NER system that uses cluster features.