Linguistically Motivated Combinatory Categorial Grammar Induction
Combinatory Categorial Grammar (CCG) is a widely studied grammar formalism that has been used in a variety of NLP applications, such as semantic parsing and machine translation. One key challenge in building effective CCG parsers is the lack of labeled training data, which is expensive to produce manually. Instead, researchers have developed automated approaches for inducing the grammars. These algorithms learn lexical entries that define the syntax and semantics of individual words, along with probabilistic models that rank the set of possible parses for each sentence. Various types of universal or language-specific prior knowledge and supervision signals can be exploited to prune the grammar search space and constrain parameter estimation. In this thesis, we introduce new methods for inducing linguistically motivated grammars that generalize well from small amounts of labeled training data. We first present a CCG grammar induction scheme for semantic parsing, in which the grammar is restricted by modeling a wide range of linguistic constructions. We then introduce a new lexical generalization model that abstracts over systematic morphological, syntactic, and semantic variations across languages. Finally, we describe a weakly supervised approach for inducing broad-scale CCG syntactic structures for multiple languages. Such approaches have the greatest utility for low-resource languages, as well as for domains where it is prohibitively expensive to gather sufficient training data.
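To make the notion of induced lexical entries concrete, the following is a minimal sketch of CCG categories and the forward application combinator. The lexicon entries and category names here are illustrative assumptions, not taken from the thesis itself.

```python
# Minimal sketch of CCG categories and forward application (>).
# The lexicon below is a hypothetical example of the kind of entries
# a grammar induction algorithm learns; it is not from the thesis.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cat:
    """An atomic category (e.g. S, NP) or a complex one (e.g. (S\\NP)/NP)."""
    result: object = None   # result category, for complex categories
    arg: object = None      # argument category, for complex categories
    slash: str = ""         # "/" (forward), "\\" (backward), "" if atomic
    name: str = ""          # atomic category name

    def __str__(self):
        if self.slash == "":
            return self.name
        return f"({self.result}{self.slash}{self.arg})"

def atom(name):
    return Cat(name=name)

def forward_apply(left, right):
    """Forward application (>): X/Y combined with Y yields X."""
    if left.slash == "/" and left.arg == right:
        return left.result
    return None

S, NP = atom("S"), atom("NP")

# Hypothetical induced lexicon: word -> syntactic category.
lexicon = {
    "Kim":    NP,
    "sleeps": Cat(result=S, arg=NP, slash="\\"),              # S\NP
    "sees":   Cat(result=Cat(result=S, arg=NP, slash="\\"),
                  arg=NP, slash="/"),                          # (S\NP)/NP
}

# "sees Kim" combines by forward application into a verb phrase S\NP.
vp = forward_apply(lexicon["sees"], lexicon["Kim"])
print(vp)  # (S\NP)
```

An induction algorithm searches over many candidate categories per word and uses a probabilistic model to prefer lexicons that yield high-scoring parses of the training sentences.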