Weakly-Supervised Bayesian Learning of a CCG Supertagger
We present a Bayesian formulation for weakly-supervised learning of a Combinatory Categorial Grammar (CCG) supertagger with an HMM. We assume supervision in the form of a tag dictionary, and our prior encourages the use of cross-linguistically common category structures as well as transitions between tags that can combine locally according to CCG’s combinators. Our prior is theoretically appealing since it is motivated by language-independent, universal properties of the CCG formalism. Empirically, we show that it yields substantial improvements over previous work that used similar biases to initialize an EM-based learner. Additional gains are obtained by further shaping the prior with corpus-specific information that is extracted automatically from raw text and a tag dictionary.
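To make the combinability bias on transitions concrete, the following is a minimal sketch, not the paper's implementation: it treats supertags as category strings, tests whether adjacent tags can combine under basic CCG application and composition, and builds an HMM transition prior that concentrates mass on combinable successors. The helper names and the weights `sigma_hi`/`sigma_lo` are illustrative assumptions, and the sketch covers only the transition bias, not the prior over category structures.

```python
def parse_cat(cat):
    """Split a CCG category string at its top-level slash, returning
    (result, slash, argument); atomic categories return None.
    Illustrative helper, not from the paper."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):  # scan right-to-left for a depth-0 slash
        c = cat[i]
        if c == ')':
            depth += 1
        elif c == '(':
            depth -= 1
        elif depth == 0 and c in '/\\':
            strip = lambda s: s[1:-1] if s.startswith('(') and s.endswith(')') else s
            return strip(cat[:i]), c, strip(cat[i + 1:])
    return None

def combines(left, right):
    """True if `left right` can combine by forward/backward application
    or simple harmonic composition."""
    l, r = parse_cat(left), parse_cat(right)
    if l and l[1] == '/' and l[2] == right:    # X/Y  Y      -> X
        return True
    if r and r[1] == '\\' and r[2] == left:    # Y    X\Y    -> X
        return True
    if l and r and l[1] == '/' and r[1] == '/' and l[2] == r[0]:    # X/Y Y/Z -> X/Z
        return True
    if l and r and l[1] == '\\' and r[1] == '\\' and r[2] == l[0]:  # Y\Z X\Y -> X\Z
        return True
    return False

def transition_prior(tags, sigma_hi=0.95, sigma_lo=0.05):
    """Base distribution over the next tag given the previous one:
    most mass (sigma_hi, an assumed weight) goes to tags that can
    combine locally with the previous tag."""
    prior = {}
    for t1 in tags:
        ok = [t2 for t2 in tags if combines(t1, t2)]
        bad = [t2 for t2 in tags if not combines(t1, t2)]
        if not ok or not bad:
            prior[t1] = {t2: 1.0 / len(tags) for t2 in tags}  # fall back to uniform
        else:
            prior[t1] = {t2: sigma_hi / len(ok) for t2 in ok}
            prior[t1].update({t2: sigma_lo / len(bad) for t2 in bad})
    return prior

# Example: with a toy tag set, 'NP/N' puts most of its transition mass
# on 'N' (forward application NP/N N -> NP).
tags = ['NP', 'N', 'NP/N', 'S\\NP', '(S\\NP)/NP']
prior = transition_prior(tags)
```

In the actual model such a distribution would serve as the mean of a Dirichlet prior over HMM transition parameters rather than fixing them outright, so the data can still override the bias.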