posted on 2006-01-01, 00:00authored byRoni Rosenfeld, Larry Wasserman, Can Cai, Xiaojin Zhu
Whole sentence exponential language models directly model the probability of an entire sentence using arbitrary computable properties of that sentence. We present an interactive methodology for feature induction, and demonstrate it in the simple but common case of a trigram baseline, focusing on features that capture the linguistic notion of semantic coherence. We then show how parametric regression can be used in this setup to efficiently estimate the model's parameters, whereas non-parametric regression can be used to construct more powerful exponential models from the raw features.