Hyper Markov Non-Parametric Processes for Mixture Modeling and Model Selection
Markov distributions describe multivariate data with conditional independence structures. Dawid and Lauritzen (1993) extended this idea to hyper Markov laws for prior distributions. A hyper Markov law is a distribution over Markov distributions whose marginals satisfy the same conditional independence constraints. These laws have been used for Gaussian mixtures (Escobar, 1994; Escobar and West, 1995) and contingency tables (Liu and Massam, 2006; Dobra and Massam, 2009). In this paper, we develop a family of non-parametric hyper Markov laws that we call hyper Dirichlet processes, combining the ideas of hyper Markov laws and non-parametric processes. Hyper Dirichlet processes are joint laws with Dirichlet process laws for particular marginals. We also describe a more general class of Dirichlet processes that are not hyper Markov, but still contain useful properties for describing graphical data. The graphical Dirichlet processes are simple Dirichlet processes with a hyper Markov base measure. This class allows an extremely straight-forward application of existing Dirichlet knowledge and technology to graphical settings. Given the wide-spread use of Dirichlet processes, there are many applications of this framework waiting to be explored. One broad class of applications, known as Dirichlet process mixtures, has been used for constructing mixture densities such that the underlying number of components may be determined by the data (Lo, 1984; Escobar, 1994; Escobar and West, 1995). I consider the use of the new graphical Dirichlet process in this setting, which imparts a conditional independence structure inside each component. In other words, given the component or cluster membership, the data exhibit the desired independence structure. We discuss two applications. Expanding on the work of Escobar and West (1995), we estimate a non-parametric mixture of Markov Gaussians using a Gibbs sampler. Secondly, we employ the Mode-Oriented Stochastic Search of Dobra and Massam (2009) for determining a suitable conditional independence model, focusing on contingency tables. In general, the mixing induced by a Dirichlet process does not drastically increase the complexity beyond that of a simpler Bayesian hierarchical models sans mixture components. We provide a specific representation for decomposable graphs with useful algorithms for local updates.