Building causal graphs from statistical data in the presence of latent variables
Journal contribution posted on 1991-01-01, 00:00, authored by Peter Spirtes
Abstract: "The problem of inferring causal relations from statistical data in the absence of experiments arises repeatedly in many scientific disciplines, including sociology, economics, epidemiology, and psychology. In addition, the building of expert systems could be expedited if background knowledge elicited from experts could be supplemented with automated techniques using relevant statistics. Recently, efficient algorithms for determining causal relationships between random variables (in the form of Bayesian networks) from appropriate statistical data when there are no unmeasured or 'latent' variables have been discovered. (See Spirtes, Glymour and Scheines 1990, Spirtes and Glymour 1991, Verma and Pearl 1990, and Pearl and Verma 1991.) Inferring causal relations when unmeasured variables are also acting is a much more difficult problem. In many cases it is impossible to infer the structure among the latent variables from statistical relations among the measured variables. But the presence of latent variables can also make it difficult to infer the causal relations among the measured variables themselves. When only two variables, A and B, have been measured, and there is a correlation between the two, this does not suffice to establish whether A causes B, B causes A, or there is a third unmeasured variable causing both A and B. Nevertheless, when other variables are measured, more knowledge about the causal relations between A and B is possible. We will prove in Theorem 2 that there are some circumstances in which it is possible to establish that A causes B, rather than that B causes A, or that a third unmeasured variable causes both A and B; and we will prove in Theorem 3 that there are other circumstances in which the possibility that A causes B can be eliminated."
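The two-variable case described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the report: it assumes linear Gaussian models with unit-variance variables, and the names `simulate`, `pearson`, and the target correlation `rho` are hypothetical choices made only for illustration. Under those assumptions, the three structures (A causes B, B causes A, and an unmeasured common cause) are tuned to yield the same correlation between A and B, so the correlation alone cannot tell them apart.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(structure, rho=0.5, n=20000, seed=0):
    """Draw n samples of (A, B) from one of three illustrative structures.

    Assumption: linear Gaussian mechanisms, with coefficients chosen so that
    A and B have unit variance and correlation rho in every structure.
    """
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        if structure == "A->B":
            a = rng.gauss(0, 1)
            b = rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        elif structure == "B->A":
            b = rng.gauss(0, 1)
            a = rho * b + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        elif structure == "latent":
            # The common cause is generated but never returned (unmeasured).
            latent = rng.gauss(0, 1)
            a = math.sqrt(rho) * latent + math.sqrt(1 - rho) * rng.gauss(0, 1)
            b = math.sqrt(rho) * latent + math.sqrt(1 - rho) * rng.gauss(0, 1)
        else:
            raise ValueError(f"unknown structure: {structure}")
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


if __name__ == "__main__":
    for structure in ("A->B", "B->A", "latent"):
        a, b = simulate(structure)
        print(structure, round(pearson(a, b), 3))
```

Running the script prints one sample correlation per structure; all three should land close to the chosen `rho`. Distinguishing the structures requires additional measured variables, which is the situation the abstract's Theorems 2 and 3 address.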