Regression and causation
Journal contribution posted on 1994-01-01, 00:00, authored by Clark N. Glymour
Abstract: "In both linear and nonlinear multiple regression, when regressors are correlated the existence of an unmeasured common cause of regressor X_i and outcome variable Y may bias estimates of the influence of other regressors, X_k; variables having no influence on Y whatsoever may thereby be given significant regression coefficients. The bias may be quite large. Simulation studies show that standard regression model specification procedures make the same error. The strategy of regressing on a larger set of variables and checking stability may compound rather than remedy the problem. A similar difficulty in the estimation of the influence of other regressors arises if some X_i is an effect rather than a cause of Y. The problem appears endemic in uses of multiple regression on uncontrolled variables, and unless somehow corrected appears to invalidate many scientific uses of regression methods. We describe an implementation in the TETRAD II program of a model specification algorithm that avoids these and certain other errors in large samples. We recommend that such an algorithm be applied before regression is used to estimate influence."
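The bias described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper and is not the TETRAD II algorithm; the causal structure, the unit coefficients, the sample size, and the function name `simulate` are all assumptions chosen for illustration, and NumPy is assumed to be available. An unmeasured variable U causes both X_i and Y, X_k causes X_i (so the regressors are correlated), and neither regressor has any influence on Y.

```python
import numpy as np


def simulate(n=100_000, seed=0):
    """Return OLS coefficients [intercept, X_i, X_k] from regressing Y on X_i and X_k.

    Hypothetical data-generating model (an assumption, not the paper's):
        U, X_k ~ independent standard normal
        X_i = X_k + U + noise
        Y   = U + noise        (neither X_i nor X_k influences Y)
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)               # unmeasured common cause of X_i and Y
    x_k = rng.normal(size=n)             # regressor with no influence on Y
    x_i = x_k + u + rng.normal(size=n)   # correlated with X_k, shares cause U with Y
    y = u + rng.normal(size=n)           # Y depends only on U

    # U is left out of the design matrix because it is unmeasured.
    design = np.column_stack([np.ones(n), x_i, x_k])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


if __name__ == "__main__":
    intercept, b_i, b_k = simulate()
    print(f"coefficient on X_i: {b_i:.3f}")
    print(f"coefficient on X_k: {b_k:.3f}")
```

Under these assumed coefficients the population regression coefficients are 0.5 for X_i and -0.5 for X_k, even though the true influence of both on Y is zero. With a large sample the estimate for X_k is therefore far from zero and would be reported as highly significant, which is the kind of error the abstract warns about.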