Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study
Linear mixed-effects (LME) models analyze data with complex patterns of variability, in particular variability arising at different nested levels. Although LME models can accommodate the stratification and clustering of survey data, it is not clear how sampling weights should be incorporated into LME estimates. This report uses twelve simulation studies to compare two published methods of incorporating sampling weights into LME estimates: Pfeffermann et al. (1998), denoted PSHGR, and Rabe-Hesketh and Skrondal (2006), denoted RHS. The simulations support five main conclusions. 1) The PSHGR and RHS point estimates are very similar, with differences attributable to numerical instabilities in the estimation procedures. 2) Confidence intervals based on the sandwich variance estimator and on the design-based variance estimator provide similar coverage when there is no model misspecification. When there is model misspecification, however, intervals based on the design-based variance estimator have unexpectedly high coverage, implying that the variance estimates are too large. 3) When there is model misspecification that does not induce informative sampling, weighting does not reduce the bias of the estimators. 4) When there is informative sampling, the weighted estimators reduce the bias of the point estimates but do not eliminate it. 5) The unweighted estimator has the smallest variance, but it is biased under informative sampling. The weighted unscaled estimator corrects the bias in the fixed effects but produces more bias in the random effects. The scaled 1 weighting removes the bias in the fixed effects and overcorrects the weighted unscaled bias in the random effects. The scaled 2 weighting removes the bias in the fixed effects, and its bias in the random effects lies between that of the weighted unscaled and weighted scaled 1 estimators.
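The scaled weightings mentioned in conclusion 5 are not defined in this summary. As a minimal sketch, the Python below shows two within-cluster rescalings that are commonly described in the multilevel weighting literature: one makes the level-1 weights in a cluster sum to the effective cluster sample size, the other makes them sum to the actual cluster sample size. The function names are hypothetical, and the mapping of these two rescalings onto the report's "scaled 1" and "scaled 2" labels is an assumption, not something stated here.

```python
def scale_to_effective_size(weights):
    """Rescale one cluster's level-1 weights so they sum to the
    effective sample size (sum w)^2 / (sum w^2).
    Assumed here to correspond to a 'scaled 1' style weighting."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    factor = total / total_sq
    return [w * factor for w in weights]


def scale_to_cluster_size(weights):
    """Rescale one cluster's level-1 weights so they sum to the
    actual number of sampled units in the cluster.
    Assumed here to correspond to a 'scaled 2' style weighting."""
    factor = len(weights) / sum(weights)
    return [w * factor for w in weights]


# Illustrative weights for a single cluster (made-up values).
cluster_weights = [1.0, 2.0, 3.0, 4.0]
scaled_effective = scale_to_effective_size(cluster_weights)
scaled_size = scale_to_cluster_size(cluster_weights)
```

When all weights within a cluster are equal, both rescalings return weights of 1, so the weighted and unweighted analyses coincide at level 1. The two rescalings differ only when the weights vary within a cluster, which is the situation informative sampling typically produces.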