Unbiased Methods for Population-based Association Studies
Large, population-based samples and large-scale genotyping are being used to evaluate disease/gene associations. A substantial drawback of such samples is that population substructure can induce spurious associations between genes and disease. We review two methods, genomic control (GC) and structured association (SA), that obviate many of the concerns about population substructure by using features of the genomes present in the sample to correct for stratification. The GC approach exploits the fact that population substructure generates 'overdispersion' of the statistics used to assess association. Testing multiple polymorphisms throughout the genome, only some of which are pertinent to the disease of interest, allows the degree of overdispersion generated by population substructure to be estimated and taken into account. The SA approach assumes that the sampled population, while heterogeneous, is composed of subpopulations that are themselves homogeneous. Using multiple polymorphisms throughout the genome, this 'latent class method' estimates the probability that each sampled individual derives from each of these latent subpopulations.
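The GC idea described above can be illustrated with a minimal sketch. It assumes one common formulation that this abstract does not spell out: the overdispersion (inflation) factor is estimated as the median of the 1-df chi-square association statistics across markers, divided by the median of the null chi-square distribution (approximately 0.4549), and each statistic is then divided by that factor. The function names, the simulated data, and the floor of 1 on the factor are illustrative assumptions, not the authors' implementation.

```python
import random
import statistics

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def estimate_inflation(stats):
    """Estimate the overdispersion factor from per-marker 1-df chi-square statistics.

    Assumption: most markers are unrelated to the disease, so the median of the
    observed statistics reflects substructure rather than true association.
    """
    lam = statistics.median(stats) / CHI2_1DF_MEDIAN
    # Assumption: never deflate; a factor below 1 is treated as no overdispersion.
    return max(1.0, lam)


def genomic_control_adjust(stats):
    """Rescale every statistic by the estimated overdispersion factor."""
    lam = estimate_inflation(stats)
    return [s / lam for s in stats]


def simulate_stats(n_markers, inflation, seed=0):
    """Hypothetical null markers whose statistics are inflated by a constant factor."""
    rng = random.Random(seed)
    return [inflation * rng.gauss(0.0, 1.0) ** 2 for _ in range(n_markers)]


raw = simulate_stats(5000, inflation=1.5)
lam = estimate_inflation(raw)
adjusted = genomic_control_adjust(raw)
print(f"estimated inflation factor: {lam:.2f}")
print(f"median statistic after correction: {statistics.median(adjusted):.4f}")
```

The median is used here rather than the mean because it is less sensitive to the minority of markers that may be truly associated with the disease; other estimators are possible.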
GC has the advantages of robustness, simplicity, and wide applicability, even to experimental designs such as DNA pooling. SA is somewhat more complex, but has the advantage of greater power in some realistic settings, such as admixed populations or when association varies widely across subpopulations. It, too, is widely applicable. Both methods also have weaknesses, which we discuss in the review.