Quantifying and partitioning variation due to genetic effects and population stratification using within-family prediction analysis. J. Yang1,2 1) on behalf of the GIANT Consortium; 2) Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia.

   Genome-wide association studies (GWAS) of human height have employed large sample sizes (~130,000) and identified a large number of associated variants (180) (Lango Allen et al. 2010), but have nonetheless accounted for only ~10% of phenotypic variation, in contrast to the ~45% of variance estimated to be explained by all common SNPs (Yang et al. 2010). Expanding GWAS of height could therefore provide continued insights into the genetic architecture of this model human complex trait, with potential implications for studies of other complex traits and diseases. It has been observed in GWAS of height and other complex traits that the genomic inflation factor (λGC) increases with sample size, which is consistent with both population stratification and polygenic variation (Yang et al. 2011a). Here we report results from a meta-analysis of 79 GWAS, comprising ~250,000 individuals of European ancestry. We observe a large genome-wide inflation factor of the association test statistic (λGC = 1.94) even after correcting each study's test statistics by its individual inflation factor (Devlin and Roeder 1999). We developed a within-family prediction approach that quantifies the variation due to real SNP effects, population stratification and errors in estimating SNP effects by comparing the differences in SNP-based genetic predictor and in phenotype between full sibs selected from independent families. Analyses with and without fitting principal components clearly show that the proposed approach distinguishes the variance component due to true association signals from those due to stratification and estimation errors. We also show that the variance attributable to population stratification is minor for SNPs that passed genome-wide significance. We confirmed the variance due to real SNP effects, as inferred from the within-family prediction analysis, by whole-genome estimation analyses as implemented in GCTA (Yang et al. 2011b).
The results show that ~16% of phenotypic variance can be explained by 697 genome-wide significant SNPs and that ~29% of variance is captured by the best ~9500 SNPs selected from a multi-SNP association analysis (Yang et al. 2012). Together, these results suggest that the observed large genomic inflation is consistent with a genetic architecture for human height that is characterized by a very large but finite number of causal variants (thousands), spread across the genome.
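The logic of comparing full sibs can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the estimator used in the analysis: the function names (`simulate_sib_pairs`, `slope`), the variance values, and the simple additive model are all hypothetical. It assumes that stratification is shared by both sibs in a family, and that half of the variance in true genetic values and in predictor estimation error lies within families. Under those assumptions, regressing the sib difference in phenotype on the sib difference in predictor removes the stratification term, whereas the same regression across unrelated individuals does not.

```python
import numpy as np


def simulate_sib_pairs(n_fam=20000, strat_sd=1.0, seed=0):
    """Simulate a predictor and a phenotype for two full sibs per family.

    All variance values are illustrative assumptions, not estimates.
    """
    rng = np.random.default_rng(seed)
    half = np.sqrt(0.5)
    # Family-level terms, shared by both sibs (shape n_fam x 1).
    g_between = rng.normal(0.0, half, size=(n_fam, 1))
    err_between = rng.normal(0.0, half, size=(n_fam, 1))
    strat = rng.normal(0.0, strat_sd, size=(n_fam, 1))
    # Individual-level terms, independent between sibs (shape n_fam x 2).
    g_within = rng.normal(0.0, half, size=(n_fam, 2))
    err_within = rng.normal(0.0, half, size=(n_fam, 2))
    env = rng.normal(0.0, 1.0, size=(n_fam, 2))

    g = g_between + g_within          # true genetic value, variance 1
    err = err_between + err_within    # predictor estimation error, variance 1
    pheno = g + strat + env           # phenotype is confounded by stratification
    pred = g + strat + err            # predictor also picks up stratification
    return pred, pheno


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (x @ x))


pred, pheno = simulate_sib_pairs()

# Across unrelated individuals (one sib per family): inflated by stratification.
pop_slope = slope(pred[:, 0], pheno[:, 0])

# Within families: sib differences cancel every family-level term.
wf_slope = slope(pred[:, 0] - pred[:, 1], pheno[:, 0] - pheno[:, 1])

print(f"population slope:    {pop_slope:.3f}")
print(f"within-family slope: {wf_slope:.3f}")
```

With `strat_sd=0` the two slopes agree in expectation in this toy model, so the gap between them reflects the stratification component, while the shortfall of the within-family slope from 1 reflects estimation error in the predictor.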