Mixed model association methods: advantages and pitfalls. A. Price (1), J. Yang (2), N. A. Zaitlen (3), M. E. Goddard (4), P. M. Visscher (2). 1) Harvard School of Public Health, Boston, MA; 2) University of Queensland, Brisbane, Australia; 3) University of California, San Francisco, CA; 4) University of Melbourne, Melbourne, Australia.
It is widely known that mixed linear model association (MLMA) methods can prevent false-positive associations due to population or relatedness structure, and increase power by applying a correction that is specific to this structure. Here, we present new results, including theoretical derivations, simulations and application to empirical data, to highlight several advantages and pitfalls of MLMA. We provide an analytical derivation for the loss in power of MLMA with the candidate marker included in the genetic relationship matrix (MLMi) vs. linear regression. We also provide an analytical derivation for the increase in power of MLMA with the candidate marker excluded from the genetic relationship matrix (MLMe) vs. linear regression. In large data sets, MLMe will have average chi-square statistics > 1 (which is appropriate, due to polygenic effects) whereas MLMi will have average chi-square statistics = 1 (which is not appropriate and leads to a loss in power). Next, we investigate the previously proposed approach of including only a subset of top associated markers in the genetic relationship matrix. We show that this approach can increase power for some genetic architectures, but can provide insufficient correction for false positives in the case of subtle population structure. Finally, we consider ascertained case-control traits, and show that MLMA methods (including MLMe) suffer a loss in power as a function of the sample size and level of case-control ascertainment (which depends on disease prevalence, in studies with an equal number of cases and controls). The above results were validated via extensive simulations (involving both simulated and real genotypes) and application to empirical WTCCC2 data sets with multiple sclerosis (MS) and ulcerative colitis (UC) phenotypes spanning 20,000 samples. We observed large differences in test statistics for different MLMA approaches, including 20% higher test statistics (P-value 10^-20) for MLMe vs. MLMi at 99 published markers known to be associated with MS and UC, consistent with our analytical derivations and demonstrating the large impact of the choices we describe. Software implementing the MLMe approach is available at http://www.complextraitgenomics.com/software/gcta/mlmassoc.html.
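To make the MLMi/MLMe distinction concrete, the following is a minimal Python sketch, not the GCTA implementation. It builds a genetic relationship matrix with the candidate marker either included or excluded, then computes a 1-d.f. Wald chi-square statistic for that marker under the model y = x*beta + g + e. The function names (`standardize`, `grm`, `wald_chisq`) and all simulation parameters are hypothetical. The variance components are assumed known, whereas MLMA software typically estimates them from the data. The phenotype is assumed mean-centered, with no covariates and no polymorphism filtering.

```python
import numpy as np

def standardize(G):
    """Scale each marker column (individuals x markers) to mean 0, variance 1.
    Assumes every marker is polymorphic (nonzero variance)."""
    G = np.asarray(G, dtype=float)
    return (G - G.mean(axis=0)) / G.std(axis=0)

def grm(Z, exclude=None):
    """Genetic relationship matrix K = Z Z^T / M from standardized genotypes Z.
    If `exclude` is a column index, that marker is left out (MLMe-style);
    otherwise all markers are used (MLMi-style)."""
    if exclude is not None:
        Z = np.delete(Z, exclude, axis=1)
    return Z @ Z.T / Z.shape[1]

def wald_chisq(y, x, K, var_g, var_e):
    """1-d.f. generalized least squares Wald statistic for marker x, with
    g ~ N(0, var_g * K) and e ~ N(0, var_e * I).
    Assumption: var_g and var_e are supplied rather than estimated."""
    V = var_g * K + var_e * np.eye(len(y))
    Vinv_x = np.linalg.solve(V, x)      # V^{-1} x without forming V^{-1}
    xVx = x @ Vinv_x                    # x' V^{-1} x
    beta = (Vinv_x @ y) / xVx           # GLS effect estimate
    return beta**2 * xVx                # beta^2 / var(beta)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m, j = 300, 100, 0               # individuals, markers, candidate index
    freqs = rng.uniform(0.1, 0.5, size=m)
    Z = standardize(rng.binomial(2, freqs, size=(n, m)))

    # Polygenic phenotype with total heritability 0.5 (illustrative value).
    effects = rng.normal(scale=np.sqrt(0.5 / m), size=m)
    y = Z @ effects + rng.normal(scale=np.sqrt(0.5), size=n)
    y -= y.mean()

    chisq_mlmi = wald_chisq(y, Z[:, j], grm(Z), 0.5, 0.5)
    chisq_mlme = wald_chisq(y, Z[:, j], grm(Z, exclude=j), 0.5, 0.5)
    print(f"MLMi chi-square: {chisq_mlmi:.3f}")
    print(f"MLMe chi-square: {chisq_mlme:.3f}")
```

In MLMi the candidate marker also enters the random effect through K, so part of its signal is absorbed there. Excluding it, as in `grm(Z, exclude=j)`, avoids this double fitting. Comparing the two statistics for a single marker in one small simulation is only illustrative; the power results described above concern averages over many markers in large data sets.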