A logistic mixed model approach to obtain a reduced model score for KBAC to adjust for population structure and relatedness between samples.
G. Linse Peterson (1), J. Grover (1), B. Vilhjalmsson (2), G. Christensen (1), A. Scherer (1)
1) Golden Helix, Inc, Bozeman, MT; 2) Harvard School of Public Health, Cambridge, MA.

Accounting for population structure, family structure, and inbreeding is a significant issue for burden and kernel association tests on rare variants from next-generation DNA sequencing. Recent approaches, such as the VC-score test and the methods outlined by Schaid, adjust burden and kernel tests using linear mixed models, correcting for population structure with a kinship matrix as a random effects matrix. However, these methods do not readily extend to a logistic regression framework. The method we present applies mixed-model logistic regression directly to a binary dependent variable to account for population structure and cryptic relatedness. We have implemented a solution that combines the power of a mixed-model regression analysis with the ability to assess rare variant burden using KBAC (the Kernel-Based Adaptive Cluster method). While several optimizations are available for linear mixed model regression on a genome-wide scale, efficiently solving a logistic mixed model regression for every gene is non-trivial. We have therefore derived a transformed linear pseudo-model to solve the logistic mixed model equation, optimized using EMMA (Efficient Mixed-Model Association), and we pre-compute and reuse both the permutations for KBAC and the reduced models for those permutations. The result is an efficient logistic mixed model regression algorithm with a kinship random effects matrix for computing a modified score test for KBAC (MM-KBAC). In addition, the method used to compute the kinship matrix can affect the power to identify the gene(s) associated with the complex trait(s). Several methods for specifying the kinship matrix, including IBS, IBD, and a pedigree-based matrix, will be compared using GAW17 simulated data and 1000 Genomes data.
We show that using a logistic model directly with KBAC, with a random effects matrix to account for population structure, increases the power to detect significant results and controls Type I error when compared with family-adjusting methods such as famSKAT or VC-score and with methods that assume independent samples, including KBAC and SKAT-O for binary traits.
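
The abstract does not spell out the kinship computation or the form of the pseudo-model, so the following is only an illustrative sketch under stated assumptions. It shows one common definition of an IBS kinship matrix (average allele sharing across markers) and one standard way to linearize a logistic mixed model (a working response with weights, as in penalized quasi-likelihood). The function names `ibs_kinship` and `working_response` are hypothetical, and the authors' actual transformation may differ.

```python
import numpy as np


def ibs_kinship(genotypes):
    """Average identity-by-state (IBS) sharing between all pairs of samples.

    genotypes: (n_samples, n_markers) array of minor-allele counts (0, 1, 2).
    Returns an (n_samples, n_samples) symmetric matrix with entries in [0, 1].
    Assumption: per-marker IBS is 1 - |g_i - g_j| / 2, averaged over markers.
    """
    g = np.asarray(genotypes, dtype=float)
    n_markers = g.shape[1]
    # Pairwise absolute genotype differences, summed over markers.
    diff = np.abs(g[:, None, :] - g[None, :, :]).sum(axis=2)
    return 1.0 - diff / (2.0 * n_markers)


def working_response(y, X, beta, u):
    """One linearization step for a logistic mixed model.

    Assumption: a PQL-style working response, not necessarily the
    pseudo-model used by the authors.
    y: binary outcomes, X: fixed-effect design matrix,
    beta: current fixed-effect estimates, u: current random effects.
    Returns (z, w): the pseudo-response and its weights, so that z can be
    treated as the response of a weighted linear mixed model.
    """
    eta = X @ beta + u                  # linear predictor
    mu = 1.0 / (1.0 + np.exp(-eta))     # fitted probabilities
    w = mu * (1.0 - mu)                 # binomial variance function
    z = eta + (y - mu) / w              # linearized (working) response
    return z, w
```

In a scheme of this kind, the pair `(z, w)` would be recomputed at each iteration and passed to a linear mixed model solver, with the kinship matrix supplying the covariance structure of the random effects.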

You may contact the first author (during and after the meeting) at