A joint testing framework uncovers paradoxical SNPs, improves power, and identifies new sources of missing heritability in association studies.
B. C. Brown1, N. A. Patsopoulos2, A. Price3,4, L. Pachter1,6,7, N. Zaitlen5
1) Computer Science Department, UC Berkeley, Berkeley, CA; 2) Department of Neurology, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA; 3) Department of Epidemiology, Harvard University, Cambridge, MA; 4) Department of Biostatistics, Harvard University, Cambridge, MA; 5) Department of Medicine, UCSF, San Francisco, CA; 6) Department of Mathematics, UC Berkeley, Berkeley, CA; 7) Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA.
Variants identified via GWAS of complex human phenotypes account for only a small fraction of the total heritability. While there are many proposed explanations for this missing heritability (Manolio et al. Nat. 2009), an overlooked issue is that of linkage masking (LM), in which linkage disequilibrium between SNPs masks their signal under gold-standard marginal tests of association, preventing their discovery in GWAS. Previous examinations of this phenomenon have focused only on known associated loci (Wood et al. HMG 2011). In this work, we show that (1) LM is an instance of Simpson's paradox, where an effect is visible in subgroups but not in the population as a whole, (2) mixed-model-based estimates of h^2_g include the signal of masked SNPs even though GWAS may never find them, and (3) intelligent joint testing without an interaction term improves power in the presence of proximal causal variants, including masked SNPs, without a substantial increase in multiple testing burden.
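The masking effect described above can be illustrated with a small simulation. This is a minimal sketch under our own simplifying assumptions, not the analysis used in this work: genotypes are standardized Gaussian variables rather than allele counts, the phenotype is quantitative, and the function name, sample size, correlation, and effect sizes are all hypothetical. The second effect is chosen so that the marginal effect of the first SNP cancels exactly in expectation, while a joint fit of both SNPs recovers it.

```python
import random


def centre(values):
    """Subtract the mean from every element of a list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def simulate_masking(n=20000, r=0.8, b1=0.1, seed=0):
    """Return (marginal_slopes, joint_slopes) for two correlated SNPs.

    Assumption: standardized Gaussian genotypes with correlation r and a
    quantitative phenotype with unit-variance noise (illustrative only).
    """
    rng = random.Random(seed)
    b2 = -b1 / r  # makes SNP 1's expected marginal effect b1 + r*b2 zero
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        g1 = z1
        g2 = r * z1 + (1 - r * r) ** 0.5 * z2  # corr(g1, g2) = r
        x1.append(g1)
        x2.append(g2)
        y.append(b1 * g1 + b2 * g2 + rng.gauss(0, 1))
    x1, x2, y = centre(x1), centre(x2), centre(y)

    # Sums of squares and cross-products for the normal equations.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))

    # Marginal test: each SNP regressed on the phenotype separately.
    marginal = (s1y / s11, s2y / s22)

    # Joint test: both SNPs in one model, solved via the 2x2 inverse.
    det = s11 * s22 - s12 * s12
    joint = ((s22 * s1y - s12 * s2y) / det,
             (s11 * s2y - s12 * s1y) / det)
    return marginal, joint


if __name__ == "__main__":
    marginal, joint = simulate_masking()
    print("marginal slopes:", marginal)
    print("joint slopes:   ", joint)
```

Under these assumptions the marginal slope of the first SNP should sit near zero while its joint slope sits near the simulated effect, which is the pattern a marginal GWAS scan would miss.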
Joint testing of SNPs has been under-utilized because of the immense computation time required and the large multiple testing penalty. We avoid these issues with a sliding window approach in which we perform joint tests only on markers whose squared correlation exceeds a threshold R. We also provide a method for estimating the null distribution 500x faster than a permutation test, making application computationally efficient. We detail, via extensive simulation, the power gain or loss under different disease models, window sizes, values of R, and LD patterns. We find significant power gains when multiple causal variants are proximal, reaching as high as 32.1% in the case of LM. The increase in multiple hypothesis testing penalty is relatively minor for reasonable window sizes and R, preventing severe power loss when causal variants are distant. We applied our method to three WTCCC data sets (RA, CD, T1D) with a window size of 100 SNPs and R = 0, discovering 47% more loci from the NHGRI database than the marginal test (22 vs 15 loci). For example, in RA rs2104286 has a marginal p-value of 7e-06, but this drops to 3e-09 when it is jointly tested with rs1570527, revealing the later-discovered association at 10p15.1. Additionally, we find all classically discovered loci, lending further support to recent work suggesting that most loci harbor multiple variants (Gusev et al. PG 2013). Overall, our framework provides evidence that joint testing can improve power and uncover sources of missing heritability.
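The window-and-threshold selection step can be sketched as follows. This is a hedged illustration, not the implementation used here: it assumes joint tests are pairwise (as in the rs2104286 and rs1570527 example), that a window of w SNPs means each SNP is paired with the w - 1 SNPs that follow it in genomic order, and that "exceeding" is a strict inequality. The function names `squared_correlation` and `candidate_pairs` are hypothetical.

```python
def squared_correlation(a, b):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    if saa == 0 or sbb == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sab * sab / (saa * sbb)


def candidate_pairs(genotypes, window, r2_threshold):
    """List index pairs (i, j) of SNPs to test jointly.

    genotypes: one list of genotype values per SNP, in genomic order.
    window: number of consecutive SNPs spanned by each window (assumed).
    r2_threshold: pairs are kept only if r^2 strictly exceeds this value.
    """
    pairs = []
    for i in range(len(genotypes)):
        # Pair SNP i only with the SNPs that follow it inside the window.
        for j in range(i + 1, min(i + window, len(genotypes))):
            if squared_correlation(genotypes[i], genotypes[j]) > r2_threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [0, 0, 0, 2, 2, 2],
        [0, 1, 2, 0, 1, 2],
    ]
    print(candidate_pairs(snps, window=4, r2_threshold=0.5))
```

The length of the returned list is the number of additional joint tests, so in this sketch the window size and R directly control the added multiple testing burden.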