Rare variant association studies: what population genetics models teach us about power and study design.

B.M. Neale1,2, O. Zuk1,3, E. Hechter1,4, K. Samocha1,2, M.J. Daly1,2, S. Sunyaev1,5, S. Schaffner1, E. Lander1,6

1) Broad Institute of MIT and Harvard, Cambridge, MA; 2) Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA; 3) Toyota Technological Institute, Chicago, IL; 4) Department of Mathematics, UC Berkeley, CA; 5) Division of Genetics, Brigham and Women's Hospital, Boston, MA; 6) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA.

Recent advances in sequencing and genotyping enable the assessment of rare variation in humans at unprecedented levels. In particular, Rare Variant Association Studies (RVAS) aim to shed light on the role of rare variants in common diseases. The allele frequency spectrum and impact of rare alleles depend on demographic history. To explore the influence of population demography on the nature of functional genetic variation in genes, we conducted a series of forward simulations and compared the results to exome sequencing data from over 20,000 individuals. We demonstrate that population demography does not heavily influence the average number of deleterious mutations per individual. However, bottlenecks such as those that occurred in Finland and Iceland increase the variation in the number of deleterious mutations per gene. This increased variance has clear implications for gene identification and highlights the value of populations that have experienced a recent bottleneck for genetic investigation. We next study the effect of demography on our ability to detect alleles in RVAS. RVAS differ from GWAS in that rare variants must be aggregated for association tests. Rare variants form a heterogeneous group, with different effect sizes, selection coefficients and allele frequencies. We introduce a two-class mixture model for coding rare variants, in which variants are assumed to be either completely harmless or completely essential for protein function. Using this model, we contrast the power to detect association for rare coding variation under a perfect-information scenario, in which the analysis is restricted to causal mutations, with the power under an imperfect-information scenario, in which the set of functional mutations is unknown or predicted at different levels of accuracy. These comparisons aim to clarify how best to handle missense mutations in contrast to putative loss-of-function (LoF) variants, as LoFs are predicted to have a more consistent impact on phenotype.
In conclusion, the search for rare functional variants is likely to be more challenging for common complex traits than was the case for common variant identification. However, study designs with more extreme selection and the use of populations with more recent bottlenecks can dramatically improve the power to detect significant association. These results have clear implications for the design and analysis of rare variant studies and will inform the next round of genetic investigations.
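
As an illustration of the perfect versus imperfect information contrast described above, the following is a minimal sketch and not the authors' simulation framework. It assumes a simple carrier-based burden test analysed as a two-proportion z-test with a normal approximation, a rare disease (so controls resemble the general population), and at most one rare variant per individual per gene. The function name `burden_power` and all parameter names and values (`carrier_freq`, `frac_null`, `rel_risk`, `sensitivity`, `false_pos_rate`, the 2.5e-6 threshold) are hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist


def burden_power(n_cases, n_controls, carrier_freq, frac_null, rel_risk,
                 sensitivity=1.0, false_pos_rate=0.0, alpha=2.5e-6):
    """Approximate power of a carrier-count burden test for one gene.

    Two-class mixture (illustrative assumption): a fraction `frac_null` of
    rare-variant carriers hold a variant that multiplies disease risk by
    `rel_risk`; the remaining carriers hold a neutral variant.
    `sensitivity` is the share of risk variants kept in the test and
    `false_pos_rate` is the share of neutral variants wrongly kept.
    """
    f_null = carrier_freq * frac_null            # risk-variant carriers
    f_neutral = carrier_freq * (1 - frac_null)   # neutral-variant carriers

    # Carrier frequencies among cases; controls are approximated by the
    # general population (rare-disease assumption).
    norm = f_null * rel_risk + (1 - f_null)
    null_in_cases = f_null * rel_risk / norm
    neutral_in_cases = f_neutral / norm

    # Frequency of carriers that the (possibly imperfect) filter includes.
    p_case = sensitivity * null_in_cases + false_pos_rate * neutral_in_cases
    p_ctrl = sensitivity * f_null + false_pos_rate * f_neutral

    # Two-proportion z-test, normal approximation, two-sided level alpha.
    p_bar = (n_cases * p_case + n_controls * p_ctrl) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p_case * (1 - p_case) / n_cases
                  + p_ctrl * (1 - p_ctrl) / n_controls)
    if se_alt == 0:
        return 0.0  # no included carriers, so nothing can be detected

    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible rejection probability in the opposite tail is ignored.
    return NormalDist().cdf((abs(p_case - p_ctrl) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    settings = dict(n_cases=5000, n_controls=5000,
                    carrier_freq=0.02, frac_null=0.3, rel_risk=3.0)
    scenarios = [("perfect information", 1.0, 0.0),
                 ("imperfect predictor", 1.0, 0.5),
                 ("no information", 1.0, 1.0)]
    for label, sens, fpr in scenarios:
        power = burden_power(sensitivity=sens, false_pos_rate=fpr, **settings)
        print(f"{label}: power = {power:.3f}")
```

Under these assumptions, including neutral variants leaves the case-control difference in carrier frequency nearly unchanged while inflating its variance, so power falls as the false positive rate of the functional predictor rises. A forward-simulation approach such as the one described in the abstract would replace the fixed `carrier_freq` and `frac_null` values with quantities drawn from a demographic model.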
