Leveraging genetic variation from over 55,000 exomes to explore patterns of functional constraint on human protein-coding genes. K. Samocha1,2,3, M. Lek1,2, D. MacArthur1,2, M. Daly1,2,3, Exome Aggregation Consortium 1) Massachusetts General Hospital, Boston, MA; 2) Broad Institute of Harvard and MIT, Cambridge, MA; 3) Harvard Medical School, Boston, MA.

   A critical challenge in human disease genetics is distinguishing disease-causing variants from the thousands of rare, potentially functional variants identified in any human genome. While many methods focus on predicting the deleteriousness of an individual variant, a complementary way to improve power for causal variant discovery is to focus on variants in genes that show unusually low levels of variation in healthy individuals. Such genes are likely under strong functional constraint, which increases the probability that novel variants observed in them have a deleterious phenotypic impact. The medical relevance of such genes has already been established (see, for example, Epi4K Consortium 2013).
   We extended earlier work, using a model of mutation to predict the expected number of rare (minor allele frequency < 0.001) variants in each gene across more than 55,000 reference individuals jointly called as part of the Exome Aggregation Consortium (see abstract by M. Lek et al.). The model accurately predicts the number of observed synonymous (putatively neutral) variants per gene (Pearson's correlation = 0.94), and it can be used to define metrics of constraint for both missense and loss-of-function (LoF) variation. With more than 55,000 individuals, the model has unprecedented power to confidently identify genes depleted of LoF variants and to provide direct estimates of human-specific selective constraint for each gene. The scale of the empirical variation data also makes several analyses possible for the first time. Genome-wide estimates have suggested that 20-30% of missense variants may be equivalent to LoF variants. Here we evaluate this on a gene-by-gene basis and find, strikingly, that among genes under equally strong selection against LoF variation, deficits of missense variation imply a wide range (from close to 0% to more than 50%) in the fraction of missense variants with a deleterious impact equivalent to LoF. This provides a critical and heretofore missing parameter for estimating the probability that a novel missense variant is pathogenic. Missense constraint can also highlight specific coding regions within each gene that are intolerant of mutational change. We explored several approaches to evaluating missense constraint for segments of genes, and mapped de novo variants from patients with autism onto these regions to further confirm their disease relevance.