Integrated model of multiple types of rare variants and prior information improves the power of detecting risk genes for autism.

X. He (1), S. J. Sanders (2), L. Liu (3), S. D. De Rubeis (4,5), E. T. Lim (6,7), J. S. Sutcliffe (8), G. D. Schellenberg (9), R. A. Gibbs (10), M. J. Daly (6,7), J. B. Buxbaum (4,5,11,12), M. W. State (2), B. Devlin (13), K. Roeder (1,3)

1) Lane Center of Computational Biology, Carnegie Mellon University, Pittsburgh, PA;
2) Departments of Psychiatry and Genetics, Yale University School of Medicine, New Haven, CT 06520, USA;
3) Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA;
4) Seaver Autism Center for Research and Treatment, Mount Sinai School of Medicine, New York, New York 10029, USA;
5) Department of Psychiatry, Mount Sinai School of Medicine, New York, New York 10029, USA;
6) Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA;
7) Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA;
8) Vanderbilt Brain Institute, Departments of Molecular Physiology & Biophysics and Psychiatry, Vanderbilt University, Nashville, Tennessee, 37232, USA;
9) Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104;
10) Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA;
11) Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, New York 10029;
12) Friedman Brain Institute, Mount Sinai School of Medicine, New York, New York 10029, USA;
13) Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213, USA.
While whole exome sequencing (WES) greatly facilitates the study of rare genetic variation, it is widely believed that very large sample sizes are required to identify risk genes for complex disease from such data. In contrast, a surprising degree of progress has been made for early-onset disorders, such as autism spectrum disorders (ASD), by identifying recurrent de novo mutations in moderately sized samples. In this work we propose statistical strategies that build on these promising results by using multiple types of data in a unified Bayesian framework. As a first step, we develop methods that incorporate WES data on de novo mutations, inherited rare variants, and rare variants identified in cases and controls. TADA, for Transmitted And De novo Association, integrates these data through a gene-based likelihood model with parameters for mutation rates, allele frequencies and gene-specific penetrances. Inference is based on a hierarchical Bayes strategy that borrows information across all genes to improve parameter estimation. We validated TADA using realistic simulations mimicking rare, large-effect mutations affecting risk for ASD and show that it has much better power than other common methods of analysis. Thus, TADA's integration of various kinds of WES data can be an effective means of identifying novel risk genes. Indeed, by applying TADA to all published WES data from subjects with ASD and their families, as well as from a case-control study of ASD, we identified several novel and promising ASD candidate genes with strong statistical support. Moreover, based on published comparisons of the rate of de novo mutations in ASD probands versus their siblings, it has been conjectured that half of the 116 genes that sustained exactly one severe de novo mutation in probands are ASD risk genes. TADA successfully identifies approximately half of these genes as promising candidates deserving further investigation.
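To make the gene-based Bayesian idea concrete, here is a minimal sketch of how de novo counts alone could be turned into a gene-level Bayes factor. It is not taken from the abstract: it assumes that the number of de novo mutations in a gene is Poisson with mean 2 x (number of families) x (mutation rate) x (relative risk), that the relative risk is 1 under the null, and that it follows a Gamma prior for a risk gene. The function names, the hyperparameters `mean_rr` and `beta`, the prior fraction of risk genes, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def log_poisson_pmf(x, rate):
    """Log probability of observing x events under Poisson(rate)."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def log_gamma_poisson_pmf(x, scale, shape, rate_param):
    """Log marginal probability of x when x | g ~ Poisson(scale * g)
    and g ~ Gamma(shape, rate_param); this is a negative binomial."""
    return (
        math.lgamma(shape + x) - math.lgamma(shape) - math.lgamma(x + 1)
        + shape * math.log(rate_param / (rate_param + scale))
        + x * math.log(scale / (rate_param + scale))
    )


def de_novo_bayes_factor(x, n_families, mu, mean_rr, beta):
    """Bayes factor for 'risk gene' versus 'non-risk gene' from de novo counts.

    Assumed model (illustrative only):
      null:        x ~ Poisson(2 * n_families * mu)
      alternative: x ~ Poisson(2 * n_families * mu * g),
                   g ~ Gamma(shape=mean_rr * beta, rate=beta), so E[g] = mean_rr
    """
    expected = 2.0 * n_families * mu
    log_h1 = log_gamma_poisson_pmf(x, expected, mean_rr * beta, beta)
    log_h0 = log_poisson_pmf(x, expected)
    return math.exp(log_h1 - log_h0)


def posterior_risk_probability(bayes_factor, prior_fraction):
    """Posterior probability that a gene is a risk gene, given its Bayes
    factor and an assumed prior fraction of risk genes."""
    numerator = prior_fraction * bayes_factor
    return numerator / (numerator + 1.0 - prior_fraction)


if __name__ == "__main__":
    # Hypothetical inputs: 1000 families, per-gene mutation rate 1e-5,
    # prior mean relative risk 20, and 5% of genes assumed to be risk genes.
    for count in range(4):
        bf = de_novo_bayes_factor(count, 1000, 1e-5, mean_rr=20.0, beta=1.0)
        post = posterior_risk_probability(bf, prior_fraction=0.05)
        print(count, bf, post)
```

Under these assumptions, a gene with no de novo mutations has a Bayes factor below 1, and each additional mutation raises the evidence sharply, which mirrors the intuition behind recurrent de novo events. Evidence from inherited and case-control variants would enter through further likelihood terms, which this sketch does not attempt to model.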
We are pursuing several refinements of the TADA framework. In particular, we are investigating how prior information about rare variants can be used to improve TADA. We are studying three types of external information: the effects of mutations on protein function; the allele frequencies of variants in large independent samples; and the selective constraints on variants in the population. Our simulations and our analysis of a collection of known ASD risk genes demonstrate that such information boosts the power of association studies.