Discovery of Genes Responsible for Neurocognitive Disease by Large Scale Integration of Sequence and Copy Number Data.

B. P. Coe1, K. T. Witherspoon1, C. Baker1, B. O'Roak1, J. Schuurs-Hoeijmakers2, J. Shendure1, B. deVries2, J. Gecz3, M. Fichera4, C. Romano5, L. G. Shaffer6, J. A. Rosenfeld7, E. E. Eichler1

1) Department of Genome Sciences, University of Washington, Seattle, WA; 2) Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands; 3) School of Paediatrics and Reproductive Health, The University of Adelaide, Adelaide, Australia; 4) Laboratory of Genetic Diagnosis, I.R.C.C.S. Associazione Oasi Maria Santissima, Troina, Italy; 5) Unit of Pediatrics and Medical Genetics, I.R.C.C.S. Associazione Oasi Maria Santissima, Troina, Italy; 6) Paw Print Genetics, Spokane, WA; 7) Signature Genomics, Spokane, WA.

   Copy number variants (CNVs) and sequence variants, including indels and single nucleotide variants (SNVs), have been associated with a variety of neurocognitive disorders; however, these events are typically individually rare and thus require very large populations to reach case-control significance. Large-scale CNV screens of patients and controls are sensitive in identifying pathogenic events, but the large size of typical pathogenic CNVs means that each locus yields multiple candidate genes. In contrast, sequence variation is gene specific, but sequencing studies are typically restricted to smaller populations. Here we combined large-scale CNV analysis with targeted sequencing of high-priority candidates to enhance both the sensitivity and the specificity of gene discovery. We compared the CNV landscape of 29,206 children referred to diagnostic labs with developmental delay and intellectual disability (DD/ID) to that of 19,584 healthy controls. This comparison identified 66 regions, including 31 novel regions, that show an excess of large deletions or duplications in cases relative to controls. This large-scale case-control approach has yielded precise estimates of clinical significance for pathogenic CNVs, identified new loci, and established case-control significance for rare, previously described loci such as 15q24, 3q29 and 2q11.2. We next used molecular inversion probes to target 36 genes that had been highlighted by large CNVs and de novo variants in studies of DD/ID and autism cohorts. These genes were sequenced in 3,249 additional cases of DD/ID and autism and in 2,600 controls, with follow-up in parents when gene-disruptive events were identified.
The integration of CNV and sequence data allowed us to pinpoint specific genes. ZMYND11, in the 10p15.3 deletion syndrome, shows statistical enrichment of both CNVs and loss-of-function mutations, and additional genes, including SETBP1 and ARID1B, show statistical enrichment of loss-of-function mutations. In conclusion, this combined approach has allowed for the rapid discovery of potentially new syndromes and genetic causes of neurocognitive disease.
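
As an illustration of the case-control comparison described above, the following minimal sketch computes a one-sided Fisher's exact test for enrichment of a variant in cases. The abstract does not state which statistical test was used, so the choice of test is an assumption; the function name `fisher_one_sided` and the carrier counts (30 and 2) are hypothetical, and only the cohort sizes (29,206 cases and 19,584 controls) are taken from the text.

```python
from math import comb


def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """One-sided Fisher's exact test for enrichment in cases.

    Returns P(X >= case_hits), where X is the number of carriers that fall
    in the case group under the hypergeometric null hypothesis (carriers
    are distributed between cases and controls by chance alone).
    """
    total = n_cases + n_controls
    hits = case_hits + control_hits
    denom = comb(total, hits)
    upper = min(hits, n_cases)
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(
        comb(n_cases, k) * comb(n_controls, hits - k)
        for k in range(case_hits, upper + 1)
    )
    return tail / denom


# Cohort sizes from the abstract; carrier counts are HYPOTHETICAL.
N_CASES, N_CONTROLS = 29_206, 19_584
p_value = fisher_one_sided(30, N_CASES, 2, N_CONTROLS)
print(f"one-sided p = {p_value:.3g}")
```

Because individual events are rare, an exact test of this kind is a common choice for small carrier counts; a study testing many regions would also need a correction for multiple comparisons, which this sketch omits.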
