Autism ten thousand genomes (AUT10K) project: a roadmap for the complete genetic landscape of autism spectrum disorder.
S. W. Scherer1,2, R. K. C. Yuen1, H. Cao3, X. Tong3, D. Cao3, Y. Sun3, M. Li3, W. Chen3, X. Jin3,4,5, J. L. Howe1, C. R. Marshall6, P. Szatmari7, D. Merico1, R. H. Ring8
1) The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, Toronto, Ontario, Canada; 2) McLaughlin Centre, University of Toronto, Toronto, Ontario, Canada; 3) BGI-Shenzhen, Bei Shan Road, Yantian, Shenzhen, China; 4) BGI@CHOP, Children's Hospital of Philadelphia, Philadelphia, USA; 5) School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China; 6) Molecular Genetics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Canada; 7) Centre for Addiction and Mental Health, University of Toronto, Toronto, Canada; 8) Autism Speaks, New York, USA.

   Autism spectrum disorder (ASD) is a collection of neurodevelopmental conditions characterized by deficits in social interaction and communication and by the presence of restricted and repetitive behaviors. The U.S. Centers for Disease Control and Prevention has recently reported that 1 in 68 children are diagnosed with ASD, making it one of the most common childhood disorders in the United States and worldwide. Over the past few years, large-scale genome-wide analyses (microarray and sequencing) have unveiled the important roles of de novo and rare inherited mutations in the etiology of ASD, findings that promise to enable early diagnosis and intervention. Initiated in 2012, the AUT10K project aims to establish the largest repository of ASD genomic sequence data by providing comprehensive whole genome sequence (WGS) and phenotype information for 10,000 individuals and families with ASD. Our pilot study, which performed WGS on 32 trios (an ASD child and both parents), found clinically relevant genetic variants in ~50% of families. In a second stage, we have completed WGS of 200 ASD simplex trios at a depth of ~30X per genome using Illumina HiSeq technology. Applying our newly developed variant detection pipeline, we found an average of 62.9 de novo single nucleotide variants (SNVs) and 19 de novo insertions/deletions (indels), data largely consistent with our previous findings. SCN2A remains the only gene with loss-of-function (LoF) mutations found in more than one family in this cohort. De novo LoF mutations were also detected in other known ASD-risk genes with high GC content (often difficult to assay in exome sequencing), such as SHANK2 and SHANK3. Detecting de novo CNVs from WGS data remains a challenge with existing tools, but we were able to find 8 putative de novo CNVs in coding regions of the genome. 
We will present the advantages of WGS for resolving sequence variants residing in complex regions of the genome, leading to an improved clinical detection rate for ASD. We will also discuss our strategies for analyzing mutations beyond LoF mutations and coding regions, and share our experience with other aspects of the project, such as big data storage and management.

You may contact the first author (during and after the meeting) at