Large-scale parent-child trio sequencing highlights factors influencing spontaneous human mutation. S. Sunyaev1, P. Polak1, L. Francioli2, W. Kloosterman2, P. I. W. de Bakker2, Genome of Netherlands (GoNL) Consortium 1) Brigham & Women's Hospital, Harvard Medical School, Boston, MA; 2) University of Utrecht Medical Center.
Characterizing the context dependency and regional variation of the human mutation rate is important for studies of the role of de novo mutations in Mendelian and complex phenotypes. Study designs that identify genes with recurrent de novo mutations associated with a phenotype require an accurate model of the local mutation rate. Further, a quantitative description of the context dependency and regional variation of the human mutation rate is a key step towards understanding the biology of spontaneous mutagenesis. It is also an important ingredient in models of genome evolution and of the evolution of genetic disease. Current knowledge of the properties of the human mutation rate comes primarily from indirect inference based on comparative genomics and population genetic variation data. Whole-genome sequencing of multiple parent-child trios enables direct characterization of the properties of spontaneous mutagenesis in humans. The Genome of the Netherlands (GoNL) is an effort to characterize genomic variation in the Dutch population through whole-genome sequencing of 250 families (231 trios, 19 twin quartets) at 12x coverage using Illumina HiSeq. Sequencing was performed by BGI (China). We developed a Bayesian algorithm to detect de novo mutations from pedigree data and implemented it as the PhaseByTransmission module in the Genome Analysis Toolkit (GATK). In total, we called more than 19,600 de novo mutations, of which 44% were called with high confidence. Of the high-confidence calls, 92% were independently validated. We used the resulting set of high-confidence de novo mutations to characterize the context dependency and regional variation of the human mutation rate. Our analysis was assisted by newly developed statistical methods that correct for false-positive and false-negative mutation calls. We observed excellent agreement between the context-dependent point mutation rates and earlier predictions from comparative genomics. 
We also detected a striking strand asymmetry in transcribed regions: the rate of A>G transitions was elevated by 40% on the non-transcribed strand compared with the transcribed strand, suggesting an impact of transcription-coupled repair on human germ-line mutagenesis. Finally, we quantified regional variation in mutation rate and specifically addressed the influence of epigenetic variables such as replication timing and chromatin architecture. This analysis generated hypotheses on specific factors shaping the landscape of human mutation.
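
To illustrate the general idea behind Bayesian de novo detection in a trio, the following is a minimal sketch and not the PhaseByTransmission implementation. It assumes biallelic sites, per-sample genotype likelihoods supplied by an upstream caller, a Hardy-Weinberg prior on parental genotypes, and a simple transmission model in which a Mendelian violation occurs with probability `mu`. The function names, the default `mu`, and the default allele frequency are hypothetical choices made for illustration only.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried by a sample


def mendelian_prob(gc, gm, gf):
    """P(child genotype gc | mother gm, father gf) under Mendelian inheritance."""
    pm, pf = gm / 2, gf / 2  # chance each parent transmits the alternate allele
    return {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }[gc]


def hwe_prior(alt_freq):
    """Hardy-Weinberg genotype prior for an assumed alternate allele frequency."""
    q = alt_freq
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)


def de_novo_posterior(lik_mother, lik_father, lik_child, mu=1e-8, alt_freq=1e-3):
    """Posterior probability that the trio genotypes violate Mendelian inheritance.

    Each lik_* argument is a sequence of three genotype likelihoods,
    P(reads | genotype) for genotypes 0, 1, 2. Both mu (per-site probability
    of a Mendelian violation) and alt_freq are illustrative assumptions.
    """
    prior = hwe_prior(alt_freq)
    total = 0.0
    de_novo = 0.0
    for gm, gf, gc in product(GENOTYPES, repeat=3):
        mend = mendelian_prob(gc, gm, gf)
        # Child genotypes that these parents cannot produce without a mutation.
        n_incompat = sum(1 for g in GENOTYPES if mendelian_prob(g, gm, gf) == 0)
        if mend > 0:
            # Mendelian-consistent configuration.
            trans = mend * (1 - mu) if n_incompat else mend
            violates = False
        else:
            # Spread the mutation mass evenly over the incompatible genotypes.
            trans = mu / n_incompat
            violates = True
        weight = (prior[gm] * prior[gf] * trans
                  * lik_mother[gm] * lik_father[gf] * lik_child[gc])
        total += weight
        if violates:
            de_novo += weight
    return de_novo / total


if __name__ == "__main__":
    # Hypothetical likelihoods: both parents look homozygous reference,
    # the child looks heterozygous.
    parent = (1.0, 1e-3, 1e-6)
    child = (1e-6, 1.0, 1e-3)
    print(de_novo_posterior(parent, parent, child))
```

With modest parental evidence such as the hypothetical likelihoods above, the posterior in this sketch stays low because an undetected heterozygous parent is a far more likely explanation than a new mutation. This is one reason low-coverage data calls for careful confidence stratification and correction for false-positive and false-negative calls.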