Reference sample guided pooled sequencing identifies loss-of-function patterns across human populations.
A. Eran1, M. Carneiro1, G. del Angel1, E. Banks1, R. Poplin1, M. Lek1,2, G. van der Auwera1, S. Fisher1, S. Gabriel1, D. Altshuler1, D. MacArthur1,2, M. DePristo1, The 1000 Genomes Project Consortium
1) Broad Institute of Harvard and MIT, Cambridge, MA; 2) Massachusetts General Hospital, Boston, MA.
Although loss-of-function (LoF) mutations were originally assumed to be disease-causing, about 100 of them may be detected in a typical human genome sequenced today. Understanding the typical distribution of LoF variation is therefore essential for accurate inference from genomic variation, especially in a clinical setting. Here we present an accurate, cost-effective targeted sequencing framework, based on reference sample guided pooled variant calling, and apply it to confidently survey LoF variation in ~1000 individuals from 13 distinct populations. About 13,000 putative LoF variants originally detected in the 1000 Genomes Project were targeted, pooled with a reference sample, deeply sequenced, and jointly genotyped using an empirically derived site-specific error model. When compared with microarray and 1000 Genomes Project genotypes, deep targeted reference sample guided pooled sequencing showed over 99% concordance for SNPs and 94% for indels. Using this approach, we examined population-specific LoF burden at the single-gene and pathway levels. We find that protein-coding genes enriched with LoF variation in any single population were frequently involved in environmental response (p<1e-8), a pattern that agreed with known demographic histories. These genes tend to act extracellularly or at the plasma membrane, and include disease-implicated loci such as ITGA7, TMEM67, AIRE, and HLA-DQB1. Molecular mechanisms subject to differential gene inactivation between populations include natural killer cell-mediated cytotoxicity in Africans (p<1e-8), transition metal ion binding in Europeans (p<1e-8), and adaptive immunity in East Asians (p<1e-8). These results improve our understanding of LoF variation across human populations and illustrate the value of reference sample guided pooled sequencing for large-scale population studies.
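The idea of a reference-guided, site-specific error model can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes (hypothetically) that the reference sample's reads can be counted separately, that the reference sample is known to be homozygous reference at the site, that a Beta(1, 999) pseudocount smooths the error estimate, and that pool reads follow a simple binomial model. All function names are illustrative only.

```python
import math
from typing import List, Tuple


def site_error_rate(ref_alt_reads: int, ref_depth: int,
                    alpha: float = 1.0, beta: float = 999.0) -> float:
    """Hypothetical per-site error rate estimated from the reference sample.

    Assumes the reference sample is homozygous reference here, so every
    non-reference read it yields is counted as an error. The Beta(alpha, beta)
    pseudocount keeps the estimate strictly between 0 and 1.
    """
    return (ref_alt_reads + alpha) / (ref_depth + alpha + beta)


def pool_allele_count_loglik(alt_reads: int, depth: int, n_chrom: int,
                             error: float) -> List[float]:
    """Binomial log-likelihood of the pool's alt reads for each allele count k."""
    # log binomial coefficient; identical for every k
    log_comb = (math.lgamma(depth + 1) - math.lgamma(alt_reads + 1)
                - math.lgamma(depth - alt_reads + 1))
    logliks = []
    for k in range(n_chrom + 1):
        f = k / n_chrom  # true alt allele fraction in the pool
        # an alt read comes from an alt chromosome read correctly,
        # or from a ref chromosome read with a sequencing error
        p = f * (1 - error) + (1 - f) * error
        logliks.append(log_comb + alt_reads * math.log(p)
                       + (depth - alt_reads) * math.log(1 - p))
    return logliks


def call_pool_site(alt_reads: int, depth: int, n_chrom: int,
                   ref_alt_reads: int, ref_depth: int) -> Tuple[int, float]:
    """Return the maximum-likelihood alt allele count and the site error rate."""
    error = site_error_rate(ref_alt_reads, ref_depth)
    logliks = pool_allele_count_loglik(alt_reads, depth, n_chrom, error)
    best_k = max(range(n_chrom + 1), key=lambda k: logliks[k])
    return best_k, error
```

For a hypothetical pool of 10 diploid individuals (20 chromosomes) with 105 alt reads out of 2000, and a reference sample showing 2 mismatches in 1000 reads, `call_pool_site(105, 2000, 20, 2, 1000)` estimates an error rate of 0.0015 and a single alt chromosome. Estimating the error rate per site, rather than using one global rate, lets a low-frequency variant be distinguished from a locally noisy position.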