Completion of The 1000 Genomes Project: Results, Lessons Learned and Open Questions. G. Abecasis, The 1000 Genomes Project Ctr Statistical Gen, Univ Michigan, SPH I, Ann Arbor, MI.
Starting in 2008, the 1000 Genomes Project (1000GP) set out to use next-generation sequencing technologies to generate a catalog of human genetic variation and haplotypes. This publicly available catalog now includes haplotypes for >2,500 individuals from 26 populations and >80 million genetic variants, ranging from SNPs, indels and other small variants, to insertions of mobile elements and other material, to large structural variants spanning hundreds of kilobases. We summarize the challenges, opportunities, and technical and methodological advances encountered during the course of the project. In addition, we summarize insights about human genetic variation and the utility of project results for genetic association studies. Throughout the project, we have combined cost-effective strategies for generating sequence data, public data release, thorough quality control of the resulting data, and the integration of multiple analysis methods to improve results. We have also developed software tools, methods and formats that are now in widespread use and that support sequence analysis and interpretation in a wide variety of contexts. Through advances in DNA sequencing technology and a combination of analytical approaches ranging from read mapping, to local reassembly, to full-scale de novo assembly of human genomes, the number of sequenced genomes has increased from ~180 in a first pilot analysis to >2,500 in our final release, and the proportion of each genome assessed with high confidence has increased from ~80% to ~96%. We assessed the accuracy and sensitivity of our results through comparisons with deep sequencing using Complete Genomics for 427 individuals and deep PCR-free Illumina data for 24 individuals, as well as targeted long-read sequencing using PacBio. Our haplotype resource can aid genetic studies of disease across a variety of populations, provides insights into human demography, and aids functional interpretation of the genome.
Perhaps more importantly, the principles of open data sharing, collaboration, and friendly competition embodied by the project can be implemented in many future collaborations.
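The accuracy and sensitivity comparisons against deeper Complete Genomics and PCR-free Illumina data amount to checking project calls against a higher-coverage "truth" set for the same individuals. The sketch below shows only the basic bookkeeping, under loudly stated assumptions: the `Variant` tuple and `compare_calls` function are hypothetical names, variants are matched by exact (chrom, pos, ref, alt), and the toy sites are invented for illustration. The project's actual evaluation (restriction to accessible regions, indel normalization, genotype-level concordance) is not described in this abstract and is not reproduced here.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: exact-match on position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_calls(calls: set, truth: set) -> dict:
    """Compare a call set with a deeper-coverage truth set for the same sample(s).

    Assumes both sets are already restricted to the same regions and that
    alleles are represented identically (real indels usually need normalization).
    """
    tp = len(calls & truth)  # called and confirmed by the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # present in the truth set but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0  # false discovery rate
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "fdr": fdr}


# Toy, invented sites for illustration only.
calls = {Variant("20", 100, "A", "G"), Variant("20", 250, "C", "T"),
         Variant("20", 400, "G", "GA"), Variant("20", 900, "T", "C")}
truth = {Variant("20", 100, "A", "G"), Variant("20", 250, "C", "T"),
         Variant("20", 400, "G", "GA"), Variant("20", 500, "A", "C"),
         Variant("20", 700, "G", "T")}

result = compare_calls(calls, truth)
print(result)
```

With three shared sites, one call missing from the truth set and two truth sites missed, this gives a sensitivity of 3/5 and a false discovery rate of 1/4.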