Creating a Single Haplotype Human Genome Assembly.
T. Graves (1), W. Warren (1), B. Fulton (1), K. Meltz Steinberg (1), R. Agarwala (2), V. Schneider (2), D. Church (2), E. Eichler (3), R. Wilson (1)
1) Washington University School of Medicine, Saint Louis, MO; 2) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD; 3) University of Washington, Genome Sciences, Seattle, WA.
The human genome reference sequence has provided a foundation for studies of genome structure, human variation, evolutionary biology and human disease. Many of these studies have also revealed, however, that some regions of the human reference genome are not represented optimally. When the reference genome was completed, it was clear that some loci were recalcitrant to closure with the technology and resources then available. It was less clear, however, to what degree structural variation and diversity affected our ability to produce a truly representative genome sequence at these loci. Many of these regions, particularly the structural variant loci, are associated with repetitive sequences. To discriminate between repeat copies and allelic copies, sequence from a single haplotype across these regions is necessary. To this end, we have used a hydatidiform mole source, CHM1, to finish highly complex, repetitive regions to high quality. Our aim is to develop a single allelic representation of the entire human genome, the platinum reference. To achieve this, we have generated ~100X whole genome shotgun sequence as Illumina paired-end data, as well as more than 450 BAC sequences from the CHM1 libraries. The whole genome data have been assembled using a reference-guided assembly, and the finished BAC sequences have been incorporated into this assembly. We have compared the CHM1 assembly to the current reference, GRCh37, to identify single nucleotide variants, structural variants, and sequence missing from the reference. In addition, we have aligned the CHM1 Illumina sequence to the CHM1 assembly to evaluate the efficacy of our assembly strategy.