Complete resequencing of extended genomic regions using fosmid targeting and PacBio's Single Molecule Real-Time (SMRT) long-read sequencing technology. D. E. Geraghty1, C. W. Pyo1, K. Wang1, R. Wang1, Y. S. Pyon1, K. Eng2, B. Bowman2, S. Ranade2 1) Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America; 2) Pacific Biosciences of California Inc., Menlo Park, CA, United States of America.

   A limitation of genome-wide methods for identifying genetic variation is their inability to acquire phased, extended, and complete genomic sequence from targeted regions. To address this, we have substantially improved methods for both sample preparation and single-molecule, long-read sequence generation that allow complete, haplotype-resolved resequencing across extended genomic subregions. As a specific application, we have targeted subregions of HLA in case and control chromosomes, with a major focus on Type 1 Diabetes (T1D). Despite decades of research, the causative genetic factors in the MHC that contribute to T1D susceptibility have not been completely and unambiguously identified, and it is likely that some of the relevant genetic variants are yet to be discovered. The most certain way to identify them is to resequence the conserved portions from cases and controls, with the goal of testing the relatively simple hypothesis that susceptibility loci lie within the MHC, and specifically within the conserved extended haplotypes (CEHs) of the MHC that are associated with disease. Toward that end, we are completing a pilot project resequencing 800 kb segments that include the DR4 CEH, using next-generation methods for both targeted DNA acquisition and sequencing. Of the four DR4 haplotypes resequenced, three are from T1D patients and one is from a control individual; all data are to be complete and phased over each 800 kb segment. The approach developed and the data acquired demonstrate cost-effective linear scale-up, supporting the feasibility of extending the analysis to several hundred cases and controls and generating phased chromosomal genomic sequences of ~800 kb that encompass the extent of the relevant CEHs.

You may contact the first author (during and after the meeting) at