REDUCTION OF GENOMIC COMPLEXITY FOR RE-SEQUENCING BY REGION-SPECIFIC EXTRACTION. J. Dapprich1, D. Ferriola1, M. Kunkel1, A. Gabriel2, M. Dunham3 1) Generation Biotech, Lawrenceville, NJ; 2) Rutgers University, Piscataway, NJ; 3) Princeton University, Princeton, NJ.

   Structural variation can significantly affect the accuracy of SNP typing, sequencing and haplotype analysis, because the interpretation of typing results depends on the underlying genomic context. Determining the positions of non-fixed or copy number variable elements throughout the genome can be difficult or impossible by sequence analysis alone. Further, the assembly of short, random shotgun sequencing reads in the presence of genomic structural variation has become an acute problem for next-generation sequencing because complex genomes contain repetitive regions. Region-specific extraction (RSE) is an automated method that reduces the complexity of genomic DNA by physically isolating targeted genomic elements, including their flanking sequences. A coupled enzymatic hybridization and tagging process achieves single-base specificity and high capture efficiency of genomic regions. RSE is able to resolve sequence ambiguities caused by missing cis-trans linkage, copy number variation or mobile genetic elements. Here we demonstrate the selective separation and analysis of a highly homologous, duplicated gene region, MICA/MICB, located in the Major Histocompatibility Complex. This region is implicated in numerous autoimmune and other diseases such as diabetes. RSE probes that target sequence variation between the two copies were used to selectively separate the duplication containing the MICA gene from the duplication containing the MICB gene. A similar approach was used to resolve homologous gene cassettes in the killer immunoglobulin-like receptor region on chromosome 19 and to map the location and copy number of mobile genomic elements in yeast on DNA microarrays. RSE is directly compatible with essentially any typing method and can be carried out in a 96-well format on commercially available systems. It thus provides a sample preparation tool that can deconvolute complex genomic regions in a high-throughput mode by combining the flexibility of current whole genome analysis methods with the more informative content of site-directed screening methods.
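
The idea of using sequence variation between two homologous copies to choose copy-specific probes can be illustrated with a minimal sketch. It is not the RSE probe design procedure, which the abstract does not describe. The function name `discriminating_probes`, the `probe_len` parameter, the choice to end each probe on the differing base, and the two toy sequences are all assumptions made for illustration; the sequences are invented and are not MICA or MICB.

```python
def discriminating_probes(target, paralog, probe_len=8):
    """List candidate probes that match `target` but not `paralog`.

    Both inputs are assumed to be pre-aligned, equal-length strings with no
    gaps. Each probe is the stretch of `target` that ends on a base where the
    two copies differ (ending on the variant base is an illustrative
    assumption, not a detail taken from the abstract).
    """
    if len(target) != len(paralog):
        raise ValueError("sequences must be pre-aligned to equal length")
    probes = []
    for i, (a, b) in enumerate(zip(target, paralog)):
        # Keep only variant sites with enough upstream sequence for a full probe.
        if a != b and i + 1 >= probe_len:
            probes.append((i, target[i + 1 - probe_len : i + 1]))
    return probes


# Toy, invented sequences standing in for two duplicated copies.
copy_a = "ACGTTGCAAGGCTTACGATC"
copy_b = "ACGTTGCATGGCTTACGGTC"

for position, probe in discriminating_probes(copy_a, copy_b):
    print(position, probe)
```

Each reported probe occurs in `copy_a` but not in `copy_b`, which is the property a copy-specific probe would need. A real design would also have to account for alignment gaps, melting temperature and uniqueness against the rest of the genome.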