Utilizing rare variants for phasing and imputation in pedigrees.
A. Blackburn, J. Blangero, H. Göring. Department of Genetics, Texas Biomedical Research Institute, San Antonio, TX.
Whole genome sequencing of multigenerational pedigrees is poised to become the new paradigm for statistical genetics studies in the post-GWAS era. Realizing the potential of this paradigm requires efficient methods to phase whole genome sequencing data and to impute it from sequenced to non-sequenced pedigree members. Phasing whole genome sequencing data in extended pedigrees will enable haplotype-specific genetic analyses, improve the identification of compound heterozygosity, improve genotype error checking, and, through segment-based tests, reduce the multiple testing burden inherent to the whole genome sequencing approach. Current imputation methods for related individuals are limited by pedigree size, by the distance of relationships, or by computation time. Here we use simulated and real whole genome sequencing data to explore the potential of rare variants to identify DNA segments that pedigree members share identical by descent. We apply this information to phase and impute whole genome sequencing genotypes, and to reduce the multiple testing burden. To fully explore the robustness of this approach across pedigree structures, we randomly simulated pedigree structures ranging from small nuclear families to large pedigrees of up to ~2500 individuals spanning 10 generations. Whole genome sequencing data were then simulated for these pedigrees based on Kimura's infinite alleles model under the assumption of neutrality. Additionally, real whole genome sequencing data were used to simulate founder genotypes, and transmission was simulated through the pedigree. Imputation accuracy was estimated for diallelic variants across levels of available sequencing data and sequencing accuracy by masking genotypes and using the IQS statistic, which corrects for chance concordance of genotypes. Imputation accuracy varies with the amount of available data and with pedigree size, while being generally robust to low genotyping error rates.
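To illustrate why a chance-corrected statistic matters for rare variants, the following is a minimal sketch of the IQS (commonly expanded as imputation quality score) in its usual kappa-type form, IQS = (Po - Pc) / (1 - Pc), where Po is the observed concordance between masked and imputed genotypes and Pc is the concordance expected by chance from the marginal genotype frequencies. The function name `iqs` and the 0/1/2 genotype coding are assumptions made for this example; the abstract does not describe the implementation actually used.

```python
from collections import Counter


def iqs(true_genotypes, imputed_genotypes):
    """Chance-corrected concordance between masked (true) and imputed genotypes.

    Genotypes are assumed to be coded as minor-allele counts (0, 1, 2).
    Returns NaN when chance agreement is already 1 (statistic undefined).
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_genotypes):
        raise ValueError("genotype lists must be non-empty and of equal length")

    # Po: fraction of individuals whose imputed genotype matches the masked one.
    observed = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes)) / n

    # Pc: agreement expected by chance from the two marginal distributions.
    true_counts = Counter(true_genotypes)
    imputed_counts = Counter(imputed_genotypes)
    chance = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / n**2

    if chance == 1.0:
        return float("nan")
    return (observed - chance) / (1.0 - chance)


# A rare variant: 2 heterozygous carriers among 100 individuals.
truth = [0] * 98 + [1] * 2
always_reference = [0] * 100  # an "imputation" that never calls the rare allele

raw_concordance = sum(t == i for t, i in zip(truth, always_reference)) / len(truth)
print(raw_concordance)               # high, although no carrier was recovered
print(iqs(truth, always_reference))  # chance-corrected score
print(iqs(truth, truth))             # perfect imputation
```

In this toy case the raw concordance is high even though neither carrier is recovered, whereas the chance-corrected score gives the uninformative imputation no credit.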
We conclude that rare variants, especially those specific to a single founder, are more useful than common variants for phasing and imputing whole genome sequencing genotypes in multigenerational pedigrees.
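The intuition behind this conclusion can be sketched with a single-locus gene-dropping simulation. The pedigree, the names `PEDIGREE`, `gene_drop`, and `carriers`, and the placement of variants on founder haplotypes are all hypothetical and chosen only for illustration; this is not the authors' method. Each founder is given two uniquely labeled haplotypes, and each child inherits one haplotype at random from each parent.

```python
import random

# Hypothetical three-generation pedigree: person -> (father, mother).
# Founders map to None; parents are listed before their children.
PEDIGREE = {
    "gf": None,
    "gm": None,
    "in_law": None,
    "p1": ("gf", "gm"),
    "p2": ("gf", "gm"),
    "c1": ("in_law", "p1"),
    "c2": ("in_law", "p1"),
}


def gene_drop(pedigree, rng):
    """Drop founder haplotype labels through the pedigree at one locus."""
    haplotypes = {}
    for person, parents in pedigree.items():
        if parents is None:
            # Each founder carries two distinct, uniquely labeled haplotypes.
            haplotypes[person] = (f"{person}_a", f"{person}_b")
        else:
            father, mother = parents
            # Mendelian transmission: one randomly chosen haplotype per parent.
            haplotypes[person] = (
                rng.choice(haplotypes[father]),
                rng.choice(haplotypes[mother]),
            )
    return haplotypes


def carriers(haplotypes, tagged):
    """People carrying at least one founder haplotype that bears the variant."""
    return {p for p, pair in haplotypes.items() if any(h in tagged for h in pair)}


rng = random.Random(1)
haps = gene_drop(PEDIGREE, rng)

# A founder-specific variant sits on exactly one founder haplotype, so every
# carrier holds a copy of that same haplotype, identical by descent.
private_carriers = carriers(haps, {"gf_a"})

# A common variant sits on several founder haplotypes, so two carriers need
# not share any segment identical by descent.
common_carriers = carriers(haps, {"gf_a", "gm_b", "in_law_a"})

print(sorted(private_carriers))
print(sorted(common_carriers))
```

Under these assumptions, carrying the founder-specific allele pinpoints which founder haplotype a person inherited, whereas carrying the common allele is compatible with several founder origins.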