Statistical estimation of haplotype sharing from unphased genotype data.
D. Xifara (1), I. Mathieson (1), I. Tachmazidou (2), G. Dedoussis (3), L. Southam (1,2), K. Panoutsopoulou (2), K. Hatzikotoulas (2), E. Zeggini (2), G. McVean (1)
1) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, Oxfordshire, United Kingdom; 2) Wellcome Trust Sanger Institute, Hinxton, UK; 3) Harokopio University Athens, Athens, Greece.
In large-scale population genomic data sets, individual chromosomes are likely to share extended regions of haplotype identity with others in the sample. Patterns of local haplotype sharing are informative about many processes, including demography, selection and recombination. However, in outbred diploid populations, identifying extended shared haplotypes is not straightforward, particularly in the presence of even low levels of genotyping error. Here, we introduce a model-based method for detecting extended haplotype sharing between sets of individuals that provides accurate estimates from unphased genotype data. We also describe an implementation of the algorithm that can be applied to data sets consisting of thousands of samples. By applying the method to dense SNP data from 5,144 samples from the UK, we show that the median extent of maximal haplotype sharing between unrelated samples is 1.7 cM, implying that even variants at frequencies of 1 in 10,000 within the UK are likely to be over 50 generations old (1,000-1,500 years). Moreover, we show that these data are consistent with a model in which explosive growth within the UK dates to 100 generations ago. In contrast, within a Greek population isolate (the MANOLIS cohort; part of the HELIC project), the median extent of maximal haplotype sharing within a sample of 754 unrelated individuals is 15 cM, implying approximately 4 generations (80-100 years) back to the closest common ancestor. By assessing the size and geographical distribution of maximal haplotype sharing within and between all cohorts of the HELIC project, we can characterise factors influencing local ancestry and begin to date connections between populations.
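The link between shared-segment length and ancestor age quoted above can be illustrated with a rough back-of-envelope sketch. This is not the model-based method of the abstract; it assumes a textbook simplification in which two chromosomes with a common ancestor g generations ago are separated by 2g meioses, so that the shared segment around a focal locus is approximately Gamma-distributed with shape 2 and rate 2g per Morgan. All function names are hypothetical, and the generation times (20-30 years) are simply those implied by the year ranges in the text.

```python
import math


def erlang2_median_unit_rate(tol=1e-12):
    """Median of a Gamma(shape=2, rate=1) variable, found by bisection.

    The CDF is F(x) = 1 - exp(-x) * (1 + x); we solve F(x) = 0.5.
    """
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        cdf = 1.0 - math.exp(-mid) * (1.0 + mid)
        if cdf < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def median_shared_length_cm(generations):
    """Median shared-segment length (cM) under the ASSUMED simple model.

    Rate is 2 * generations crossovers per Morgan, i.e. per 100 cM.
    """
    rate_per_cm = 2 * generations / 100.0
    return erlang2_median_unit_rate() / rate_per_cm


def generations_from_median_cm(median_cm):
    """Invert the relation above: generations implied by a median length."""
    return erlang2_median_unit_rate() * 100.0 / (2 * median_cm)


def generations_to_years(generations, years_per_generation=(20, 30)):
    """Convert generations to a (low, high) range in years.

    The default 20-30 years per generation is an assumption taken from
    the ranges quoted in the text, not a measured value.
    """
    low, high = years_per_generation
    return generations * low, generations * high


if __name__ == "__main__":
    print(round(median_shared_length_cm(50), 2))     # UK: 50 generations
    print(round(generations_from_median_cm(15), 1))  # isolate: 15 cM median
    print(generations_to_years(50))
    print(generations_to_years(4, (20, 25)))
```

Under these assumptions, 50 generations corresponds to a median of about 1.7 cM, in line with the UK figure, while a 15 cM median corresponds to roughly 5 to 6 generations, the same order of magnitude as the approximately 4 generations reported for the isolate. The abstract's estimates concern maximal sharing and come from the authors' own model, so exact agreement with this simplification should not be expected.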
You may contact the first author (during and after the meeting) at