Rare variant sharing reveals population histories. I. Mathieson1, G. McVean1,2 1) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; 2) Department of Statistics, University of Oxford, Oxford, United Kingdom.

   Sharing of rare variants is highly informative about recent ancestry at a specific locus: the rarer the variant, the more recent the implied ancestry. Genome-wide patterns of rare allele sharing at different frequencies therefore provide a rich description of the shared ancestry of a sample. We show how to use these patterns to make inferences about human history, and compare this approach with others based on identity by descent (IBD) sharing. First, we describe a practical method for extracting and using rare variant sharing information. Using a combination of sequence and array genotype data, we find shared haplotypes within a sample. We then use the joint distribution of the lengths of these haplotypes and the numbers of mutations they carry to infer the shared history of the sample, expressed as the distribution of coalescence times. With this approach we detect much shorter shared haplotypes than IBD-based methods (on the order of hundreds of kb rather than Mb). We can therefore infer history up to thousands of generations in the past rather than hundreds, which allows us to investigate relatedness among humans on a worldwide scale. As an illustration, we applied this method to the Phase 1 data release of the 1000 Genomes Project, identified over 3 million shared haplotypes, and characterised the distribution of coalescence times between populations. For example, the median age of a haplotype shared between two GBR (British in England and Scotland) individuals (GBR-GBR) is 123 generations, and less than 1% of such haplotypes are older than 1000 generations. In contrast, the median age of a haplotype shared between a GBR and a YRI (Yoruba in Ibadan, Nigeria) individual (GBR-YRI) is 744 generations, and 41% are older than 1000 generations. Admixed populations such as ASW (Americans of African Ancestry in SW USA) show distributions of coalescence times consistent with mixtures of the distributions of the source populations. 
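The abstract does not give the likelihood used to turn haplotype length and mutation count into an age. The Python below is a minimal sketch under a deliberately simplified model that we assume for illustration; it is not the authors' method. It assumes that the genetic length of a segment surrounding the focal rare variant is Gamma(shape 2, rate 2t) in Morgans (one exponential tail on each side, with 2t meioses separating the two copies), and that the number k of mutations distinguishing the two copies is Poisson with mean 2·mu·L·t. It also assumes uniform recombination, so genetic length is rho·L. The rates MU and RHO and the function names log_likelihood, mle_age and summarise are hypothetical. Under these assumptions the likelihood has a closed-form maximum, t = (k + 2) / (2L(rho + mu)). The same ages then give the summaries quoted above, namely the median age and the fraction of haplotypes older than 1000 generations.

```python
import math
import statistics

# Hypothetical per-bp, per-generation rates (illustrative only, not from the abstract).
MU = 1.2e-8   # mutation rate
RHO = 1.0e-8  # recombination rate (about 1 cM/Mb)


def log_likelihood(t, length_bp, k, mu=MU, rho=RHO):
    """Log-likelihood of age t (generations) for one shared haplotype.

    Assumed model: genetic length g = rho * length_bp ~ Gamma(shape=2, rate=2t),
    and the k mutations distinguishing the two copies ~ Poisson(2 * mu * length_bp * t).
    """
    g = rho * length_bp
    lam = 2.0 * mu * length_bp * t
    # Gamma(2, 2t) log-density; Gamma(2) = 1, so there is no normalising term.
    log_gamma = 2.0 * math.log(2.0 * t) + math.log(g) - 2.0 * t * g
    # Poisson log-probability of observing k mutations.
    log_pois = k * math.log(lam) - lam - math.lgamma(k + 1)
    return log_gamma + log_pois


def mle_age(length_bp, k, mu=MU, rho=RHO):
    """Closed-form maximiser of log_likelihood: (k + 2) / (2 * L * (rho + mu))."""
    return (k + 2) / (2.0 * length_bp * (rho + mu))


def summarise(ages, threshold=1000):
    """Median age and fraction of haplotypes older than `threshold` generations."""
    if not ages:
        raise ValueError("need at least one age")
    older = sum(1 for a in ages if a > threshold) / len(ages)
    return statistics.median(ages), older
```

Take, for example, a 200 kb haplotype carrying 3 distinguishing mutations. `mle_age(2e5, 3)` gives roughly 570 generations under these assumed rates. Longer segments or fewer mutations imply younger ages, which matches the intuition that short haplotypes reach further back in time. A real analysis would also need to account for genotyping error, recombination-map variation and the frequency of the focal variant.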
Finally, we compare our patterns of haplotype sharing to those generated using IBD sharing, and explain both the practical and conceptual differences between these approaches and the implications for inference.
