Tracing individual ancestry in a principal components space. C. Wang (1), L. Liang (1), G. Abecasis (2), X. Lin (1). 1) Biostatistics, Harvard School of Public Health, Boston, MA; 2) Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI.

   Joint analysis of data combined from multiple sources is an effective way to increase sample size and statistical power in genetic association studies. Such analysis requires accurate estimation of individual ancestry to adjust for potential population stratification in the combined data. We previously developed a method that uses small amounts of sequence data to accurately place individuals in a principal components ancestry map generated from a reference set of individuals. Here, we extend the method to directly genotyped samples. This allows us to place samples from targeted sequencing and from array genotyping in the same reference ancestry map, facilitating joint analysis of data from both kinds of experiment. We apply these methods to estimate worldwide continental ancestry and fine-scale ancestry within Europe, using the Human Genome Diversity Project (HGDP) and the Population Reference Sample (POPRES) as reference panels. Our results show that ~1,000 random SNPs suffice for accurate estimation of continental ancestry, whereas ~20,000 random SNPs are required for accurate estimation of ancestry within Europe. We further examine two custom arrays, the ExomeChip and the MetaboChip. For samples genotyped on these arrays, continental ancestry can be estimated accurately using the HGDP data as reference. However, estimating fine-scale ancestry within Europe is difficult, partly because few markers overlap between these arrays and the POPRES data. To address this problem, future studies should consider developing a densely genotyped reference panel of diverse European populations.
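
The idea of placing a study sample in a reference ancestry map can be illustrated with a minimal sketch. It is not the method described in this abstract: it uses plain projection onto reference principal components, simulated genotypes, and hypothetical names (`fit_reference_pca`, `place_sample`, `nearest_population`), and it assumes the study sample shares only a random subset of markers with the reference panel. The simulated allele frequencies and sample sizes are arbitrary choices for illustration, not values from HGDP or POPRES.

```python
import numpy as np


def fit_reference_pca(ref_genotypes, n_components=2):
    """PCA on an (individuals x markers) matrix of 0/1/2 genotype counts."""
    mean = ref_genotypes.mean(axis=0)
    std = ref_genotypes.std(axis=0)
    std[std == 0] = 1.0  # monomorphic markers carry no information
    z = (ref_genotypes - mean) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T  # markers x components
    coords = z @ loadings           # reference individuals in PC space
    return mean, std, loadings, coords


def place_sample(genotypes, mean, std, loadings):
    """Project one study sample into the reference PC space."""
    return ((genotypes - mean) / std) @ loadings


def nearest_population(point, coords, labels):
    """Return the label whose reference centroid is closest to the point."""
    names = sorted(set(labels))
    labels = np.asarray(labels)
    dists = [np.linalg.norm(point - coords[labels == n].mean(axis=0))
             for n in names]
    return names[int(np.argmin(dists))]


# Simulated data (assumption): two reference populations, A and B,
# whose allele frequencies differ modestly at every marker.
rng = np.random.default_rng(0)
n_markers, n_per_pop = 2000, 50
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_markers), 0.05, 0.95)
reference = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_markers)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_markers)),
])
ref_labels = ["A"] * n_per_pop + ["B"] * n_per_pop

# The study sample is drawn from population B but is typed at only
# 1,000 randomly chosen markers that overlap the reference panel.
overlap = rng.choice(n_markers, size=1000, replace=False)
study_sample = rng.binomial(2, freq_b)[overlap]

# Build the reference map on the overlapping markers, then place the sample.
mean, std, loadings, ref_coords = fit_reference_pca(reference[:, overlap])
placed = place_sample(study_sample, mean, std, loadings)
assigned = nearest_population(placed, ref_coords, ref_labels)
print("study sample placed nearest to population", assigned)
```

Plain projection of this kind tends to pull new samples toward the origin when markers far outnumber reference individuals, so the sketch is useful for seeing how marker overlap limits resolution rather than as a way to estimate ancestry in practice.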
