Geographic Population Structure (GPS) of worldwide human populations infers biogeographical origin down to home village.
E. Elhaik1, T. Tatarinova2,3, D. Chebotarev3, I. S. Piras4, C. M. Calò4, A. D. Montis5, M. Atzori5, M. Marini5, S. Tofanelli6, P. Francalacci7, L. Pagani8, C. Tyler-Smith8, Y. Xue8, G. Cucca4, T. G. Schurr9, J. B. Gaieski9, C. Melendez9, M. G. Vilar9, R. Gomez10, R. Fujita11, F. R. Santos12, D. Comas13, O. Balanovsky14,15, P. Zalloua16, H. Soodyall17, R. Pitchappan18, A. GaneshPrasad18, M. Hammer19, L. Matisoo-Smith20, S. R. Wells21
1) Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, 615 N. Wolfe Street, Baltimore, MD 21205; 2) Glamorgan Computational Biology Research Group, University of Glamorgan, Wales, CF371HR, United Kingdom; 3) Laboratory of Applied Pharmacokinetics and Genomics, Children's Hospital Los Angeles, University of Southern California, 4650 Sunset Blvd, Los Angeles, CA 90027; 4) Department of Sciences of Life and Environment, University of Cagliari, Monserrato, SS 554, 09042, Italy; 5) Research Laboratories, bcs Biotech S.r.l., Viale Monastir 112, 09122 Cagliari, Italy; 6) Department of Biology, University of Pisa, Via Ghini 13, 56126 Pisa, Italy; 7) Department of Science of Nature and Territory, University of Sassari, Località Piandanna, Sassari, Italy; 8) The Wellcome Trust Sanger Institute, CB10 1SA, Hinxton, UK; 9) University of Pennsylvania, Philadelphia, PA; 10) CINVESTAV, Mexico City, Mexico; 11) University of San Martin de Porres, Lima, Peru; 12) Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil; 13) Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain; 14) Vavilov Institute for General Genetics, Moscow, Russia; 15) Research Centre for Medical Genetics, Moscow, Russia; 16) The Lebanese American University, Chouran, Beirut, Lebanon; 17) University of the Witwatersrand, Johannesburg, South Africa; 18) Chettinad Academy of Research and Education, Chennai, India; 19) University of Arizona, Tucson, AZ; 20) University of Otago, Dunedin, New Zealand; 21) National Geographic Society, Washington DC, USA.
The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Modern biogeography methods are accurate to 700 km in Europe but are highly inaccurate elsewhere, particularly in Southeast Asia and Oceania. The accuracy of these methods is bounded by the choice of genotyping arrays, the size and quality of the reference dataset, and principal component (PC)-based algorithms. To overcome the first two obstacles, we designed GenoChip, a dedicated genotyping array for genetic anthropology with an unprecedented number of ~12,000 Y-chromosomal and ~3,300 mtDNA SNPs and over 130,000 autosomal and X-chromosomal SNPs, carefully chosen to study ancestry and lacking any known health, medical, or phenotypic relevance. We also genotyped 615 individuals from 54 worldwide populations collected as part of the Genographic Project and the 1000 Genomes Project. To overcome the last impediment, we developed an admixture-based Geographic Population Structure (GPS) method that infers the biogeography of worldwide individuals down to their village of origin. GPS's accuracy was demonstrated on three data sets: worldwide populations, Southeast Asians and Oceanians, and Sardinians (Italy), using 40,000-130,000 GenoChip markers. GPS correctly placed 80% of worldwide individuals within their country of origin and reached an accuracy of 87% for Asians and Oceanians. Applied to over 200 Sardinian villagers of both sexes, GPS placed a quarter of them within their villages and most of the remainder within 50 km of their villages, allowing us to identify the demographic processes that shaped Sardinian society. These results are significantly more accurate than those of PCA-based approaches. We further demonstrate two GPS applications in tracing the poorly understood biogeographical origins of the Druze and North American (CEU) populations. Our findings demonstrate the potential of the GenoChip array for genetic anthropology. Moreover, the accuracy and power of GPS underscore the promise of admixture-based methods for biogeography and have important ramifications for genetic ancestry testing, forensic and medical sciences, and genetic privacy.
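
Distance-based accuracy figures such as "within 50 km of their villages" can be checked with a great-circle distance between predicted and known coordinates. The following is a minimal sketch of that kind of evaluation, not the GPS method itself; the function names, the (latitude, longitude) input format, and the use of the haversine formula with a mean Earth radius are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(predicted, actual, radius_km):
    """Share of individuals whose predicted location lies within radius_km
    of their known origin. Both inputs are hypothetical lists of
    (latitude, longitude) pairs in the same order."""
    hits = sum(
        haversine_km(p[0], p[1], a[0], a[1]) <= radius_km
        for p, a in zip(predicted, actual)
    )
    return hits / len(actual)
```

Calling `fraction_within(predicted, actual, 50)` on a set of placements would give the proportion falling within 50 km of the known origin, which is the form of the distance-based accuracy quoted above.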
You may contact the first author (during and after the meeting) at