Accuracy of Ancestry Informative Markers (AIMs) for the estimation of individual ancestry in admixed populations. J. M. Galanter1, C. Gignoux1, M. Aldrich1, D. Torgerson1, J. G. Ford2, S. Nazarui2, J. R. Rodriguez-Santana3, J. Casal2, A. Torres-Palacios2, J. Salas4, R. Chapela4, H. Geoffrey Watson5, K. Meade6, M. LeNoir7, W. Rodriguez-Cintrón3, P. C. Avila8, A. Bigham9, M. Shriver9, E. González Burchard1,10 1) Department of Medicine, University of California, San Francisco, San Francisco, CA; 2) Veterans Caribbean Health Care System 10 Casia Street San Juan, PR 00921; 3) Centro de Neumologia Pediátrica Torre Medica Auxilio Mutuo Suite 215 Ave. Ponce de Léon No. 735 San Juan, Puerto Rico 00917; 4) Instituto Nacional de Enfermedades Respiratorias Mexico City, Mexico; 5) James A. Watson Wellness Center 5709 Market St Oakland, CA 94608; 6) Children's Hospital Oakland Research Institute 5700 Martin Luther King Jr Way Oakland, California 94609; 7) Bay Area Pediatrics Ste 1, 2940 Summit Street Oakland, CA 94609-3410; 8) Division of Allergy/Immunology Northwestern University M-316, McGaw Pavilion, 240 E. Huron, Chicago, IL 60611; 9) Department of Anthropology Penn State University 512 Carpenter Building State College, PA; 10) Department of Biopharmaceutical Sciences University of California, San Francisco, San Francisco, CA.
Introduction Ancestry informative markers (AIMs) have been used as a cost-effective way to estimate individual ancestral proportions in admixed populations such as African Americans and Latinos. We determined the accuracy of individual ancestry estimates derived from smaller AIMs panels, using ancestry estimates from all genome-wide data as the gold standard.
Methods Latino participants of Mexican (n = 271) and Puerto Rican (n = 324) origin with asthma were recruited from the San Francisco Bay Area, New York City, Puerto Rico, and Mexico City. Genotyping was performed using the Affymetrix 6.0 GeneChip Array; after applying standard QC filters, 729,685 markers remained for analysis. We used the intersection of Illumina-550 and Affymetrix 6.0 as our set of potential SNPs to encourage universal applicability of our marker panels. Ancestry information for each SNP was measured via pairwise I_n (informativeness for assignment) calculations. AIMs panels of 18, 36, 75, 150, 300, 600, 1200, and 2400 unlinked markers were selected. Individual ancestry was estimated using the program ADMIXTURE, specifying a three-population model. Ancestral populations consisted of HapMap Yorubans and CEPH Europeans, as well as Maya and Nahua Native Americans. We compared ancestry estimated with each size of AIMs panel against ancestry estimated from genome-wide markers. The mean and standard deviation of the difference in ancestry estimation between AIMs and genome-wide data were calculated.
Results There was an inverse correlation between the number of AIMs used to estimate ancestry and both the mean and the standard deviation of the error in ancestry estimation. Using AIMs, African ancestry was consistently overestimated, while the major ancestral component (European in Puerto Ricans and Native American in Mexicans) was systematically underestimated. Using 300 or fewer AIMs consistently produced a standard deviation of ancestry estimation error of 10% or greater.
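The pairwise I_n measure named in the Methods can be illustrated with a minimal sketch. It assumes the standard informativeness-for-assignment definition for a biallelic SNP compared between two populations, computed with natural logarithms; the function name and the input frequencies are hypothetical and are not taken from the study.

```python
import math


def pairwise_informativeness(p1, p2):
    """Pairwise I_n for a biallelic SNP.

    p1, p2: frequency of the same allele in each of two populations
    (hypothetical inputs; the other allele has frequency 1 - p).
    """
    def xlogx(x):
        # Convention: 0 * log(0) is treated as 0.
        return x * math.log(x) if x > 0 else 0.0

    total = 0.0
    # Sum over both alleles of the SNP.
    for a, b in ((p1, p2), (1.0 - p1, 1.0 - p2)):
        mean_freq = (a + b) / 2.0
        # -p_j log p_j + (1/K) * sum_i p_ij log p_ij, with K = 2 populations
        total += -xlogx(mean_freq) + (xlogx(a) + xlogx(b)) / 2.0
    return total
```

Under this definition a SNP with identical frequencies in both populations scores 0, and a SNP fixed for different alleles in the two populations scores log 2, the maximum for a pairwise comparison.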
Discussion Our results illustrate significant error in the estimation of individual ancestry using AIMs. There is both systematic bias, resulting in overestimation of African ancestry (and underestimation of other continental ancestry), and random error. Both kinds of error decrease as the number of AIMs increases. These findings may have implications for genetic association studies where ancestry is used to control for population stratification, as well as for studies examining associations of individual ancestry estimates with a phenotype.
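The two error components described here map onto the two summary statistics used in the Methods: the mean of the per-individual difference (AIMs estimate minus genome-wide estimate) reflects systematic bias, and its standard deviation reflects random error. A minimal sketch follows, assuming the sample standard deviation is used; the function name and the toy values in the usage note are illustrative only and are not study data.

```python
import statistics


def ancestry_error_summary(aim_estimates, genomewide_estimates):
    """Return (mean, sample standard deviation) of per-individual error.

    Both arguments are equal-length sequences of ancestry proportions for
    one ancestral component, ordered by individual (hypothetical inputs).
    """
    if len(aim_estimates) != len(genomewide_estimates):
        raise ValueError("estimate lists must have the same length")
    # Signed difference: positive values mean the AIMs panel overestimates.
    diffs = [a - g for a, g in zip(aim_estimates, genomewide_estimates)]
    return statistics.mean(diffs), statistics.stdev(diffs)
```

For example, with made-up values, `ancestry_error_summary([0.30, 0.20, 0.25], [0.20, 0.20, 0.15])` returns a positive mean, which would indicate overestimation of that component by the AIMs panel.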