Patterns of linkage disequilibrium reveal genotyping errors and copy number polymorphisms. P. Scheet, M. Stephens. Dept Statistics, Univ Washington, Seattle, WA.
While the average genotyping accuracy of large-scale studies of population genetic variation is now very high, a small number of individual SNPs may exhibit unusually high error rates, either because of pathological patterns in the intensities on which genotype calls are based or because the SNP lies within a copy number polymorphism (CNP). Simple filters, particularly a test for deviation from Hardy-Weinberg equilibrium (HWE), are usually applied to identify suspicious SNPs. Here we introduce a new filter that identifies SNPs whose genotypes produce unusual patterns of linkage disequilibrium. Using a flexible model for variation in a sample of unphased multilocus genotypes, we scan the observed genotype data and, for each SNP in the sample, construct a likelihood ratio (LR) statistic that evaluates the potential for that marker position to have at least one error. We also calculate the expected number of errors at each SNP among all sampled individuals. We test these criteria for error identification and quantification on two independent data sets. First, we applied the method to unfiltered data from unrelated CEU individuals from the HapMap Project, computing LRs at SNPs which showed a signature of genotype error and comparing these to the number of Mendelian incompatibilities (MIs) obtained from trio information. Sites with multiple MIs show a different distribution of the criteria than do SNPs with 0 or 1 MI. Inspection of raw genotype data confirmed the presence of errors or a CNP at the one relevant SNP for which we had access to the intensities. Second, we scanned data from the early phase of an association study, consisting of 192 people typed at 105,000 SNPs genome-wide. We were able to confirm the presence of errors and CNPs at suspicious SNPs from the genotype intensities. Although these data are sparse compared with those of forthcoming studies, our method identified SNPs that would have been overlooked by a test for deviations from HWE alone. For the denser HapMap data, our method offers considerable improvement over tests for deviations from HWE. These methods have been incorporated into fastPHASE and will be available in a future release of the software.
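
For readers unfamiliar with the conventional filter that the abstract uses as its baseline, the following is a minimal sketch of a per-SNP test for deviation from HWE. It does not implement the linkage-disequilibrium-based LR statistic described above, and it is not taken from fastPHASE. The function name `hwe_chi_square` is hypothetical, and the 1-degree-of-freedom chi-square goodness-of-fit test is only one common choice (exact tests are often preferred when genotype counts are small).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotypes at one
    biallelic SNP. Returns (chi-square statistic, p-value).
    Hypothetical helper for illustration only.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Estimated frequency of allele A from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic SNP: the test is uninformative.
        return 0.0, 1.0

    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of a chi-square distribution with 1 degree of freedom,
    # which equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # A SNP with no heterozygotes among 100 individuals is a strong
    # departure from HWE and would be flagged by this filter.
    print(hwe_chi_square(50, 0, 50))
```

A filter of this kind looks at each SNP in isolation, which is why a method that also uses the genotypes at neighboring SNPs can flag markers that an HWE test alone would pass.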