Meta-imputation: a simple and flexible method to combine multiple reference panels for imputing genetic variants.
P. K. Albers (1), G. R. Abecasis (4), M. I. McCarthy (1, 2, 3), K. J. Gaulton (1)
1) Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK; 2) Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford, UK; 3) Oxford National Institute for Health Research Biomedical Research Centre, Churchill Hospital, Oxford, UK; 4) Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA.
Genotype imputation, in which a reference panel is used to infer genetic variants that were not typed in a genotype sample, has become an integral tool for genetic association studies. In addition to the large panels available in the public domain, e.g. HapMap or 1000 Genomes, recent large-scale sequencing studies have made it possible to use a wider variety of sequence data for imputation. To exploit this ever-increasing abundance of information, it is desirable to combine available reference sets, as this is likely to improve the accuracy with which lower-frequency and rare variants are imputed. However, constructing a combined panel from data produced in independent sequencing studies is difficult, as the data would ideally be called together to produce a single reference containing all haplotypes. In practice, either only the subset of variants present in all sets is considered, which reduces the total number of variants, or missing variants have to be cross-imputed into each set, which decreases statistical certainty in downstream imputation. Adding more haplotypes and variants to a reference set also imposes a considerably higher computational burden on imputation. Here, we propose an alternative solution: meta-imputation, a simple and flexible approach that integrates multiple reference panels without interfering with the imputation algorithm. After each reference is imputed separately into a genotype sample, the inferred genotype likelihoods are combined at overlapping sites, using imputation-dependent certainty scores in a weighting function. To evaluate our meta-imputation approach, we subdivided 1000 Genomes data into split reference panels, emulating the situation in which several reference sets are to be combined, and compared meta-imputation of the split references with imputation of the full reference.
We assessed the results in a comprehensive cross-validation procedure, repeatedly removing a random genotype and comparing it with the corresponding imputed and meta-imputed variant. We show that meta-imputation compares well with imputation in terms of accuracy, as measured by R² and by allele and rare-variant error rates. Notably, our approach can also reduce computation time, because the references can be imputed separately and in parallel. The prime benefit, however, is that meta-imputation allows an informed choice as to which sequence-based data to include as reference sets in the overall imputation process.
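As an illustration of the combination step described above, the following minimal sketch merges genotype probabilities obtained from separate reference panels at one overlapping site for one individual. The abstract does not specify the weighting function, so a certainty-weighted linear average is assumed here; the names `meta_impute` and `dosage`, the input layout, and the example numbers are hypothetical and not taken from the authors' implementation.

```python
from typing import List, Sequence


def meta_impute(probs: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Combine per-panel genotype probabilities at a single site.

    probs[k]   : (P(g=0), P(g=1), P(g=2)) imputed from reference panel k
    weights[k] : non-negative certainty score of panel k at this site
                 (assumed: any per-site imputation quality measure)

    Assumption: a weighted linear average; the actual weighting
    function used by the authors is not given in the abstract.
    """
    if len(probs) != len(weights) or not probs:
        raise ValueError("need one weight per panel and at least one panel")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one panel must have a positive weight")

    combined = [0.0, 0.0, 0.0]
    for panel_probs, weight in zip(probs, weights):
        for g in range(3):
            combined[g] += weight * panel_probs[g]

    # Renormalise so the three genotype probabilities sum to one.
    norm = sum(combined)
    return [c / norm for c in combined]


def dosage(genotype_probs: Sequence[float]) -> float:
    """Expected count of the alternative allele (between 0 and 2)."""
    return genotype_probs[1] + 2.0 * genotype_probs[2]


# Hypothetical example: panel A is confident, panel B is not.
panel_a = (0.90, 0.09, 0.01)
panel_b = (0.50, 0.40, 0.10)
merged = meta_impute([panel_a, panel_b], weights=[0.9, 0.3])
print(merged, dosage(merged))
```

In this sketch the more certain panel dominates the merged probabilities, and a panel with a weight of zero at a site contributes nothing, which is one simple way to handle sites that a panel imputes poorly.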