Epigenome imputation leads to higher-quality datasets and helps improve GWAS interpretation. J. Ernst1, A. K. Sarkar2,3, L. D. Ward2,3, M. Kellis2,3 1) UCLA, Los Angeles, CA; 2) MIT, Cambridge, MA; 3) Broad Institute, Cambridge, MA.
Genotype imputation has become commonplace to predict unobserved genetic variants by leveraging the increasing availability of large reference panels. The field of epigenomics is now undergoing a similar transition, with thousands of reference epigenomes becoming available, presenting an analogous opportunity to exploit their highly correlated nature for prediction of unobserved epigenomic datasets, and to generate more robust versions of existing datasets. Here, we introduce epigenome imputation, and apply it to predict 4,315 high-resolution genome-wide signal maps, consisting of 31 histone marks, DNaseI, DNA methylation, and RNA-Seq across 127 tissue/cell types. Imputed signal tracks show strong concordance with observed signal, and surpass observed datasets in effective sequencing coverage, consistency, and correspondence with relevant gene annotations, even for tissue-restricted genes. Global discrepancy between observed and imputed data reveals low-quality experiments, while local discrepancies in high-quality datasets in some cases reveal locations of tissue-specific regulation. We also use imputed datasets to generate the most comprehensive prediction of chromatin state information to date, consisting of 25 chromatin states based on 12 imputed marks across 127 epigenomes. Imputed epigenomic data has important implications for interpreting genome-wide association studies. Across 108 traits, we find that chromatin states learned using imputed data significantly improve the power to detect functional enrichments of trait-associated loci in characterized active enhancer regions, increasing the number of significant cell type-trait pairs by approximately 30%.
Chromatin states learned from imputed data also show improved enrichment for variants that are weakly associated (below genome-wide significance): for type 1 diabetes, for example, we find increased enrichment in enhancer regions, better distinction of disease-relevant cell types and regions, and reduced enrichment for spurious cell types and regions. We expect that our method, software implementation, and imputed datasets will be a valuable community resource and that epigenome imputation will become a widely adopted complement to large-scale experimental mapping of epigenomic information.
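The abstract does not describe the imputation model or the enrichment test it uses. The sketch below is only a toy illustration of the three ideas the abstract mentions, under explicit assumptions, and is not the authors' method. First, a missing track is imputed as a similarity-weighted average of the same mark in reference epigenomes, where similarity is the Pearson correlation over marks observed in both. Second, global observed-versus-imputed discrepancy is scored as 1 minus the Pearson correlation. Third, enrichment of trait-associated variants in a chromatin state is measured as fold enrichment with a one-sided binomial p-value. The function names `impute_track`, `global_discrepancy`, and `state_enrichment`, the weighting scheme, and the example counts are all hypothetical.

```python
import numpy as np
from math import comb


def impute_track(target_marks, ref_marks, ref_target):
    """Hypothetical toy imputation of one unobserved track in a target cell type.

    target_marks: (n_marks, n_bins) marks observed in the target cell type
    ref_marks:    (n_refs, n_marks, n_bins) the same marks in reference cell types
    ref_target:   (n_refs, n_bins) the mark to impute, observed in each reference
    Returns an (n_bins,) imputed signal track.
    """
    target_flat = target_marks.ravel()
    # Similarity of each reference to the target: Pearson r over the shared marks.
    # Constant tracks yield NaN correlations, treated here as zero similarity.
    sims = np.nan_to_num(
        np.array([np.corrcoef(target_flat, ref.ravel())[0, 1] for ref in ref_marks])
    )
    # Drop anti-correlated references; fall back to a plain average if none remain.
    weights = np.clip(sims, 0.0, None)
    if weights.sum() == 0:
        weights = np.ones(len(ref_marks))
    weights = weights / weights.sum()
    # Imputed track: similarity-weighted average of the references' observed tracks.
    return weights @ ref_target


def global_discrepancy(observed, imputed):
    """Hypothetical QC score: 1 - Pearson r between observed and imputed tracks.

    Values near 0 mean close agreement; larger values may flag a low-quality experiment.
    """
    return 1.0 - np.corrcoef(observed, imputed)[0, 1]


def state_enrichment(n_in_state, n_total, state_fraction):
    """Hypothetical fold enrichment of trait-associated variants in one chromatin
    state, plus a one-sided binomial p-value for seeing >= n_in_state variants in it."""
    fold = (n_in_state / n_total) / state_fraction
    # Null model: each variant falls in the state with probability state_fraction.
    p_value = sum(
        comb(n_total, k) * state_fraction**k * (1 - state_fraction) ** (n_total - k)
        for k in range(n_in_state, n_total + 1)
    )
    return fold, p_value


# Usage on synthetic data: 3 shared marks over 100 bins, 2 reference epigenomes.
rng = np.random.default_rng(0)
target_marks = rng.random((3, 100))
ref_marks = np.stack([
    target_marks + 0.05 * rng.random((3, 100)),  # reference similar to the target
    -target_marks,                               # anti-correlated reference, weight 0
])
ref_target = rng.random((2, 100))
imputed = impute_track(target_marks, ref_marks, ref_target)
observed = imputed + 0.1 * rng.random(100)  # stand-in for a noisy observed track
print("discrepancy:", global_discrepancy(observed, imputed))

# Made-up counts: 30 of 200 variants in an enhancer state covering 5% of the genome.
print("fold, p:", state_enrichment(30, 200, 0.05))
```

The weighted average is used only to keep the sketch short. A model that learns to predict each track from related tracks across many reference epigenomes would be a closer stand-in for imputation at the scale described above.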