Interpreting eQTLs by linking enhancers to target genes.
J. Wang (1,2), A. Kundaje (1,2,3,4), L. D. Ward (1,2), M. Kellis (1,2), GTEx Consortium and Roadmap Epigenomics Program
1) Computer Science Dept., Massachusetts Institute of Technology, Cambridge, MA 02139; 2) Broad Institute of MIT and Harvard, Cambridge, MA 02139; 3) Computer Science Dept., Stanford University, Stanford, CA 94305; 4) Dept. of Genetics, Stanford University, Stanford, CA 94305.
Interpreting the downstream effects of genetic variants located in non-coding regulatory regions is challenging, because the target gene of a regulatory element is not necessarily the most proximal gene and regulatory relationships can be cell-type specific. A more complete and accurate map linking distal regulatory elements to their specific target genes is therefore needed to better understand eQTL relationships. Enhancers represent an important family of distal regulatory elements. By exploring the dynamics of enhancer activity across cell types, together with cell-type-specific gene expression patterns, we can statistically link enhancers to their target genes, which provides a basis for understanding eQTLs. We have developed a generative model that probabilistically assigns enhancers and genes to modules and jointly estimates the linking probabilities between enhancers and genes. Applying the model to histone modification and gene expression datasets for 26 cell types from the NIH Roadmap Epigenomics Program, we discovered 21 enhancer modules and 12 gene modules, together with enhancer-gene linking probabilities. To validate performance, we compared the cell-type-specific links with ChIA-PET datasets and found significant overlap in matched cell types. We further verified the predicted links by checking whether they can accurately quantify gene expression from enhancer activity, and we observe clear improvements over correlation-based methods. The genes most affected by linked enhancers are enriched in cell-type-specific pathways. We compared the predicted links between enhancers and genes with the eQTL links between SNPs and gene expression from the GTEx project. We observe a number of overlapping links, which provide direct interpretations of those eQTLs. Interestingly, among those overlapping links, we observe examples of enhancers linked to genes that are not the most proximal.
For example, six enhancers linked to CD52 overlap with eQTLs from whole blood. For two of those enhancers, CD52 is not the most proximal gene, supporting the value of long-distance enhancer linking. Similar overlapping long-distance links are also observed for CD48, CD37 and LCK, all of which are related to immune function. We believe that, by integrating our predicted enhancer-gene linking structure, researchers can better interpret and prioritize eQTLs involved in long-distance regulation.
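The comparison between predicted enhancer-gene links and eQTLs described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record types `EnhancerLink` and `Eqtl`, the functions `overlapping_links` and `nearest_gene`, the `min_prob` threshold, and the use of the enhancer midpoint and a single TSS position per gene are all assumptions made for illustration. A link counts as overlapping an eQTL when the SNP falls inside the enhancer interval and both point to the same gene; a link is then flagged as long-distance when the linked gene is not the one nearest to the enhancer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnhancerLink:
    """Hypothetical record: a predicted enhancer-gene link."""
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    gene: str
    prob: float   # predicted linking probability


@dataclass(frozen=True)
class Eqtl:
    """Hypothetical record: a SNP-gene eQTL association."""
    chrom: str
    pos: int      # 0-based SNP position
    gene: str


def overlapping_links(links, eqtls, min_prob=0.5):
    """Return (link, eqtl) pairs where the SNP lies inside the enhancer
    and both records name the same gene. min_prob is an assumed cutoff."""
    by_key = {}
    for q in eqtls:
        by_key.setdefault((q.chrom, q.gene), []).append(q)
    hits = []
    for link in links:
        if link.prob < min_prob:
            continue
        for q in by_key.get((link.chrom, link.gene), []):
            if link.start <= q.pos < link.end:
                hits.append((link, q))
    return hits


def nearest_gene(link, tss):
    """Return the gene whose TSS is closest to the enhancer midpoint.
    tss maps gene name -> (chrom, position); returns None if no gene
    is annotated on the enhancer's chromosome."""
    mid = (link.start + link.end) // 2
    candidates = [
        (abs(pos - mid), gene)
        for gene, (chrom, pos) in tss.items()
        if chrom == link.chrom
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    # Toy coordinates and placeholder gene names, not real annotations.
    links = [
        EnhancerLink("chr1", 100, 200, "GENE_A", 0.9),
        EnhancerLink("chr1", 1000, 1100, "GENE_A", 0.8),
    ]
    eqtls = [Eqtl("chr1", 150, "GENE_A"), Eqtl("chr1", 1050, "GENE_A")]
    tss = {"GENE_A": ("chr1", 300), "GENE_B": ("chr1", 1200)}
    for link, q in overlapping_links(links, eqtls):
        distal = nearest_gene(link, tss) != link.gene
        print(link.chrom, link.start, link.end, link.gene, "long-distance" if distal else "proximal")
```

In this toy setting the second enhancer lies closer to the TSS of `GENE_B` than to that of its linked gene `GENE_A`, which mirrors the kind of non-proximal link the abstract reports for CD52. A real analysis would use interval indexing and curated gene annotations rather than a linear scan.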