Identification and characterization of enhancer and target gene pairs in mammalian genomes.
Y.-C. Hwang1, C.-F. Lin2,3, O. Valladares2,3, J. Malamon2,3, Q. Zheng4,5, B. Gregory1,4,5, L.-S. Wang1,2,3,5
1) Genomics and Computational Biology Graduate Group, University of Pennsylvania Perelman School of Medicine; 2) Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine; 3) Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine; 4) Department of Biology, University of Pennsylvania, Philadelphia, PA; 5) Penn Genome Frontiers Institute, University of Pennsylvania, Philadelphia, PA.

   Genome-wide association studies have shown that the majority of disease- and trait-associated genetic variants lie within non-coding regions of the human genome, and it has been hypothesized that many of these variants may affect non-coding regulatory elements. One such class of elements is enhancers, which regulate gene expression through long-range interactions with the promoters of protein-coding genes. However, an enhancer can lie far from its target gene in linear sequence and can act independently of its orientation, so probing all possible enhancer-target gene pairs in the genome is laborious, and the problem remains largely unsolved. To identify enhancers and the genes they regulate genome-wide, we reanalyzed Hi-C datasets from human cells (hESC, IMR90, GM06996, K562) and mouse cells (mESC and cortex). We first extracted restriction fragments (intervals between two adjacent restriction sites) with significantly higher Hi-C read counts than expected; we refer to these fragments as Hi-C peaks. A Hi-C peak was classified as an enhancer if it (1) pairs with the promoter region of a protein-coding gene; (2) overlaps sites carrying known enhancer-associated histone modifications (H3K27ac, H3K4me1, etc.); and (3) resides in a DNase I hypersensitive site. Using this pipeline, we identified between 2,540 and 13,867 enhancer-target gene pairs across the human and mouse datasets. As expected, the enhancers are more conserved and highly enriched for p300 binding, while their target promoters are ~20% more likely to overlap RNA polymerase II binding sites and to be cell-type-specific. We found that enhancers can act pleiotropically by regulating more than one gene, and that there is also redundancy among enhancer-target gene pairs, which may provide precise gene regulation. We also found that ~90% of the pairs are intra-chromosomal and that the majority of interacting partners lie within 1 Mb of each other.
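The two-step procedure above (call Hi-C peaks, then filter them by the three enhancer criteria) can be sketched as follows. The abstract does not state the background model or the multiple-testing correction, so this sketch assumes a Poisson test of each fragment's observed read count against a precomputed expected count, followed by Benjamini-Hochberg FDR control. The `Fragment` record, its `promoter_partner` field, and the `(chrom, start, end)` interval lists for histone-mark sites and DNase I hypersensitive sites are hypothetical inputs, not the authors' data structures.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open [start, end)


@dataclass
class Fragment:
    chrom: str
    start: int
    end: int
    observed: int     # Hi-C reads assigned to this restriction fragment
    expected: float   # background expectation; how it is modelled is an assumption here
    promoter_partner: Optional[str] = None  # hypothetical: gene whose promoter it contacts


def poisson_sf(k: int, lam: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        raise ValueError("expected count must be positive")
    # Sum P(X = i) for i < k (each term computed in log space), then take the complement.
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag discoveries at false discovery rate `alpha` (step-up BH procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that meets the BH bound
    keep = set(order[:cutoff])
    return [i in keep for i in range(m)]


def overlaps(frag: Fragment, intervals: Sequence[Interval]) -> bool:
    """True if the fragment shares at least one base with any interval."""
    return any(c == frag.chrom and s < frag.end and frag.start < e for c, s, e in intervals)


def call_enhancers(
    fragments: Sequence[Fragment],
    histone_sites: Sequence[Interval],
    dhs_sites: Sequence[Interval],
    alpha: float = 0.05,
) -> List[Tuple[Fragment, str]]:
    """Return (enhancer fragment, target gene) pairs."""
    # Step 1: Hi-C peaks = fragments with significantly more reads than expected.
    pvals = [poisson_sf(f.observed, f.expected) for f in fragments]
    peaks = [f for f, sig in zip(fragments, benjamini_hochberg(pvals, alpha)) if sig]
    # Step 2: keep only peaks that meet all three enhancer criteria.
    return [
        (f, f.promoter_partner)
        for f in peaks
        if f.promoter_partner is not None  # (1) pairs with a coding-gene promoter
        and overlaps(f, histone_sites)     # (2) enhancer histone marks (H3K27ac, H3K4me1, ...)
        and overlaps(f, dhs_sites)         # (3) DNase I hypersensitive site
    ]
```

For example, a fragment with 6 reads against an expectation of 5 (P ≈ 0.38) fails the peak-calling step, while a fragment with 40 reads against 5 that contacts a promoter and overlaps both a histone-mark interval and a DHS would typically be returned. A plain Poisson background is only a stand-in: a realistic expected count would also need to account for known Hi-C biases such as distance decay, GC content, and mappability.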
By downsampling Hi-C reads, we found that greater read coverage improves detection of longer-range interactions. Because these contacts are detected only at higher coverage, these results suggest that long-range interactions are relatively transient in the cell. This comprehensive enhancer-target gene catalog will allow us to identify disease-linked polymorphisms that lie within enhancers and to nominate the genes those enhancers regulate as candidate disease genes. By comparing enhancer-target gene pairs between human and mouse embryonic stem cells, we can also study the evolution of enhancer-mediated regulatory mechanisms.
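The downsampling analysis can be illustrated with a toy sketch. The abstract does not describe how reads were subsampled or how interactions were called at each depth, so this assumes independent per-read-pair (binomial) thinning and a simple fixed read-count cutoff in place of an expected-count test. `ReadPair`, `call_interactions`, `long_range_calls_by_depth`, the 3-read cutoff, and the 1 Mb threshold are hypothetical choices for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

ReadPair = Tuple[int, int]  # midpoints of the two fragments joined by one read pair (one chromosome)


def downsample(reads: Sequence[ReadPair], fraction: float, seed: int = 0) -> List[ReadPair]:
    """Keep each read pair independently with probability `fraction`.

    With a fixed seed, each read gets the same uniform draw at every fraction,
    so a smaller fraction always yields a subset of a larger one.
    """
    rng = random.Random(seed)
    return [rp for rp in reads if rng.random() < fraction]


def call_interactions(reads: Sequence[ReadPair], min_reads: int = 3) -> Set[ReadPair]:
    """Fragment pairs supported by at least `min_reads` read pairs (hypothetical cutoff)."""
    counts = Counter((min(a, b), max(a, b)) for a, b in reads)
    return {pair for pair, n in counts.items() if n >= min_reads}


def long_range_calls_by_depth(
    reads: Sequence[ReadPair],
    fractions: Sequence[float],
    min_distance: int = 1_000_000,
    min_reads: int = 3,
    seed: int = 0,
) -> Dict[float, int]:
    """Number of called interactions spanning >= `min_distance` bp at each sampling depth."""
    result = {}
    for f in fractions:
        calls = call_interactions(downsample(reads, f, seed), min_reads)
        result[f] = sum(b - a >= min_distance for a, b in calls)
    return result
```

For instance, with 50 read pairs supporting a 100 kb contact and 5 supporting a 2 Mb contact, the 2 Mb contact can fall below the three-read cutoff at low sampling fractions while the 100 kb contact is likely to still be called. Using nested subsamples makes the per-pair read counts non-decreasing with depth, so differences between depths reflect coverage rather than resampling noise.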
