Assessing functional potential along the human genome by integrating comparative, population, and functional genomic data.
I. Gronau (1), B. Gulko (2), L. Arbiza (1), M. J. Hubisz (1), A. Siepel (1,2)
1) Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY; 2) Graduate Field of Computer Science, Cornell University, Ithaca, NY.

   More than a decade after publication of the human genome sequence, still little is known about the effects of mutations in non-coding DNA. While genome-wide functional assays such as ChIP-seq, DNase-seq and RNA-seq are informative about possible biochemical functions of non-coding DNA, they provide little indication of how important these functions are to fitness at the organismal level. As a result, there is still much debate about the fraction of the human genome that is functional in the sense that mutations to these sequences influence fitness. Here, we address this question using a novel approach that combines functional genomic data from a large collection of assays with comparative genomic data and genome-wide human variation data. Using these data sources, we assign each nucleotide in the genome a functional potential score (FPS), which represents the probability that the nucleotide is directly influenced by natural selection.
   FPSs are computed in two steps: first, functional genomic data are used to group together sites with similar functional indicators (e.g., RNA transcription, open chromatin and histone modifications); then, using patterns of sequence variation within humans and divergence from closely related primate species, the fraction of sites under selection is estimated separately for each group and used as the FPS of all sites in that group. This estimation is carried out using a recently developed method called INSIGHT (Gronau et al., MBE, 2013), which is specifically designed to detect signatures of recent natural selection across dispersed collections of sites. We use this approach to compute genome-wide FPSs and display them in a genome browser track.
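   The two-step structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (group_sites, assign_fps), the binary indicator tuples, and the per-group fractions are all hypothetical, and the estimator passed to assign_fps is only a stand-in for INSIGHT, whose actual interface is not described here.

```python
from collections import defaultdict

def group_sites(indicators):
    """Step 1: group site positions that share the same tuple of functional indicators."""
    groups = defaultdict(list)
    for pos, flags in indicators.items():
        groups[tuple(flags)].append(pos)
    return dict(groups)

def assign_fps(groups, estimate_fraction):
    """Step 2: estimate the fraction of sites under selection per group and
    assign that fraction as the FPS of every site in the group."""
    fps = {}
    for key, sites in groups.items():
        rho = estimate_fraction(key, sites)  # stand-in for an INSIGHT-style estimate
        for pos in sites:
            fps[pos] = rho
    return fps

# Toy input (made up): position -> (transcribed, open chromatin, histone mark)
indicators = {
    100: (1, 1, 0),
    101: (1, 1, 0),
    102: (0, 0, 0),
    103: (0, 1, 1),
}

# Made-up per-group fractions, used only so the sketch runs end to end.
toy_fractions = {(1, 1, 0): 0.30, (0, 0, 0): 0.02, (0, 1, 1): 0.15}

groups = group_sites(indicators)
fps = assign_fps(groups, lambda key, sites: toy_fractions[key])
```

   In this sketch every site inherits the score of its group, so sites 100 and 101 receive the same FPS because they share the same combination of indicators.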
   We find that focusing on signatures of recent natural selection gives our FPSs a clear advantage over existing evolutionary conservation scores when used to classify putative functional non-coding elements such as enhancers, eQTLs and GWAS SNPs. We also examine the contribution of the different functional indicators to the FPS and find complex non-linear interactions between them. Finally, using a weighted average of FPSs along the genome, we estimate that roughly 8% of the human genome is under selection. In conclusion, we anticipate that functional potential scores will provide a powerful tool in the ongoing efforts to characterize function in non-coding regions of the human genome.
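   The genome-wide estimate is described as a weighted average of FPSs. One plausible reading, assumed here, is that each group's FPS is weighted by the number of sites in that group. The sketch below follows that assumption; the function name and the group sizes and scores are invented for illustration and are not taken from the study.

```python
def genome_fraction_under_selection(groups):
    """Site-count-weighted average of per-group FPS values.

    groups: iterable of (num_sites, fps) pairs, one per group of sites.
    """
    groups = list(groups)
    total_sites = sum(n for n, _ in groups)
    if total_sites == 0:
        raise ValueError("no sites provided")
    return sum(n * score for n, score in groups) / total_sites

# Made-up example: three groups with different sizes and scores.
toy_groups = [(700, 0.02), (250, 0.10), (50, 0.40)]
fraction = genome_fraction_under_selection(toy_groups)
```

   Under this weighting, a large group with a low FPS can contribute as much to the overall fraction as a small group with a high FPS.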
