Integrative annotation of variants from 1,092 humans: application to cancer genomics. E. Khurana, M. Gerstein, Functional Interpretation Group of the 1000 Genomes Consortium. Yale University, New Haven, CT.

   Plummeting sequencing costs have led to a great increase in the number of personal genomes. Interpreting the large number of variants in them, particularly in non-coding regions, is a central challenge for genomics. We investigate patterns of selection in DNA elements from the ENCODE project using the full spectrum of sequence variants from 1,092 individuals in the 1000 Genomes Project Phase 1, including single-nucleotide variants (SNVs), short insertions and deletions (indels) and structural variants (SVs). We analyze both coding and non-coding regions, with the former corroborating the latter. We identify a specific subgroup of non-coding categories, ultra-sensitive regions, that exhibits very strong selective constraint, comparable to that of coding genes. We also find variants that are disruptive due to mechanistic effects on transcription-factor binding (i.e., "motif-breakers"). Using connectivity information between elements from protein-protein interaction and regulatory networks, we find that variants in regions with higher network centrality tend to be deleterious. Indels and SVs follow a pattern similar to that of SNVs, with some notable exceptions (e.g., certain deletions and enhancers). Using these results, we develop a scheme and a practical tool to prioritize non-coding variants based on their potential deleterious impact. As a proof of principle, we experimentally validate and characterize a small number of candidate variants prioritized by the tool. Application of the tool to ~90 cancer genomes (breast, prostate and medulloblastoma) reveals ~100 candidate non-coding cancer drivers. This approach can be readily used in precision medicine to prioritize variants.

You may contact the first author (during and after the meeting) at