Widespread exonic transcription factor binding directs codon usage and protein evolution.
A. B. Stergachis1, E. Haugen1, A. Shafer1, W. Fu1, B. Vernot1, J. M. Akey1, J. A. Stamatoyannopoulos1,2
1) Genome Sciences, University of Washington, Seattle, WA; 2) Department of Medicine, University of Washington, Seattle, WA.

   The human genome contains two codes that have long been assumed to operate independently of one another: the genetic code, which specifies the sequence of amino acids in a protein, and a regulatory code, which specifies recognition sites for 1,000 sequence-specific transcription factors (TFs) that collectively control gene expression. We used genomic DNaseI footprinting to systematically map TF occupancy at nucleotide resolution across the human exome in 81 diverse cell types. Here we show that 14% of codons in human exons simultaneously specify both amino acids and regulatory information in the form of TF recognition sites. Such dual-use codons (duons) are highly evolutionarily conserved and exhibit systematic constraint at both degenerate and non-degenerate codon positions that is directly attributable to overlying binding of a sequence-specific TF. This constraint has broadly shaped codon choice and acts as the major driver of codon usage biases in the human genome. Duons have also broadly shaped protein evolution by constraining the possible nonsynonymous changes. We further show that the genetic code has reciprocally affected the regulatory code, which is selectively depleted of recognition sites with the potential to recognize (and therefore ectopically introduce) stop codons. Finally, we show that at least 17% of human coding variants (including synonymous, nonsynonymous, and disease-associated variants) that lie within duons directly affect overlying TF binding. In summary, our results show that TFs have systematically shaped human codon choice and protein evolution, and that interpretation of genetic variation within coding sequence must account for overlying regulatory codes.
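The notion of a duon can be illustrated with a minimal sketch: given a coding interval and a set of TF footprint intervals, find the codons that overlap at least one footprint. This is not the authors' pipeline. The function names, the single-exon plus-strand coding region, and the 0-based half-open coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end); an assumed convention


def codons_overlapping_footprints(
    cds_start: int, cds_end: int, footprints: List[Interval]
) -> List[int]:
    """Return indices of codons in [cds_start, cds_end) that overlap a footprint.

    Hypothetical helper: assumes one contiguous, plus-strand coding interval
    whose first base is codon position 1 (no introns, no frame offset).
    """
    n_codons = (cds_end - cds_start) // 3
    hits = []
    for i in range(n_codons):
        codon_start = cds_start + 3 * i
        codon_end = codon_start + 3
        # Two half-open intervals overlap when each starts before the other ends.
        if any(f_start < codon_end and codon_start < f_end
               for f_start, f_end in footprints):
            hits.append(i)
    return hits


def duon_fraction(cds_start: int, cds_end: int, footprints: List[Interval]) -> float:
    """Fraction of codons in the coding interval that overlap any footprint."""
    n_codons = (cds_end - cds_start) // 3
    if n_codons == 0:
        return 0.0
    return len(codons_overlapping_footprints(cds_start, cds_end, footprints)) / n_codons


if __name__ == "__main__":
    # Toy example: a 10-codon coding interval with one footprint at bases 4..7.
    print(codons_overlapping_footprints(0, 30, [(4, 8)]))
    print(duon_fraction(0, 30, [(4, 8)]))
```

A genome-wide analysis would additionally need to handle spliced transcripts, strand, and footprints merged across cell types, none of which this toy version attempts.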

You may contact the first author (during and after the meeting) at