Large-scale profiling of sequence variation affecting transcription factor occupancy in vivo. M. T. Maurano, E. Haugen, R. Sandstrom, J. Vierstra, J. A. Stamatoyannopoulos. Dept. of Genome Sciences, Univ. of Washington, Seattle, WA, USA.
Genome-wide association studies have identified thousands of disease- and trait-associated variants that systematically localize in non-coding regulatory elements. The mechanistic investigation of sequence variation in these elements has been impeded by the difficulty of experimentally modeling the highly cell-type selective regulatory activity and complex physical organization of native loci. To identify functional variation affecting regulatory elements in their native locus configuration and cellular environment, we analyzed allelically resolved genomic DNaseI footprinting and ChIP-seq data to identify variants with consistent effect across 121 cell and tissue types. We report the functional classification of 359,477 regulatory variants, of which 66,957 demonstrate significant imbalance in chromatin accessibility in vivo. Discovery of functional variation depends strongly on sequencing depth, which can be efficiently augmented using a targeted approach. In contrast to the characteristic cell-type selectivity of the chromatin landscape, we find that sequence variants affect occupancy across multiple cellular contexts. We show that functional variation delineates characteristic sensitivity profiles for several hundred transcription factor motifs representing 56 families of non-redundant sequence specificities, including many of the key factors linked to the establishment of accessible chromatin. Nevertheless, silent variants are found repeatedly at every position within the protein-DNA recognition interface, and the majority of variation is buffered in a site-dependent manner in vivo. We account for these local context effects by developing TF-specific profiles of functional variation, and demonstrate their utility for the functional classification of novel regulatory variation by identifying 438,097 variants in dbSNP strongly predicted to affect binding.
In summary, our characterization of regulatory variation affecting TF activity provides a foundation for the etiological investigation of non-coding genetic associations.
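As an illustration of what "significant imbalance in chromatin accessibility" can mean at a single heterozygous site, the sketch below tests whether reads carrying the reference and alternate alleles depart from the 50:50 split expected when neither allele is favored. The abstract does not specify the statistical model used, so the exact two-sided binomial test here is an assumption made for illustration, and the function names `binomial_two_sided_p` and `allelic_imbalance` are hypothetical. A real analysis would also need to handle reference mapping bias, overdispersion, and multiple-testing correction, none of which are covered.

```python
from math import exp, lgamma, log


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k under Binomial(n, p). Assumes 0 < p < 1.
    """
    if n == 0:
        return 1.0
    # Probabilities are computed in log space so deep coverage does not overflow.
    log_pmf = [
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log(1.0 - p)
        for i in range(n + 1)
    ]
    pmf = [exp(v) for v in log_pmf]
    # Small relative tolerance so symmetric outcomes are treated as ties.
    threshold = pmf[k] * (1.0 + 1e-7)
    return min(1.0, sum(v for v in pmf if v <= threshold))


def allelic_imbalance(ref_reads: int, alt_reads: int) -> tuple[float, float]:
    """Return (reference allele fraction, two-sided p-value) for one site.

    Illustrative assumption: reads are independent and an unbiased site
    yields reference and alternate reads with equal probability.
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no allele-informative reads")
    return ref_reads / total, binomial_two_sided_p(ref_reads, total)


if __name__ == "__main__":
    # Same 70:30 allelic ratio at two sequencing depths (made-up counts).
    for ref, alt in [(7, 3), (70, 30)]:
        fraction, p_value = allelic_imbalance(ref, alt)
        print(f"ref={ref} alt={alt} ref_fraction={fraction:.2f} p={p_value:.3g}")
```

Under this simplified model, the same 70:30 ratio gives a much smaller p-value at 100 reads than at 10 reads, which is consistent with the abstract's observation that discovery of functional variation depends strongly on sequencing depth.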