Using ontologies to enhance integration and analyses of ENCODE data.
V. S. Malladi1, J. S. Strattan1, D. T. Erickson1, E. T. Chan1, E. L. Hong1, G. Barber2, G. Binkley1, J. Garcia2, B. C. Hitz1, D. Karolchik2, K. Learned2, B. Lee2, S. Miyasato1, G. Moro2, G. R. Roe1, K. Rosenbloom2, L. D. Rowe1, N. R. Podduturi1, M. Simison1, C. A. Sloan1, E. Weiler2, W. J. Kent2, J. M. Cherry1
1) Stanford University, Department of Genetics, 300 Pasteur Dr., Stanford, CA, 94305; 2) Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA, 95064.
The Encyclopedia of DNA Elements (ENCODE), Roadmap Epigenomics (REMC), and modENCODE projects are large collaborative efforts that aim to provide public resources for the scientific community. The goal of ENCODE, now in its 8th year, is to create a comprehensive catalog of functional elements in the human and mouse genomes. The modENCODE project shares this goal but focuses on the model organisms C. elegans and D. melanogaster, creating a comparative resource that provides insight into human processes. REMC investigates the human epigenome using assays and tissues similar to those used by ENCODE. Although each project has distinct scientific goals, the data from modENCODE and REMC complement the data generated by ENCODE.
The ENCODE Data Coordination Center (DCC), which collects all data and metadata generated by the ENCODE project, is currently integrating metadata from modENCODE and REMC. To enhance the analyses that can be performed within and across these three projects, the DCC uses ontologies to annotate these metadata. The DCC uses the Ontology for Biomedical Investigations (OBI; http://obi-ontology.org) to group assays that share a biological objective (e.g. epigenetic modification), so that an investigator searching on that objective retrieves data from all matching assays (e.g. RRBS, MRE-seq) across the three projects. Data with shared anatomy, morphology, and development are identified by using and developing cross-references between the Cell Type Ontology (http://cellontology.org/) and UBERON (http://uberon.org/). Researchers querying for a biological system (e.g. digestive system) will retrieve data generated from the tissues and cells that comprise that system (e.g. colon and epithelial cell of stomach). Here we present our implementation of ontologies to integrate these three projects and show how it may be used to identify experiments that match the interests of a researcher. Data from the ENCODE project can be accessed via the ENCODE portal (http://www.encodeproject.org) and the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway).
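The system-level query described above can be illustrated with a minimal sketch. It is not the DCC's implementation. The term hierarchy, the experiment records, their identifiers, and the function names are all hypothetical, and a real query would use UBERON and Cell Ontology term identifiers rather than labels. The idea shown is only that a query term is expanded to every term beneath it before experiments are matched.

```python
from collections import defaultdict

# Hypothetical, hand-written fragment of an anatomy / cell-type hierarchy.
# Each child term maps to its parent terms (is_a or part_of relations).
PARENTS = {
    "colon": ["digestive system"],
    "stomach": ["digestive system"],
    "epithelial cell of stomach": ["stomach", "epithelial cell"],
    "heart": ["circulatory system"],
    "cardiac muscle cell": ["heart"],
}

# Hypothetical experiment records; the IDs are invented for illustration.
EXPERIMENTS = [
    {"id": "EXP-A", "project": "ENCODE", "assay": "RRBS", "biosample": "colon"},
    {"id": "EXP-B", "project": "REMC", "assay": "MRE-seq",
     "biosample": "epithelial cell of stomach"},
    {"id": "EXP-C", "project": "ENCODE", "assay": "RRBS",
     "biosample": "cardiac muscle cell"},
]


def descendants(term, parents=PARENTS):
    """Return every term that sits below `term` in the hierarchy."""
    # Invert the child -> parents map into parent -> children.
    children = defaultdict(set)
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            children[parent].add(child)

    # Depth-first walk down from the query term.
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_experiments(term, experiments=EXPERIMENTS):
    """Return experiments whose biosample is `term` or any term below it."""
    wanted = descendants(term) | {term}
    return [exp for exp in experiments if exp["biosample"] in wanted]


if __name__ == "__main__":
    for exp in find_experiments("digestive system"):
        print(exp["id"], exp["project"], exp["assay"], exp["biosample"])
```

Under these assumptions, a query for "digestive system" matches the colon and stomach epithelial cell records from two different projects and skips the cardiac record. The same expansion could be applied to an assay hierarchy to collect assays that share a biological objective.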
You may contact the first author (during and after the meeting) at