Free the Data: EmBase and EmVClass Facilitate Storage, Interpretation, Curation, and Sharing of Over 11,000 Sequence Variants Identified Through Clinical Testing.
L. J. H. Bean, S. W. Tinker, C. da Silva, M. R. Hedge.
Human Genetics, Emory University, Decatur, GA.
Current technology allows clinical laboratories to generate large amounts of sequence data from single genes, gene panels, or whole exomes through clinical testing. It is critical that clinical laboratories recognize the importance of the data they hold and share these data with the medical community. To better manage and share our data, Emory Genetics Laboratory (EGL) developed the two components of our data management suite: EmBase and EmVClass. EmBase is EGL's highly curated, clinical-grade sequence variant database, maintained at the gene, variant, and patient levels to manage internal workflow and reporting processes. The EmBase data structure is designed to facilitate open sharing of variants identified in samples tested at EGL. To date, EmBase contains over 11,000 variants classified as pathogenic (n=2670), likely pathogenic (n=89), variant of unknown significance (n=3740), likely benign (n=24), or benign (n=4632). Of these variant classifications, over half (n=5982) have been reviewed and validated since the launch of EmBase in July 2012. The remaining variant classifications (n=5181) were assigned between 2005 and July 2012. Importantly, this system tracks changes in variant classifications. Also documented in EmBase are other reportable variants (e.g., pseudodeficiency alleles; n=10). The EmBase data structure was designed to easily transfer data to an electronic medical record or publicly available database. To date, over 5,300 variants with validated classifications have been submitted to the NCBI ClinVar database with an approximately 99% successful submission rate. Because efforts such as the ClinVar project are still in the earliest phase, we developed EmVClass, a web-based tool that allows any user access to variants seen at EGL and their current classifications. A review of the classification for a particular variant can be requested through a simple request form.
To date, EmVClass has received over 7,000 searches from over 700 distinct users in 50 countries and 41 US states. In addition, data from EmVClass have been absorbed into locus-specific databases, such as the Leiden Muscular Dystrophy Pages (dmd.nl). The ease with which EGL sequence variant data can be browsed, searched, and transferred to other databases underscores the need to invest in bioinformatics personnel and infrastructure so that large numbers of curated sequence variants can be stored in a highly structured environment.
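As an illustration only, a variant record that keeps its full classification history, so that changes in classification can be tracked, might be sketched as follows. The abstract does not describe the EmBase schema; every class, field, and value below (`VariantRecord`, `ClassificationEvent`, `GENE1`, the dates) is a hypothetical assumption and not the actual EmBase design.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Classification(Enum):
    # The five classification categories named in the abstract.
    PATHOGENIC = "pathogenic"
    LIKELY_PATHOGENIC = "likely pathogenic"
    VOUS = "variant of unknown significance"
    LIKELY_BENIGN = "likely benign"
    BENIGN = "benign"


@dataclass
class ClassificationEvent:
    # One dated classification decision (hypothetical structure).
    classification: Classification
    assigned_on: date
    validated: bool = False


@dataclass
class VariantRecord:
    # Hypothetical variant-level record; earlier decisions are never overwritten.
    gene: str
    hgvs: str
    history: List[ClassificationEvent] = field(default_factory=list)

    def classify(self, classification: Classification,
                 assigned_on: date, validated: bool = False) -> None:
        # Append a new decision instead of replacing the previous one.
        self.history.append(
            ClassificationEvent(classification, assigned_on, validated))

    @property
    def current(self) -> Optional[Classification]:
        # The most recently assigned classification, or None if unclassified.
        return self.history[-1].classification if self.history else None

    def changed(self) -> bool:
        # True if the variant has held more than one distinct classification.
        return len({event.classification for event in self.history}) > 1


# Made-up example: a variant reclassified after review.
record = VariantRecord(gene="GENE1", hgvs="c.100A>G")
record.classify(Classification.VOUS, date(2010, 3, 1))
record.classify(Classification.LIKELY_BENIGN, date(2013, 1, 15), validated=True)
```

Keeping an append-only history is one simple way to report a current classification while still showing when and how it changed; a production system would presumably also link such records to gene-level and patient-level data.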
You may contact the first author (during and after the meeting) at