Databases, genome repositories, and clinical applications to interpret personal genome for precision and preventative therapies.
R. Chen. Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY.

   With advances in next-generation sequencing technologies, we can now sequence a personal genome in a few days for under one thousand dollars. The genomes of hundreds of thousands of individuals have been sequenced and released into public repositories, with millions more expected in the next few years. There is an urgent need to build automated systems that interpret personal genomes for clinical diagnosis, precision medicine, and preventative therapies. We have built an automated system, APOLLO, to process next-generation sequencing data and interpret personal genomes. First, we curated and analyzed hundreds of thousands of human genomes, exomes, and genotyping datasets and built a central variant store, which contains over 110 million distinct genetic variants, each with a unique ID across studies. Second, we used Hadoop and MapReduce to calculate the frequencies of these variants across hundreds of disease states and ethnic populations. Third, we built hundreds of annotation databases using text mining, manual curation, and public repositories. Fourth, we used these primary databases to annotate the 110 million variants and built a secondary annotation database called ActiVar. Finally, we developed a tool called VARA that integrates variant calls from multiple sequencing platforms and variant calling algorithms to identify reliable and actionable variants. We further developed a series of clinical applications to interpret personal genomes and exomes for precision and preventative therapies. For each cancer patient with a solid tumor, we sequenced the DNA and RNA from the tumor and blood, built a decision tree for each FDA-approved drug by integrating variants, fusions, copy number variants (CNVs), and RNA, and created a clinical report recommending personalized precision medicine, clinical trials, and immunotherapy vaccines.
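The frequency calculation in the second step can be illustrated with a minimal map/reduce sketch in plain Python. It is not the APOLLO implementation: the record layout, the function names `map_phase` and `reduce_phase`, and the sample data are all hypothetical, and a real Hadoop job would distribute the same two phases across a cluster.

```python
from collections import defaultdict

# Hypothetical input record (not the APOLLO schema):
# (variant_id, population, alt_allele_count, called_allele_count)

def map_phase(records):
    """Emit one ((variant, population), (alt, called)) pair per genotype record."""
    for variant_id, population, alt, called in records:
        yield (variant_id, population), (alt, called)

def reduce_phase(pairs):
    """Sum allele counts per key and turn them into allele frequencies."""
    totals = defaultdict(lambda: [0, 0])
    for key, (alt, called) in pairs:
        totals[key][0] += alt
        totals[key][1] += called
    # Skip keys with no called alleles to avoid division by zero.
    return {key: alt / called for key, (alt, called) in totals.items() if called > 0}

# Made-up genotypes for three diploid samples.
records = [
    ("var1", "EUR", 1, 2),
    ("var1", "EUR", 0, 2),
    ("var1", "EAS", 2, 2),
    ("var2", "EUR", 0, 2),
]
frequencies = reduce_phase(map_phase(records))
```

Keying on the (variant, population) pair is one possible design; keying on (variant, disease state) instead would give the per-disease frequencies in the same way.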
To search for preventative therapies, we launched the Resilience Project to find Unexpected Heroes: healthy individuals who are resilient to deleterious mutations that commonly lead to severe childhood diseases. We curated 674 founder or recurrent disease-causing variants with extremely high penetrance from 162 genes for 125 distinct Mendelian disorders, screened 596,610 personal genome, exome, and genotyping datasets, filtered the candidates with Sanger confirmation and clinical review, and identified 9 final Unexpected Heroes. In summary, the explosion of big data has enabled clinical applications that interpret personal genomes for clinical diagnosis, precision medicine, and preventative therapies.
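The first pass of such a screen can be sketched as follows. This is an illustrative assumption rather than the project's actual pipeline: the classes `CuratedVariant` and `Individual`, the `min_alt_alleles` threshold (1 for a dominant disorder, 2 for a recessive one), and the sample data are hypothetical, and the Sanger confirmation and clinical review described above would follow as separate manual steps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CuratedVariant:
    variant_id: str
    disorder: str
    min_alt_alleles: int  # assumed: 1 for dominant, 2 for recessive disorders

@dataclass
class Individual:
    sample_id: str
    alt_allele_counts: dict   # variant_id -> number of alternate alleles (0, 1, 2)
    reported_disorders: set   # disorders the individual is known to have

def screen(individuals, curated):
    """Return (sample_id, variant_id) pairs for carriers without the disorder."""
    candidates = []
    for person in individuals:
        for variant in curated:
            dosage = person.alt_allele_counts.get(variant.variant_id, 0)
            carries = dosage >= variant.min_alt_alleles
            unaffected = variant.disorder not in person.reported_disorders
            if carries and unaffected:
                candidates.append((person.sample_id, variant.variant_id))
    return candidates

# Made-up example data.
curated = [CuratedVariant("varA", "disorder X", 2)]
individuals = [
    Individual("s1", {"varA": 2}, set()),           # homozygous, unaffected
    Individual("s2", {"varA": 1}, set()),           # carrier only
    Individual("s3", {"varA": 2}, {"disorder X"}),  # homozygous, affected
]
candidates = screen(individuals, curated)
```

Only sample `s1` survives this pass, since it carries the variant at a disease-consistent dosage without a reported diagnosis.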
