An integrated nexus of >12,000 genome sequences and analysis tools facilitates novel gene discovery.
J. Reid1, A. Carroll2, N. Veeraraghavan1, C. Gonzaga-Jauregui3, A. Morrison4, T. Gambin3, A. Sundquist2, M. Bainbridge1, M. Dahdouli1, Z. Huang1, A. Li4, F. Yu1, R. Daly2, J. Lupski3, G. Duyk2, R. Gibbs1,3, E. Boerwinkle1,4
1) Human Genome Sequencing Ctr, Baylor College Med, Houston, TX; 2) DNAnexus, Mountain View, CA; 3) Dept of Molecular and Human Genetics, Baylor College Med, Houston, TX; 4) Human Genetics Ctr, Univ of Texas Health Science Ctr at Houston, Houston, TX.

   Discovery of the genetic causes of human disease is a first step toward personalized medicine, predictive diagnostics, and drug development. Accomplishing this goal requires genome sequence data from an informative sample of patients with the same diagnosis, a large sample of healthy individuals to serve as a comparison or filtering group, and a suite of informatics tools to take the data from raw sequence to study results. Creating such a nexus of data and analysis tools requires considerable resources, and recreating it separately in many academic or clinical laboratories would be inefficient.
   To address the analysis requirements of both the high-throughput research environment and rapidly growing clinical sequencing efforts, we have developed the Mercury data processing pipeline -- an automated, flexible, and extensible analysis workflow designed to make sequence analysis accurate and reproducible. Using Mercury, we have created a resource of exome and whole-genome sequence data from 2,000 patients and 10,000 healthy individuals, together with a suite of easy-to-use analysis tools, to promote biomedical research, particularly novel gene discovery. By leveraging cloud computing technologies via the DNAnexus platform, this collaborative resource is scalable, extensible, and compliant with clinical security standards (including ISO 27001 certification). Interoperability, data standards, and an intuitive interface facilitate efficient data and tool sharing.
   We have used this resource to discover multiple disease genes for Mendelian disorders and risk factors for complex disease. Three use cases will be presented in detail. In the first case, a diagnosis in a patient with Bohring-Opitz-like syndrome was possible only because a large database of similarly affected individuals and an informative control/filtering set were available. In the second case, we were able to create a consortium of Charcot-Marie-Tooth investigators to evaluate the spectrum of phenotypic expression in patients having the same or similar disease mutations. In the last case, we demonstrate that the detailed performance metrics available in the cloud make these data resources an ideal platform for comparing and benchmarking analysis tools.
