Globus Genomics: Enabling high-throughput analysis and management of NGS data for neurodevelopmental disorders. D. Sulakhe1, A. Paciorkowski3, G. Mirzaa2, R. Madduri1, Q. Zhang2, K. Aldinger2, J. Bennett2, L. Lacinski1, P. Dave1, W. Dobyns2 1) Computation Institute, University of Chicago, Chicago, IL; 2) Center for Integrative Brain Research, Seattle Children's Research Institute, University of Washington, Seattle, WA; 3) Center for Neural Development and Disease, University of Rochester Medical Center, Rochester, NY.

   The availability of low-cost, high-throughput next-generation sequencing (NGS) is revolutionizing translational research. However, handling and analyzing such large volumes of sequencing data introduces significant challenges, including secure and reliable data transfer, availability of scalable computational resources for analysis, and definition of reproducible and reusable analytic workflows. To address these challenges, we have developed a translational research platform called Globus Genomics. Globus Genomics provides an end-to-end solution by integrating state-of-the-art technologies and infrastructure: Globus Online for data transfer, Galaxy for workflow management, and Amazon Web Services (AWS) for on-demand computational resources. Globus Genomics has established data endpoints at several widely used sequencing centers, including the Broad Institute and Perkin-Elmer, to enable electronic transfer of data directly into or between research labs for immediate analysis or collaboration. The platform hosts an enhanced Galaxy instance with 539 widely used next-generation sequence analysis tools and many predefined best-practice pipelines for whole-exome, RNA-Seq, and ChIP-seq data analysis. Unlimited scalability, enabling the simultaneous analysis of numerous exomes, is possible because the platform can provision on-demand compute clusters on Amazon and submit workflows to those clusters from Galaxy. Globus Genomics also supports dynamic, tool-specific provisioning of Amazon EC2 nodes, accommodating a wide range of CPU- and memory-intensive analytical tools with varying compute requirements, which helps reduce execution times dramatically.
This platform has been used successfully in the Dobyns laboratory at the University of Washington to transfer hundreds of exomes, amounting to tens of terabytes, from the Perkin-Elmer sequencing center and local servers to AWS, making data ready for analysis a few hours after sequence generation. Using Globus Genomics reduced data transfer time from a few weeks to a few hours. Furthermore, the platform has provided a 20-fold performance improvement in upstream analysis by allowing 20 exomes to be analyzed in parallel. The Globus Genomics platform offers a powerful and efficient tool for the transfer and analysis of next-generation sequencing data for clinical and research purposes.
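
The tool-specific provisioning of compute nodes described above can be illustrated with a minimal sketch. This is not the Globus Genomics implementation: the node catalogue, the per-tool requirements, the cost figures, and the function name pick_node are all hypothetical, and the node labels are not real EC2 instance types. The sketch only shows the general idea of choosing, for each tool, the cheapest node class that meets its CPU and memory needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str           # hypothetical label, not a real EC2 instance type
    cpus: int
    memory_gb: int
    hourly_cost: float  # illustrative figure only


# Hypothetical catalogue of node classes a platform might provision.
NODE_TYPES = [
    NodeType("general", cpus=4, memory_gb=16, hourly_cost=0.25),
    NodeType("compute", cpus=32, memory_gb=64, hourly_cost=1.20),
    NodeType("memory", cpus=16, memory_gb=128, hourly_cost=1.40),
]

# Hypothetical per-tool requirements: (minimum CPUs, minimum memory in GB).
TOOL_REQUIREMENTS = {
    "aligner": (16, 32),
    "variant_caller": (8, 96),
    "format_converter": (1, 4),
}


def pick_node(tool: str) -> NodeType:
    """Return the cheapest node type that satisfies the tool's CPU and memory needs."""
    cpus, memory_gb = TOOL_REQUIREMENTS[tool]
    candidates = [
        node for node in NODE_TYPES
        if node.cpus >= cpus and node.memory_gb >= memory_gb
    ]
    if not candidates:
        raise ValueError(f"no node type fits tool {tool!r}")
    return min(candidates, key=lambda node: node.hourly_cost)


if __name__ == "__main__":
    for tool in TOOL_REQUIREMENTS:
        print(tool, "->", pick_node(tool).name)
```

Under these assumed figures, a CPU-bound tool lands on the compute-oriented class and a memory-bound tool on the memory-oriented class, while a lightweight tool stays on the cheapest general-purpose class.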
