SNAP: fast, accurate sequence alignment enabling biological applications.
R. Pandya1, W. Bolosky1, M. Zaharia3, T. Sittler2,5, K. Curtis2, C. Hartl4, A. Fox2, S. Schenker2, I. Stoica2, D. Patterson2
1) eScience Research Group, Microsoft Research, Redmond, WA; 2) University of California, Berkeley, CA; 3) CSAIL, Massachusetts Institute of Technology, Cambridge, MA; 4) Broad Institute of MIT and Harvard, Cambridge, MA; 5) University of California, San Francisco, CA.
We present the Scalable Nucleotide Alignment Program (SNAP), a novel, efficient alignment algorithm and software package that enables new applications in sequence analysis. We describe three important examples, 1) outbreak detection, 2) sample quality control, and 3) genome remapping, and explain how SNAP was designed to make them possible. SNAP matches the accuracy of current state-of-the-art aligners (as substantiated by comparing variant calls) in 1/2 to 1/30 of the time. SNAP is also ready for upcoming developments in sequencing technology: unlike some other popular algorithms, its accuracy improves and its speed increases as paired-end reads get longer. SNAP accepts multiple file formats (e.g., FASTQ, SAM, and BAM). It can sort, mark duplicates, and generate an indexed BAM file. It can align a typical paired-end human genome dataset in approximately four hours on a single commodity server, or in about half that time for longer reads.