Choosing an RNA-seq Aligner for QTL and ASE Analysis in the Genotype-Tissue Expression Project

D. S. DeLuca1, T. Lappalainen2, P. Kheradpour1, M. Sammeth3, J. Monlong3, P. Ribeca4, E. Palumbo4, A. Battle5, E. Gelfand1, R. Guigo3, K. Ardlie1, G. Getz1, The GTEx Consortium

1) The GTEx Project, The Broad Institute, Cambridge, MA; 2) University of Geneva, Department of Genetic Medicine and Development, Genève, Switzerland; 3) Universitat Pompeu Fabra, Center for Genomic Regulation, Barcelona, Spain; 4) Centro Nacional de Análisis Genómico, CNAG, Barcelona, Spain; 5) Stanford University, Daphne Koller Group, Stanford, CA.

   For many applications, the first step in analyzing RNA sequence data is to align reads to a reference genome. This step is fundamentally important because it directly affects all downstream characterizations, such as expression, splicing events, and allele-specific expression. RNA-seq has reached a critical stage: widespread adoption has produced a selection of alignment tools, but the field has not yet converged on an established gold standard. As a result, investigators have a range of alignment options but little indication of how their choice will affect their analyses. For the Genotype-Tissue Expression project (GTEx), we have developed a series of criteria for comparing alternative aligners that reflect performance on the project's targeted goals. A primary goal of GTEx is to create a public atlas of human gene expression and its regulation, to enable discovery of expression quantitative trait loci (eQTLs) and to establish associations with disease. In the project's pilot phase, GTEx genotyped 190 postmortem human donors, from whom 1,814 tissue samples (from 47 distinct tissue sites) were profiled by RNA-seq to a median depth of 80 million aligned reads. We wanted to use the aligner that would allow maximum discovery of eQTLs, splice-QTLs, and allele-specific expression. We chose to compare the relatively established TopHat aligner with the more recently developed GEM package. Initial metrics indicated that GEM produces a greater number of alignments, raising the question of whether these additional alignments provide biological signal or noise. We demonstrate that these additional reads correlate strongly with expected expression levels, exhibit distributions consistent with biological expectations, and provide additional power for allele-specific expression analysis.
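The abstract does not state which allele-specific expression test was used. As a minimal, hypothetical sketch, assume a per-site exact binomial test of reference versus alternative read counts at a heterozygous SNP against a 50:50 null. Holding the allelic ratio fixed, more aligned reads at the site give a smaller p-value, which is one way additional alignments can translate into ASE power, provided the extra reads are not biased toward one allele. The function name `binom_two_sided_p` is illustrative only.

```python
from math import comb


def binom_two_sided_p(ref: int, alt: int) -> float:
    """Exact two-sided binomial test of ref vs. alt allele read counts.

    Hypothetical sketch: assumes a 50:50 null (no allelic imbalance) at a
    heterozygous site. Because the null is symmetric, the two-sided p-value
    is twice the upper tail at the larger allele count, capped at 1.
    """
    n = ref + alt
    k = max(ref, alt)
    # P(X >= k) for X ~ Binomial(n, 0.5), computed exactly with integers.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Same 60:40 allelic ratio observed with 50 reads vs. 100 reads at one site.
p_50_reads = binom_two_sided_p(30, 20)
p_100_reads = binom_two_sided_p(60, 40)
print(f"50 reads:  p = {p_50_reads:.4f}")
print(f"100 reads: p = {p_100_reads:.4f}")
```

With the same imbalance, doubling the read depth lowers the p-value, so more sites can reach significance at a given threshold. In practice, any gain from extra alignments would also need to be checked against reference-allele mapping bias, which this sketch does not model.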
Given the full panel of criteria, we conclude that, while both aligners perform well overall, GEM exhibits some advantages. Because the alignment-comparison strategy established here is broadly applicable, we expect this analysis to support better-informed decisions when choosing an aligner for eQTL and related analyses.
