Detecting complex fusion transcripts in pediatric cancer using a novel assembly-based algorithm CICERO. Y. Li1, T. Bo2, M. Rusch1, J. Easton3, K. Boggs3, B. Vadodaria3, P. Gupta1, G. Song2, J. Ma2, C. G. Mullighan2, S. J. Baker4, R. J. Gilberton4, J. R. Downing2, D. W. Ellison2, J. Zhang1 1) Department of Computational Biology, St Jude Children's Research Hospital, Memphis, TN; 2) Department of Pathology, St Jude Children's Research Hospital, Memphis, TN; 3) The Pediatric Cancer Genome Project Validation Laboratory, St Jude Children's Research Hospital, Memphis, TN; 4) Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN.

   Fusion genes are important for cancer diagnosis, subtype definition and targeted therapy. Although RNAseq is useful for detecting fusion transcripts, few computational methods can identify fusion transcripts that arise from internal tandem duplication (ITD), involve multiple partners, are expressed at low levels or contain non-templated insertions. We developed an assembly-based algorithm, CICERO (CICERO Is Clipped-reads Extended for RNA Optimization), which extends the reads spanning fusion junctions in order to detect complex fusions. Using test data that include RNAseq from 3 ependymoma (EPD), 39 low-grade glioma (LGG) and 128 acute lymphoblastic leukemia (ALL) samples, we have shown that CICERO is able to detect multi-segment fusion transcripts resulting from chromothripsis, internal tandem duplication or rearrangement at a highly repetitive immunoglobulin (IG) locus, all of which would be missed by existing fusion analysis methods. The overall sensitivity and accuracy of CICERO are substantially higher than those of existing tools such as deFuse and TopHat-Fusion. Using CICERO, we analyzed >600 brain tumor and leukemia transcriptomes from the St. Jude/Washington University Pediatric Cancer Genome Project (PCGP) and detected recurrent C11orf95-RELA fusions in EPD, FGFR1 ITD in LGG, NTRK fusions in high-grade glioma and activating kinase fusions with multiple partners in ALL. CICERO also shows high sensitivity when detecting fusions with low expression, such as BRAF fusions in LGG, making it useful for identifying subclonal lesions and for analyzing tumor specimens with low purity. Furthermore, the power of CICERO increases with the longer read lengths enabled by improvements in next-generation sequencing (NGS) technology. Using paired-end 300bp RNAseq reads, CICERO is able to assemble near full-length fusion transcripts and identify complex fusions with multiple segments.
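The abstract does not describe CICERO's implementation, so the following is only a minimal sketch of one step that a clipped-read approach might plausibly start from: locating positions where several reads are soft-clipped at the same reference coordinate, which could then seed a local assembly. The function names (`soft_clip_breakpoints`, `candidate_junctions`), the `min_clip` and `min_support` thresholds, and the read tuple layout are all hypothetical and are not taken from CICERO. Only the CIGAR operation semantics follow the SAM format.

```python
import re
from collections import Counter

# CIGAR operations as defined by the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position


def soft_clip_breakpoints(pos, cigar, min_clip=10):
    """Return (reference_position, side) pairs where a read is soft-clipped.

    `pos` is the 1-based leftmost aligned reference position, as in SAM.
    `min_clip` is a hypothetical threshold to ignore short clips.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    breakpoints = []
    if not ops:
        return breakpoints
    # Clip at the start of the read: the junction sits at the first aligned base.
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        breakpoints.append((pos, "left"))
    # Clip at the end of the read: the junction sits at the last aligned base.
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        breakpoints.append((pos + ref_len - 1, "right"))
    return breakpoints


def candidate_junctions(reads, min_support=2):
    """Count soft-clip sites across reads and keep those with enough support.

    `reads` is an iterable of (chrom, pos, cigar) tuples (an assumed layout).
    """
    counts = Counter()
    for chrom, pos, cigar in reads:
        for bp, side in soft_clip_breakpoints(pos, cigar):
            counts[(chrom, bp, side)] += 1
    return {site: n for site, n in counts.items() if n >= min_support}


# Toy alignments (invented for illustration, not real data).
reads = [
    ("chr11", 100, "20S80M"),  # clipped on the left at position 100
    ("chr11", 100, "15S85M"),  # same site, second supporting read
    ("chr11", 21, "80M20S"),   # clipped on the right at position 100
    ("chr11", 50, "100M"),     # fully aligned, no clip
    ("chr11", 100, "5S95M"),   # clip too short to count
]
junctions = candidate_junctions(reads)
```

In this toy input only the left-clipped site at chr11:100 reaches the support threshold. A real pipeline would read alignments from a BAM file and would also need to handle strand, mapping quality and the clipped sequences themselves, none of which is modelled here.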