Automated tumor phylogeny reconstruction using multi-sample deep sequencing somatic variants.

V. Popic (1), R. Salari (1), D. Kashef-Haghighi (1), D. Newburger (2), R. West (3), S. Batzoglou (1)

1) Department of Computer Science, Stanford University, Stanford, CA; 2) Biomedical Informatics Training Program, Stanford University, Stanford, CA; 3) Department of Pathology, Stanford University School of Medicine, Stanford, CA.

   Numerous studies have shown tumors to be highly heterogeneous, consisting of cell subpopulations with distinct somatic mutational profiles. Tumor heterogeneity is often studied by comparing multiple tumor samples extracted from a single patient, either at different time points during cancer development or from different regions of the same tumor or its metastases. Most existing multi-sample studies infer phylogenetic cancer cell lineage trees either manually or with classical species phylogenetic approaches that do not model sample heterogeneity. Here we present SMutH (Somatic Mutation Hierarchies), a novel computational method that automates the phylogenetic inference of cancer progression from multiple somatic samples. Our method avoids the common assumption that samples are clonally homogeneous and can reconstruct lineage relationships even when each sample is a heterogeneous mixture of cells. SMutH uses variant allele frequencies (VAFs) of somatic single-nucleotide variants (SNVs) obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. SMutH clusters SNVs based on their VAFs and presence patterns across samples and incorporates the resulting clusters into an evolutionary constraint network, which encodes all possible precedence relationships among SNV clusters. To trace cell lineage trees and identify sample subclones, the constraint network is searched for phylogenetically valid spanning trees. We evaluated SMutH on two published datasets, clear cell renal cell carcinoma (ccRCC) (Gerlinger et al. 2014) and high-grade serous ovarian cancer (HGSC) (Bashashati et al. 2013), as well as on simulated data. We found that our method is highly effective in reconstructing the underlying cell lineage phylogenies in real data and simulations. The trees generated by SMutH were nearly identical topologically to the published ccRCC trees. For the HGSC dataset, SMutH produced trees with better support from the data (as confirmed by manual inspection). SMutH also revealed additional heterogeneity in the samples of both studies. In particular, SMutH identified subclones in one more sample of the ccRCC study (in addition to the six reported samples) and in three samples of the HGSC study, all supported by the data, demonstrating the need for phylogeny inference methods specialized for heterogeneous cancer datasets.
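The three steps named above (clustering SNVs, building a constraint network, searching it for valid spanning trees) can be illustrated with a minimal Python sketch. This is not the SMutH implementation, and the abstract does not specify its rules, so everything concrete here is an assumption made for illustration: the thresholds PRESENT and EPS, the greedy clustering, the precedence rule (a cluster may precede another only if its VAF is at least as high in every sample), the validity check (the VAFs of a node's children may not sum to more than the node's own VAF), and a germline root fixed at VAF 0.5. All function and variable names are hypothetical.

```python
from itertools import product

PRESENT = 0.05  # assumed VAF threshold for calling an SNV present in a sample
EPS = 0.05      # assumed tolerance for VAF noise


def cluster_snvs(vafs):
    """Group SNVs by presence pattern, then greedily by VAF similarity."""
    clusters = []
    for snv, v in sorted(vafs.items()):
        pattern = tuple(x >= PRESENT for x in v)
        for c in clusters:
            close = all(abs(a - b) <= EPS for a, b in zip(c["centroid"], v))
            if c["pattern"] == pattern and close:
                c["members"].append(snv)
                n = len(c["members"])
                # Running mean of the member VAFs, per sample.
                c["centroid"] = [a + (b - a) / n for a, b in zip(c["centroid"], v)]
                break
        else:
            clusters.append({"pattern": pattern, "members": [snv], "centroid": list(v)})
    return clusters


def build_network(clusters):
    """Candidate parents of v: clusters whose VAF is >= v's (within EPS) in every sample."""
    n = len(clusters)
    parents = {v: [] for v in range(n)}
    for u, v in product(range(n), repeat=2):
        cu, cv = clusters[u]["centroid"], clusters[v]["centroid"]
        if u != v and all(a + EPS >= b for a, b in zip(cu, cv)):
            parents[v].append(u)
    return parents


def valid_trees(clusters, parents):
    """Enumerate parent assignments that form a rooted tree and obey the assumed sum rule."""
    n = len(clusters)
    k = len(clusters[0]["centroid"])
    root = n  # index of an assumed germline root with VAF 0.5 in every sample
    cent = [c["centroid"] for c in clusters] + [[0.5] * k]
    options = [parents[v] + [root] for v in range(n)]
    trees = []
    for choice in product(*options):  # choice[v] is the parent of cluster v
        ok = True
        for v in range(n):  # every cluster must reach the root without cycling
            seen, cur = set(), v
            while cur != root and cur not in seen:
                seen.add(cur)
                cur = choice[cur]
            if cur != root:
                ok = False
                break
        if not ok:
            continue
        for p in range(n + 1):  # children's VAFs may not exceed the parent's
            kids = [v for v in range(n) if choice[v] == p]
            if any(sum(cent[c][s] for c in kids) > cent[p][s] + EPS for s in range(k)):
                ok = False
                break
        if ok:
            trees.append(choice)
    return trees


# Made-up VAFs for five SNVs across two samples (not data from either study).
DEMO_VAFS = {
    "a": [0.48, 0.50], "b": [0.47, 0.49],  # shared by both samples
    "c": [0.30, 0.00], "e": [0.12, 0.00],  # private to sample 1
    "d": [0.00, 0.25],                     # private to sample 2
}
demo_clusters = cluster_snvs(DEMO_VAFS)
demo_trees = valid_trees(demo_clusters, build_network(demo_clusters))
for tree in demo_trees:
    print(tree)
```

On this toy input the exhaustive search returns more than one valid tree, which shows why a constraint network can admit several candidate phylogenies that then need to be ranked or inspected. Exhaustive enumeration is only practical for a handful of clusters; it is used here purely for clarity.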
