Aneuploidy and normal cell contamination aware approach to detect copy number variations in cancer using next generation sequencing data.
R. Gupta, S. Katragadda, D. Vyavahare, K. Sandhu, V. Veeramachaneni, R. Hariharan
Strand Life Sciences, 5th Floor, Kirloskar Business Park, Bellary Road, Hebbal, Bangalore - 560024, Karnataka, India.

   Background and Objectives: Recent growth in next generation sequencing (NGS) data has enabled us to detect copy number variation (CNV) at an unprecedented resolution. The objective of this study is two-fold: 1) identify the CNV regions in the cancer genome and assign an absolute copy number (CN) to each; and 2) compare CNV regions from different patients to identify regions that are commonly amplified or deleted, thereby highlighting genes implicated in cancer.

Challenges: Several technical and biological challenges hinder the discovery of true segments and the assignment of absolute CNs. In particular, the biological challenges include 1) aneuploidy of cancer cells, whereas many approaches assume a diploid genome; 2) contamination by normal and stromal cells, which compresses all signals towards a CN state of 2; and 3) heterogeneity in tumor cells, i.e., a tumor may be polyclonal, with each clone carrying different CNVs.

Methods: Most approaches for detecting CNVs from NGS data are based on 1) read depth; 2) distance and orientation of read pairs; or 3) split reads. We used a read-depth method: we first compute the log-ratio of read depth between the cancer and normal samples in fixed-length windows, then apply a wavelet transformation to the ratios to reduce the effect of random noise. An EM-based probabilistic Gaussian mixture model is then built to model the different CN states, and two biological parameters of the sample, average ploidy and % normal cell contamination, are estimated. Finally, we applied two segmentation approaches to the ploidy- and contamination-corrected log-ratio to obtain segments and their corresponding CNs: 1) a naive heuristic approach, which quickly identifies gain/loss regions without quantifying the degree of gain or loss; and 2) the popular CBS (circular binary segmentation) approach, which can distinguish different gain (or loss) regions.
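
The first and last steps of the read-depth pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the library-size normalisation, the pseudocount, and the 0.3 gain/loss threshold are all assumptions chosen for illustration, and the wavelet denoising and mixture-model steps are omitted.

```python
import math


def window_log_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Return the log2 tumor/normal read-depth ratio for each fixed-length window.

    Counts are scaled by each sample's total so that differing sequencing
    depths do not shift the ratios (an assumed normalisation). The
    pseudocount (also an assumption) avoids division by zero in empty windows.
    """
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        t_norm = (t + pseudocount) / tumor_total
        n_norm = (n + pseudocount) / normal_total
        ratios.append(math.log2(t_norm / n_norm))
    return ratios


def naive_segments(log_ratios, threshold=0.3):
    """Label each window gain/loss/neutral, then merge adjacent equal labels.

    Returns (first_window, last_window, state) tuples. As with the naive
    approach in the text, the degree of gain or loss is not quantified.
    The threshold value is hypothetical.
    """
    segments = []
    for i, r in enumerate(log_ratios):
        if r > threshold:
            state = "gain"
        elif r < -threshold:
            state = "loss"
        else:
            state = "neutral"
        if segments and segments[-1][2] == state:
            start = segments[-1][0]
            segments[-1] = (start, i, state)  # extend the current run
        else:
            segments.append((i, i, state))  # start a new run
    return segments


# Toy read counts for ten windows: two gained and two lost windows.
tumor = [100, 100, 200, 200, 100, 100, 50, 50, 100, 100]
normal = [100] * 10
print(naive_segments(window_log_ratios(tumor, normal)))
```

Because the ratios are scaled by total depth, a tumor with many gained regions would pull its unchanged windows below zero in this sketch, which is one reason a ploidy correction is needed before thresholding.
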
This CNV detection approach is integrated into Avadis NGS, our software tool for the processing and comprehensive end-to-end analysis of NGS data.

Experiments and Results: We demonstrated the efficacy of the CNV detection approach on both simulated data and publicly available real sequencing data. For the simulation set, we simulated log-ratio data to cover different scenarios by varying sample ploidy, % of normal cell contamination, number of CN states, % of data noise, etc. We also used publicly available sequencing data of cancer cell lines and tumor samples from NCBI SRA and constructed CN profiles for multiple cancer types.
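
As an illustration of how such log-ratio data might be simulated, the sketch below uses a commonly assumed mixing model in which the sample is a blend of diploid normal cells and tumor cells with a given average ploidy. The abstract does not state the authors' simulation model, so the formula, the function names, and the parameters here are assumptions.

```python
import math
import random


def expected_log_ratio(cn, ploidy, normal_frac):
    """Expected log2 tumor/normal depth ratio for a region with tumor copy number cn.

    Assumed mixing model: a fraction normal_frac of the cells are diploid
    normal cells, and the remainder are tumor cells with average ploidy
    `ploidy`. The denominator is the genome-wide average depth signal.
    """
    observed = normal_frac * 2 + (1 - normal_frac) * cn
    baseline = normal_frac * 2 + (1 - normal_frac) * ploidy
    if observed <= 0:
        raise ValueError("cn = 0 with no normal cells has no finite log-ratio")
    return math.log2(observed / baseline)


def simulate_log_ratios(segments, ploidy, normal_frac, noise_sd, seed=0):
    """Simulate one noisy log-ratio per window.

    segments is a list of (n_windows, copy_number) pairs; Gaussian noise with
    standard deviation noise_sd is added to each window (an assumed noise model).
    """
    rng = random.Random(seed)
    values = []
    for n_windows, cn in segments:
        mean = expected_log_ratio(cn, ploidy, normal_frac)
        values.extend(rng.gauss(mean, noise_sd) for _ in range(n_windows))
    return values


# A CN-4 region in a pure diploid tumor versus the same region at 50% contamination.
print(expected_log_ratio(4, 2, 0.0))
print(expected_log_ratio(4, 2, 0.5))
# In a triploid tumor, diploid regions fall below zero and look like losses.
print(expected_log_ratio(2, 3, 0.0))
```

Under this model a CN-4 region has a log-ratio of 1 in a pure diploid tumor but only log2(1.5), about 0.585, at 50% normal contamination, which reproduces the compression towards CN 2 described in the challenges. The triploid case shows why assuming a diploid genome misplaces the zero level.
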

You may contact the first author (during and after the meeting) at