Inferring the functional effects of non-synonymous variants using experimental results from deep mutational scanning. R. J. Hause, V. E. Gray, J. Shendure, D. M. Fowler. Genome Sciences, University of Washington, Seattle, WA.

Investigating the consequences of non-synonymous genetic variation furthers our basic understanding of protein function and facilitates the interpretation of variants observed in a clinical setting. Many computational tools exist to predict the effects of amino acid substitutions on protein function (e.g. SNAP, SIFT, PolyPhen-2); however, nearly all of these methods rely solely on evolutionary, biochemical, and structural information without leveraging experimental data. Where experimental data are used, they tend to be outdated and limited to a few mutations per protein. Deep mutational scanning (DMS) is a method that uses next-generation sequencing to experimentally measure the functional effects of hundreds of thousands of variants of a protein. Because DMS surveys the sequence-function landscape of proteins using far more mutations than have been available to date, models trained on these datasets may predict mutational consequences more accurately. We set out to use DMS data (1) to better understand the relationship between properties of mutations and specific protein properties, (2) to predict the functional effects of mutations in proteins, and (3) ultimately, to improve the interpretability of variants of unknown significance in clinically relevant genes. To these ends, we are constructing an ensemble classifier based on evolutionary, physicochemical, and structural features to predict estimates of protein functionality derived from DMS of over 86 proteins. In ongoing work that will be presented at ASHG, we will analyze feature importance both globally and for specific functions (e.g. binding, stability). Using cross-validation and external validation on unpublished DMS datasets, we will demonstrate the extent to which our classifier is predictive of the functional effects of non-synonymous mutations and compare its performance to other available algorithms. We will also assess the ability of our algorithm to distinguish common non-synonymous variants in the 1000 Genomes Project and the Exome Sequencing Project from rare, pathogenic non-synonymous variants in ClinVar and COSMIC. We anticipate that our model will improve prediction of functional and pathogenic variants, shed light on the underlying parameters that correlate with functionality and pathogenicity, and highlight the power of incorporating available experimental data from DMS into variant effect prediction.

You may contact the first author (during and after the meeting) at