Interpretation of Variants of Unknown Significance with a Large Database of Genotyped and Phenotyped Individuals. B. T. Naughton, A. Chowdry, J. M. Macpherson, G. M. Benton. 23andMe, Inc., Mountain View, CA.

   The interpretation of variants of unknown significance (VUS) from whole-genome sequence data is a substantial challenge in genetics. VUS are usually too rare to be amenable to genome-wide association studies and so traditionally have been interpreted with reference to the primary literature (especially for high-penetrance or Mendelian mutations) or by computational methods (e.g., SIFT, PolyPhen). While these methods can provide useful insights, they are often limited by the presence of false positives in the literature or by imperfect prediction algorithms. 23andMe, a personal genomics company, has assembled a database of 150,000 genotyped individuals, over 90,000 of whom have consented to participate in research and answered at least one research question. Participants answer research questions on the 23andMe website on topics as diverse as their medical history, personality, lifestyle and exercise. Here we present data demonstrating that this database can be used to empirically determine the significance of variants found in human sequence data. As proof of concept we used the database to confirm that the BRCA1 mutations 185delAG and 5382insC and the BRCA2 mutation 6174delT are associated with greatly increased breast cancer risk. Conversely, we confirmed that the BRCA mutations R841W and S1040N are benign polymorphisms that are not associated with increased breast cancer risk. In a real-world example, we analyzed a VUS in MLH1 from a sequenced exome that was suspected to be cancer-causing. Using individuals' self-reported cancer status from the database, we determined that the variant is unlikely to be cancer-causing. Our finding agrees with previously reported results in the literature.
Due to the extensive phenotyping of our cohort (over 50 million phenotypic data points) and the large number of rare variants on our custom genotyping chip (over 30,000 putatively disease-associated rare variants), this method is applicable to a large number of genes and phenotypes. We further discuss extending this method to variants not present on the genotyping chip by inferring the presence of mutations in individuals in the database based on identity-by-descent (IBD).
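
The comparison of carriers and non-carriers described above can be pictured as a test on a 2x2 table of variant status against self-reported phenotype. The abstract does not state which statistical test was used, so the sketch below assumes a one-sided Fisher exact test and a simple odds ratio. The function names and the counts in the usage example are hypothetical and are not taken from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for a 2x2 table.

    a: carriers reporting the phenotype      b: carriers not reporting it
    c: non-carriers reporting the phenotype  d: non-carriers not reporting it
    Adds 0.5 to every cell (Haldane-Anscombe correction) if any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value.

    Probability of seeing at least `a` affected carriers if carrier status
    and phenotype were independent, with all table margins held fixed.
    """
    carriers = a + b
    affected = a + c
    total = a + b + c + d
    max_overlap = min(carriers, affected)
    # Hypergeometric tail: sum the probability of every table at least as extreme.
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, affected - k)
        for k in range(a, max_overlap + 1)
    )
    return favourable / comb(total, affected)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the abstract).
    table = (12, 28, 900, 19060)
    print("odds ratio:", odds_ratio(*table))
    print("one-sided p-value:", fisher_exact_greater(*table))
```

A large odds ratio with a small p-value would be consistent with a risk-increasing variant, while an odds ratio near 1 in a well-powered sample would point toward a benign polymorphism. For rare variants the carrier counts are small, which is why an exact test is assumed here rather than a large-sample approximation.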

You may contact the first author (during and after the meeting) at