Characterization of sequence variants from the first direct-to-consumer exome pilot project.
A. Shmygelska, E. Harrington, C. McLean, A. Chowdry, B. Naughton. 23andMe Inc, Mountain View, CA.
We present an analysis of the exomes of forty-five individuals. This data set is a subset of 23andMe's exome pilot project, sequenced at high coverage (80x) using the Agilent SureSelect 50Mb targeted exome capture platform. We summarize the number and pattern of genetic variants emerging from the study, which is, to our knowledge, the first direct-to-consumer exome project. We focused particularly on loss-of-function (LoF) variants, i.e., variants predicted to disrupt the function of protein-coding transcripts. To assess the functional potential of the variants, we developed a filtering and annotation pipeline that ensures called variants meet quality control criteria (including coverage depth, genotype quality, and sequence uniqueness). An average exome has ~30,000 called variants in exonic regions, ~700 of which are LoF variants; this category includes frameshifts, stop gains and losses, start losses, and splice site modifications. Non-synonymous substitutions account for ~11,000 variants and synonymous substitutions for ~12,000. We identified a set of high-quality LoF variants (~400 per exome) and characterized them by their occurrence in genes involved in Mendelian disorders, disruption of protein domains, allele frequencies, evolutionary conservation, and the functional classes of LoF-containing genes. We found that rare mutations (allele frequency < 5%) are located more often than expected in domains present in membrane-associated proteins. Consistent with previous studies, we found that a healthy individual carries on the order of hundreds of variants predicted to substantially impact proteins, most of them in the heterozygous state.
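The quality control step described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the `Variant` record, the consequence labels, and the depth and genotype-quality thresholds are all hypothetical, chosen only to show how the three named criteria (coverage depth, genotype quality, sequence uniqueness) might be combined with the LoF categories listed in the abstract.

```python
from dataclasses import dataclass

# Hypothetical labels for the LoF categories named in the abstract
# (frameshifts, stop gains and losses, start losses, splice site changes).
LOF_CONSEQUENCES = {
    "frameshift", "stop_gained", "stop_lost", "start_lost", "splice_site",
}


@dataclass(frozen=True)
class Variant:
    """Illustrative variant record; field names are assumptions."""
    gene: str
    consequence: str
    depth: int               # read depth at the variant site
    genotype_quality: int    # genotype quality score
    in_unique_region: bool   # True if the site lies in uniquely mappable sequence


def passes_qc(v, min_depth=10, min_gq=20):
    """Apply the three QC criteria; the default thresholds are placeholders."""
    return (
        v.depth >= min_depth
        and v.genotype_quality >= min_gq
        and v.in_unique_region
    )


def high_quality_lofs(variants, min_depth=10, min_gq=20):
    """Keep only variants that are predicted LoF and pass QC."""
    return [
        v for v in variants
        if v.consequence in LOF_CONSEQUENCES and passes_qc(v, min_depth, min_gq)
    ]
```

Calling `high_quality_lofs` on a list of `Variant` records returns the subset that would be carried forward for characterization; in practice the thresholds would be tuned to the sequencing platform and caller.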
We also found that LoF-containing genes are enriched for genes encoding drug-metabolizing enzymes, particularly cytochrome P450 (CYP) enzymes (18 of the 57 CYP genes contain LoF variants; adjusted p=4.4e-5), and transport proteins (ATP-powered pumps, transporters, and ion channels; adjusted p=8.8e-4). Additionally, we replicated previous findings of LoF enrichment in olfactory receptor and immune response genes. CYPs are a major enzyme class involved in the oxidative metabolism of a diverse set of molecules, including drugs, dietary chemicals, and endogenous compounds. Together with ATP-binding cassette pumps, solute carrier transporters, and ion channels, CYPs play a key role in adverse drug interactions. We discuss further analysis of the LoF variants found in drug response genes.
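One common way to test whether a gene class is over-represented among LoF-containing genes is a one-sided hypergeometric test. The sketch below shows that calculation under stated assumptions; the abstract does not say which test or multiple-testing adjustment was used. The counts 18 and 57 come from the abstract, while the total number of genes (20,000) and the number of LoF-containing genes (3,000) are hypothetical placeholders, so the resulting value is not expected to match the reported adjusted p-value.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for X ~ Hypergeometric.

    N: total genes considered
    K: genes in the class of interest (e.g. CYP genes)
    n: genes containing at least one LoF variant
    k: genes in the class that contain an LoF variant
    """
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favorable / total


# 18 of 57 CYP genes contain LoFs (from the abstract).
# N and n below are hypothetical values used only for illustration.
p_raw = hypergeom_upper_tail(k=18, K=57, n=3000, N=20000)
```

The function uses exact integer arithmetic from the standard library, which avoids a dependency on a statistics package at the cost of speed for very large gene sets. A raw p-value from such a test would still need correction for the number of gene classes examined.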