Considerations for the Processing and Direct-to-Consumer Return of Exome Sequences. E. D. Harrington, C. McLean, A. Shmygelska, A. Chowdry, B. Naughton. 23andMe Inc, Mountain View, CA.
In late 2011, 23andMe announced its first publicly available sequencing product: the Exome Pilot Project. To return these data directly to consumers (DTC), we implemented a processing pipeline designed to maximize value to the consumer while maintaining data quality and security at each step. Enrollment in the pilot was limited to individuals who had already been genotyped on the 23andMe platform. Samples were enriched using the Agilent SureSelect 50 Mb platform and sequenced to at least 80X unaligned coverage using Illumina's HiSeq technology. Participants received their aligned raw data, variant calls, and a summary report describing relevant statistics and potential variants of interest identified by a custom filtering process. We implemented a flexible and scalable pipeline for processing sequence data (exome and whole genome) following the Broad Institute's best practices. It combines standard tools (e.g., GATK) with custom software to automate sample tracking, data distribution to and collection from compute nodes, read mapping, variant calling, report generation, and validation against our existing genotype database. To respond quickly to fluctuations in demand, the pipeline can be deployed either locally or on a cloud platform. All variant calls were validated against existing genotype data wherever coverage on the 23andMe genotyping platform overlapped with the exome targets. For variant calls that passed our filters and had unambiguous stranding, we observed 99.6-99.9% concordance between chip and sequence data, consistent with the error rate of the chip. Relaxing the filtering stringency had a marked, though expected, effect on concordance. Data security is integral to DTC data delivery. Data were encrypted with keys delivered via secure messaging on the 23andMe website. The encrypted raw data for each exome averaged 6 GB, making bandwidth and data integrity additional concerns.
We delivered data via Amazon S3, and the use of encryption made errors in transfer immediately obvious. Participant responses to the exome data were varied but largely positive. Some participants had a significant scientific background in genetics and interacted with others via the 23andMe community to provide guidance on their data. Some used their data to bootstrap research into rare diseases affecting family members, while others shared their results publicly via the Personal Genome Project. To date, no negative consequences of such data return have been observed.
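The chip-versus-sequence validation described above can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names, the dictionary-based data layout, and the example sites are all hypothetical, and it assumes biallelic SNPs with genotypes already reported on the same strand. It skips strand-ambiguous sites (A/T and C/G, whose alleles read the same on either strand) and sites without a passing sequence call, then counts matching genotypes.

```python
# Hypothetical sketch of a chip-vs-sequence concordance check.
# All names and data here are illustrative assumptions, not 23andMe code.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """Return True for A/T and C/G SNPs, whose alleles are complements
    of each other, so the strand cannot be inferred from the alleles."""
    return COMPLEMENT[allele1] == allele2


def normalize(genotype):
    """Treat genotypes as unordered allele pairs ("GA" equals "AG")."""
    return tuple(sorted(genotype))


def concordance(chip_calls, seq_calls, site_alleles):
    """Return (matched, compared) over sites called on both platforms,
    excluding strand-ambiguous sites."""
    matched = 0
    compared = 0
    for site, chip_gt in chip_calls.items():
        seq_gt = seq_calls.get(site)
        if seq_gt is None:
            continue  # no sequence call passed filters at this site
        ref, alt = site_alleles[site]
        if is_strand_ambiguous(ref, alt):
            continue  # stranding is ambiguous, so skip the comparison
        compared += 1
        if normalize(chip_gt) == normalize(seq_gt):
            matched += 1
    return matched, compared


# Toy example with made-up sites.
site_alleles = {
    "site1": ("A", "G"),
    "site2": ("C", "T"),
    "site3": ("A", "T"),  # strand-ambiguous
    "site4": ("G", "T"),
}
chip_calls = {"site1": "AG", "site2": "CC", "site3": "AT", "site4": "GT"}
seq_calls = {"site1": "GA", "site2": "CT", "site3": "AA"}  # site4 filtered out

matched, compared = concordance(chip_calls, seq_calls, site_alleles)
print(f"{matched}/{compared} comparable sites concordant")
```

In this toy data, two sites are comparable and one of them matches. A real comparison would also need to handle no-calls, indels, and multiallelic sites, which this sketch leaves out.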
You may contact the first author (during and after the meeting) at