Prospective participant selection and ranking to maximize actionable PGx variants and discovery in the eMERGE Network. D. Crosslin1,2, A. Gordon1, P. Robertson2, D. Hanna2, D. Carrell3, A. Scrol3, I. Kullo4, M. de Andrade5, E. Baldwin3, J. Grafton3, K. Doheny6, P. Crane7, R. Li8, S. Stallings9, S. Verma10, J. Wallace10, M. Ritchie10, M. Dorschner2, E. Larson3, D. Nickerson2, G. Jarvik1,2, The electronic Medical Records and Genomics (eMERGE) Network 1) Genome Sciences / Medical Genetics, University of Washington, Seattle, WA; 2) Department of Genome Sciences, University of Washington, Seattle, WA; 3) Group Health Research Institute, Center for Health Studies, Seattle, WA; 4) Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN; 5) Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN; 6) Center for Inherited Disease Research, Johns Hopkins University, Baltimore, MD; 7) Division of General Internal Medicine, University of Washington, Seattle, WA; 8) Office of Population Genomics, National Human Genome Research Institute, Bethesda, MD; 9) Department of Biomedical Informatics, Vanderbilt University, Nashville, TN; 10) Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA.

   Some 9,000 participants in the eMERGE Network are being sequenced with the targeted Pharmacogenomics Research Network sequence platform (PGRNseq), linking electronic health records (EHR) to pharmacogenetic variant data with the ultimate aim of returning actionable results. PGRNseq covers the coding regions, UTRs, and 2 kb upstream for 84 pharmacogenes. To return CLIA results to participants at the Group Health Cooperative, we initially sequenced DNA from 900 participants (61% female) and selected 450 of these to re-consent, redraw, and ultimately validate variants. We designed an algorithm that draws on ancestry, diagnosis codes, medication records, laboratory results, and variant-level bioinformatics to ensure selection of an informative sample for this project. The algorithm involved two steps. First, we enriched our sample for diversity by over-selecting participants of non-European ancestry, including African (5%) and Asian (8%) ancestry. Second, we enriched for participants with EHR evidence of actionable indications related to PGRNseq genes, including malignant hyperthermia, long QT syndrome, hypertension, atrial fibrillation, congestive heart failure, and elevated creatine kinase values within six months of a statin medication. We annotated the 900-participant multi-sample VCF with a combination of SeattleSeq and SnpEff, adding custom variables including evidence from ClinVar, OMIM, and HGMD with links to prior clinical associations. We focused our analyses on 28 actionable genes, largely driven by the Clinical Pharmacogenetics Implementation Consortium. We derived a ranking system based on the number of coding variants per participant (75.2 ± 14.7) and the number of variants with high or moderate impact (11.5 ± 3.9). Notably, we identified 11 stop-gained (1%) and 519 missense (20%) variants out of a total of 1,785 in these 28 genes. 
Finally, for return to the EHR, we prioritized variants with prior clinical evidence of pathogenicity or annotated as stop-gained in the following genes: CACNA1S and RYR1 (malignant hyperthermia); SCN5A, KCNH2, and RYR2 (arrhythmia); and LDLR (high cholesterol). Our analytic pipeline, including participant-level variant indexing, custom annotation, and R and LaTeX scripts, will serve as a foundation for identifying potentially actionable variants and integrating them into the EHR. These data will inform the pathogenicity of specific variants and practices for EHR integration of genomic data.
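
The participant ranking described above, based on the number of coding variants per participant and the number of those with high or moderate impact, could be sketched roughly as follows. This is only an illustrative sketch in Python, not the authors' R pipeline. The abstract does not say how the two counts are combined, so the sort order here (high or moderate impact count first, total coding variant count second) is an assumption, as are the names `Variant` and `rank_participants` and the example data. The impact labels follow SnpEff's HIGH, MODERATE, LOW, and MODIFIER categories, and the input is assumed to be already restricted to coding variants in the genes of interest.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """One coding variant call for one participant (hypothetical record layout)."""
    participant_id: str
    gene: str
    impact: str  # SnpEff-style impact category: HIGH, MODERATE, LOW, MODIFIER


def rank_participants(variants: Iterable[Variant]) -> List[str]:
    """Return participant IDs ordered from most to least informative.

    Assumed ordering: more high/moderate-impact variants first, then more
    coding variants overall, then participant ID to break remaining ties.
    """
    coding = Counter()     # coding variants per participant
    impactful = Counter()  # high- or moderate-impact variants per participant
    for v in variants:
        coding[v.participant_id] += 1
        if v.impact in {"HIGH", "MODERATE"}:
            impactful[v.participant_id] += 1
    return sorted(coding, key=lambda pid: (-impactful[pid], -coding[pid], pid))


# Made-up example data, for illustration only.
example = [
    Variant("P1", "RYR1", "HIGH"),
    Variant("P1", "LDLR", "LOW"),
    Variant("P1", "SCN5A", "LOW"),
    Variant("P2", "KCNH2", "MODERATE"),
    Variant("P2", "RYR2", "HIGH"),
    Variant("P3", "CACNA1S", "MODERATE"),
    Variant("P3", "LDLR", "LOW"),
    Variant("P3", "RYR1", "LOW"),
    Variant("P3", "SCN5A", "MODIFIER"),
]
ranking = rank_participants(example)
print(ranking)
```

In this sketch a participant with two high or moderate impact variants outranks one with more coding variants but fewer impactful ones; a weighted score would be an equally plausible reading of the abstract.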

You may contact the first author (during and after the meeting) at