Replication of gene-gene interaction models associated with cataracts in the eMERGE Network.

M. A. Hall1, S. S. Verma1, E. R. Holzinger1, R. Berg2, J. Connolly3, D. C. Crawford4, D. R. Crosslin5, M. de Andrade6, K. F. Doheny7, J. L. Haines4, J. B. Harley8, G. P. Jarvik5, T. Kitchner2, H. Kuivaniemi9, E. B. Larson5,10, G. Tromp9, S. A. Pendergrass1, C. A. McCarty11, M. D. Ritchie1

1) Center for Systems Genomics, The Pennsylvania State University, University Park, PA; 2) Marshfield Clinic, Marshfield, WI; 3) Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA; 4) Center for Human Genetics Research, Vanderbilt University, Nashville, TN; 5) Department of Genome Sciences, University of Washington, Seattle, WA; 6) Mayo Clinic, Rochester, MN; 7) Center for Inherited Disease Research, IGM, Johns Hopkins University SOM, Baltimore, MD; 8) Cincinnati Children's Hospital, University of Cincinnati, Department of Pediatrics, Cincinnati, OH; 9) Geisinger Health System, Danville, PA; 10) Group Health Research Institute, Seattle, WA; 11) Essentia Rural Health, Duluth, MN.

   Bioinformatics approaches to examining epistasis provide a means of discovering the interactions between multiple genes and pathways that likely underlie complex disease. Despite the importance of these interactions, the computational demands and multiple-testing burden of an exhaustive combinatorial search make them difficult to uncover. Here, we address this issue by using Biofilter 2.0 to identify putative SNP-SNP models for cataract susceptibility, reducing the number of models to be analyzed. With Biofilter 2.0, we created biologically relevant SNP-SNP models from genes with published associations, including genes belonging to the same pathway or having known biological interactions. Using PLATO software, we evaluated these models by logistic regression, adjusting for sex and principal components, in 3,907 samples (1,354 controls, 2,553 cases) of European (3,872), African (1), Asian (14), and other (13) descent from the Marshfield Clinic Personalized Medicine Research Project, part of the Electronic Medical Records & Genomics (eMERGE) Network. All highly significant models from the Marshfield Clinic (likelihood ratio test (LRT) p < 0.0001) were then tested in a replication dataset of 3,483 individuals (537 controls, 2,946 cases) of European (3,251), African (113), Asian (66), and other (53) descent, using independent samples from additional sites in the eMERGE Network: Mayo Clinic, Group Health Cooperative, and Vanderbilt University Medical Center. Over 100 SNP-SNP models were significant in the replication sample at LRT p < 0.01, and 8 models replicated with high significance (LRT p < 10^-4). The most significant replicating SNP-SNP models and their nearest genes included rs7749147 (FYN) - rs11017910 (DOCK1), rs9790292 (TGFBR2) - rs8110090 (TGFB1), rs10176426 (UGT1A10) - rs17863787 (UGT1A6), and rs11723463 (UGT2B4) - rs1112310 (UGT1A10).
Notably, the genes UGT1A10 and UGT1A6, members of the UDP glucuronosyltransferase 1 family, and UGT2B4, a member of the UDP glucuronosyltransferase 2 family, are involved in the porphyrin and chlorophyll metabolism pathway. This pathway has previously been associated with cataracts and therefore merits further inquiry. These findings indicate a role for epistasis in susceptibility to cataracts and demonstrate the utility of Biofilter 2.0 as a biology-driven method that can be applied to any GWAS dataset to investigate the complex genetic architecture of common diseases.
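
To illustrate the kind of test summarized above, the following is a minimal sketch of a likelihood ratio test for a single SNP-SNP interaction in logistic regression. It is not the PLATO implementation. It assumes additive genotype coding (0, 1, or 2 copies of the minor allele), assumes the LRT compares a full model containing the interaction term against a reduced model with main effects only, omits the sex and principal-component covariates for brevity, and runs on simulated data. All function names (solve, fit_logistic, interaction_lrt) and simulation parameters are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_logistic(X, y, iterations=25):
    """Fit logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]  # X' W X (negative Hessian)
        for row, yi in zip(X, y):
            eta = sum(bj * v for bj, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)
        beta = [bj + s for bj, s in zip(beta, step)]
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(bj * v for bj, v in zip(beta, row))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return loglik


def interaction_lrt(g1, g2, y):
    """LRT for the SNP-SNP interaction term (assumed: full vs. main-effects-only model)."""
    reduced = [[1.0, a, b] for a, b in zip(g1, g2)]
    full = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    stat = max(2.0 * (fit_logistic(full, y) - fit_logistic(reduced, y)), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Simulated data only: hypothetical allele frequency and effect sizes.
random.seed(0)
n = 1000
g1 = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
g2 = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
y = []
for a, b in zip(g1, g2):
    eta = -0.5 + 0.1 * a + 0.1 * b + 0.6 * a * b
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

stat, p_value = interaction_lrt(g1, g2, y)
print(f"LRT statistic = {stat:.3f}, p = {p_value:.3g}")
```

In a knowledge-driven analysis of the kind described here, a test like this would be run only on the SNP pairs proposed by the filtering step rather than on every possible pair, which is what reduces the computational and multiple-testing burden.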
