Characterizing shared pathogenetics from genome-wide association studies via principal component analysis.
A. Keinan (1,2), D. Chang (1,2)
1) Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY; 2) Program in Computational Biology and Medicine, Cornell University, Ithaca, NY.

   Shared pathogenesis between diseases has been studied extensively through comorbidity. In the era of genome-wide association studies (GWAS), many studies have also reported associations that are shared between diseases. Such findings are especially common across autoimmune diseases and have motivated meta-analyses that combine distinct diseases to increase the power to detect shared associations. A few recent studies have also investigated shared associations at the level of summary statistics, for example by testing the correlation between vectors that encode the strength and direction of association of many SNPs across several autoimmune GWAS. Here, we present a novel method for studying the relationships between diseases from GWAS datasets. It extends this correlation approach and overcomes its main shortcomings, namely that all SNPs contribute equally and that signals may be missed because populations and genotyping arrays differ across datasets. Our method is based on principal component analysis (PCA), which has been used extensively to assess and correct for population structure. To account for this heterogeneity, and for the fact that associated markers are not necessarily causal, the method first performs gene-level tests of association. It then applies PCA to a matrix of significance scores across genes and across GWAS datasets. It also controls for potential confounders that differ across GWAS, such as sample size and genotyping array, thereby focusing on the main difference between datasets: the disease under study. We applied our method to 30 GWAS of a range of diseases, including autoimmune, neurological, and psychiatric diseases and cancer. The first few principal components led us to four main observations: (1) different GWAS of the same disease lie close together; (2) inflammatory bowel diseases (IBD) form a separate cluster, distinct from other autoimmune diseases; (3) cancers and autoimmune diseases each form a distinct cluster, whereas neurological diseases show some overlap with both; and (4) genes that play a dominant role in defining the principal components (i.e., that underlie the observed structure among diseases) are significantly enriched for genes previously associated with IBD (P < 10^-9), as well as with other autoimmune and psychiatric diseases, and for genes in immune pathways, such as antigen processing and presentation. These results highlight the utility of our method for characterizing and quantifying shared pathogenetics.
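The abstract does not give implementation details, so the following Python (NumPy) sketch only illustrates the general pipeline under stated assumptions. It takes gene-level significance scores as given (for example, -log10 p-values from some gene-based test). It removes confounders such as sample size with an ordinary least-squares regression across GWAS, computes PCA by singular value decomposition, ranks genes by a variance-weighted squared loading, and assesses enrichment with a one-sided hypergeometric test. All function names, the gene-ranking score, and the toy data are hypothetical and are not the authors' implementation.

```python
import math

import numpy as np


def residualize(scores, covariates):
    """Regress per-GWAS confounders out of every gene's scores.

    scores:     (n_gwas, n_genes) gene-level significance scores, e.g.
                -log10 p-values from some gene-based test (assumed input).
    covariates: (n_gwas, n_cov) confounders that differ across GWAS, e.g.
                log sample size or 0/1 genotyping-array indicators.
    """
    n_gwas = scores.shape[0]
    # Intercept plus covariates; lstsq solves one OLS fit per gene column.
    design = np.column_stack([np.ones(n_gwas), covariates])
    beta, *_ = np.linalg.lstsq(design, scores, rcond=None)
    # Residuals keep only variation not explained by the confounders; the
    # intercept also centers each gene's column, as PCA requires.
    return scores - design @ beta


def pca(residuals, n_components=3):
    """PCA of the centered residual matrix via singular value decomposition."""
    u, s, vt = np.linalg.svd(residuals, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]     # GWAS positions on each PC
    loadings = vt[:n_components]                         # gene weights for each PC
    explained = s[:n_components] ** 2 / np.sum(s ** 2)  # fraction of variance
    return coords, loadings, explained


def top_genes(loadings, explained, k):
    """Indices of the k genes with the largest variance-weighted squared
    loadings (a hypothetical importance score chosen for illustration)."""
    weight = (explained[:, None] * loadings ** 2).sum(axis=0)
    return set(np.argsort(weight)[::-1][:k].tolist())


def enrichment_p(overlap, n_total, n_known, n_selected):
    """One-sided hypergeometric P(X >= overlap): the chance of at least
    `overlap` previously associated genes among `n_selected` genes drawn
    from `n_total`, of which `n_known` are previously associated."""
    upper = min(n_known, n_selected)
    hits = sum(
        math.comb(n_known, i) * math.comb(n_total - n_known, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return hits / math.comb(n_total, n_selected)


# Toy data for illustration only: 6 GWAS in two hypothetical disease groups,
# 200 genes, with genes 0-39 carrying group-specific signal.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
scores = rng.normal(0.0, 0.3, size=(6, 200))
scores[labels == 0, :20] += 3.0
scores[labels == 1, 20:40] += 3.0
log_n = np.log([2000, 5000, 8000, 3000, 6000, 9000])
scores += 0.5 * log_n[:, None]  # toy confounder: larger GWAS, larger scores

coords, loadings, explained = pca(residualize(scores, log_n[:, None]))
top = top_genes(loadings, explained, k=40)
print("variance explained:", np.round(explained, 2))
print("enrichment P:", enrichment_p(len(top & set(range(40))), 200, 40, 40))
```

Regressing with an intercept both removes the confounders and centers each gene's scores across GWAS, so the SVD of the residuals is a standard PCA in which each GWAS is a point and the genes are the variables. On the toy data, the first component is expected to separate the two groups, and the top-ranked genes should be the 40 genes that carry planted signal.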
