Selecting likely causal genes, pathways and relevant tissues from genome-wide association studies of complex traits by data-driven expression-prioritized integration. TH. Pers1,2,3, J. Karjalainen4, JN. Hirschhorn1,2,5, L. Franke4, the Genetic Investigation of ANthropometric Traits (GIANT) Consortium 1) Division of Endocrinology, Children's Hospital Boston, Boston, MA; 2) Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, USA; 3) Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark; 4) University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands; 5) Harvard Medical School, Boston, USA.
Genome-wide association studies (GWAS) continue to identify thousands of loci where common variants are associated with complex traits. Many of these loci have no single obviously causal gene; the main challenge for gaining novel biological insight is therefore to identify which gene at each locus most likely explains the association. Because functional follow-up studies are often resource-intensive, a key first step is to use computational approaches to prioritize genes by their biological relevance. Previous computational approaches have shown some success but often focus on a single type of data, limiting their discriminatory power. We have developed an approach called DEPICT (Data-driven Expression-Prioritized Integration for Complex Traits) that integrates complementary data types (including 77,840 expression microarrays, 169,810 protein-protein interactions, 211,882 gene-phenotype pairs from mouse knock-out studies, and 6,004 gene sets from pathway databases) to systematically identify (1) the most likely causal gene at a given locus, (2) pathways that are enriched in genetic associations, and (3) tissues in which genes from associated loci are highly expressed. We applied DEPICT to multiple GWAS data sets, including data from the GIANT consortium for height, body-mass index (BMI) and waist-hip ratio adjusted for BMI. The method identifies enrichment of associated genes expressed in relevant tissues for each trait or disease (e.g., cartilage for height, central nervous system tissues for BMI, adipose tissues for waist-hip ratio, and lymphoid tissue for inflammatory bowel disease, IBD). For the anthropometric traits, DEPICT also identifies more statistically significantly enriched pathways than MAGENTA, another gene set enrichment tool, and many of these pathways correspond to biology known to be relevant to the trait.
We benchmarked DEPICT further using height and IBD results, with receiver operating characteristic area under the curve (AUC) statistics, and show that the method outperforms DAPPLE and GRAIL, two commonly used GWAS gene prioritization methods. As unbiased benchmarks, we tested for enrichment of genes that were differentially expressed in murine growth plates (AUC: DEPICT=0.79, GRAIL=0.67, DAPPLE=0.62) and genes that were transcriptionally regulated by IBD-associated markers in blood, based on an expression quantitative trait locus (eQTL) meta-analysis of 5,311 individuals (AUC: DEPICT=0.74, GRAIL=0.66, DAPPLE=0.64).
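To make the benchmark statistic concrete, the following is a minimal sketch of how an area under the ROC curve can be computed from per-gene prioritization scores and an independent benchmark gene set. It is not the DEPICT implementation: the function name `roc_auc`, the gene names, and the scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen benchmark gene receives a higher score than a randomly chosen non-benchmark gene.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: prioritization scores, higher meaning more likely causal.
    labels: True if the gene belongs to the benchmark set, else False.
    Ties between a benchmark and a non-benchmark gene count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one benchmark and one non-benchmark gene")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: scores a prioritization method might assign to
# genes at associated loci, and an independent benchmark set (for example,
# genes differentially expressed in a relevant tissue).
gene_scores = {
    "GENE_A": 0.9,
    "GENE_B": 0.8,
    "GENE_C": 0.4,
    "GENE_D": 0.3,
    "GENE_E": 0.1,
}
benchmark_genes = {"GENE_A", "GENE_C"}

genes = sorted(gene_scores)
scores = [gene_scores[g] for g in genes]
labels = [g in benchmark_genes for g in genes]

auc = roc_auc(scores, labels)
print(f"AUC = {auc:.2f}")
```

In this toy example, five of the six benchmark/non-benchmark gene pairs are ranked correctly, giving an AUC of 5/6. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the method comparisons above are reported.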
You may contact the first author (during and after the meeting) at