Harnessing Web 2.0 Social Networks for Genetic Epidemiology Studies with Millions of People. Y. Erlich1, J. Kaplanis1, M. Gershovits1, P. Nagaraj1,2, D. MacArthur3,4, A. Price5 1) Whitehead Inst Biomedical Research, Cambridge, MA; 2) Massachusetts Institute of Technology; 3) Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA; 4) Broad Institute of Harvard and MIT, Cambridge, MA; 5) Harvard School of Public Health, Boston, MA.

   Understanding the genetic architecture of complex traits is one of the central goals of human genetics. Emerging lines of study have highlighted the entangled etiologies of these traits, which can include epistasis, parent-of-origin effects, sex and age interactions, and environmental risk factors. Robust genetic epidemiological analysis requires statistical models fitted to substantial amounts of data sampled from large families. However, recruiting large cohorts of extended kinships is both logistically challenging and cost-prohibitive. Here, we present a Big Data strategy to address this challenge: harnessing existing, free, and massive Web 2.0 social network resources to trace the aggregation of complex traits in extremely large families. We collected millions of public profiles from Geni.com, the world's largest genealogy-driven social network. Using this information, we constructed a single pedigree of 13 million individuals spanning many generations back to the 15th century and validated its quality using unilineal Y chromosome and mitochondrial markers. In addition, we used Natural Language Processing to convert genealogical information into birth and death locations as a proxy for environmental factors. We obtained multiple phenotypes from this resource, including longevity, fertility, migration patterns, and facial morphologies phenotyped from digital photos on Geni.com. This dataset provides a wide range of kinships for familial aggregation studies. We will present the dataset, which we aim to release as a community resource, and show heritability estimates across distant relatives that help disentangle the contributions of epistasis, parent-of-origin effects, and shared environments.
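
   As an illustration of how a pedigree yields "a wide range of kinships", the sketch below computes kinship coefficients with the textbook recursive definition on a toy family. It is not the pipeline used in this work: the pedigree, the individual labels, and the function names (PEDIGREE, depth, kinship) are hypothetical and chosen only for this example. The expected genetic relatedness of two individuals is twice their kinship coefficient.

```python
from functools import lru_cache

# Hypothetical toy pedigree: person -> (father, mother).
# Founders have unknown parents, written as (None, None).
PEDIGREE = {
    "gf": (None, None),   # grandfather (founder)
    "gm": (None, None),   # grandmother (founder)
    "a": ("gf", "gm"),    # child of gf and gm
    "b": ("gf", "gm"),    # full sibling of a
    "sa": (None, None),   # unrelated spouse of a
    "sb": (None, None),   # unrelated spouse of b
    "c1": ("a", "sa"),    # child of a
    "c2": ("b", "sb"),    # child of b, first cousin of c1
}


@lru_cache(maxsize=None)
def depth(person):
    """Length of the longest known path from a person up to a founder."""
    if person is None:
        return -1
    father, mother = PEDIGREE[person]
    return 1 + max(depth(father), depth(mother))


@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient: probability that an allele drawn at random from i
    and one drawn at random from j are identical by descent."""
    if i is None or j is None:
        return 0.0  # unknown parents are assumed unrelated to everyone
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    # Expand the individual who cannot be an ancestor of the other:
    # an ancestor always has strictly smaller depth than its descendant.
    if depth(i) < depth(j):
        i, j = j, i
    father, mother = PEDIGREE[i]
    return 0.5 * (kinship(father, j) + kinship(mother, j))


if __name__ == "__main__":
    for pair in [("a", "b"), ("c1", "c2"), ("a", "c2"), ("sa", "sb")]:
        phi = kinship(*pair)
        print(pair, "kinship =", phi, "relatedness =", 2 * phi)
```

   On this toy family the full siblings a and b have kinship 1/4 and the first cousins c1 and c2 have kinship 1/16. A plain recursion like this is only practical for small families; a pedigree of millions of individuals would need a sparse or otherwise scalable formulation.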

You may contact the first author (during and after the meeting) at