Low-pass whole-genome sequencing in Europeans identifies 1325 SNPs and indels associated with cis gene expression of which 4% are independent low frequency-large effect associations.

A. R. Wood (1), M. A. Tuke (1), H. Yaghootkar (1), D. Pasko (1), H. Lin (2), C. S. Xu (2), D. G. Hernandez (3,4), M. A. Nalls (3), J. R. Gibbs (3), L. Qibin (2), S. Juan (2), A. Murray (1), D. Melzer (5), M. N. Weedon (1), A. B. Singleton (3), L. Ferrucci (6), T. M. Frayling (1)

1) Genetics of Complex Traits, University of Exeter Medical School, Exeter, United Kingdom
2) Beijing Genomics Institute, Beishan Industrial Zone, Yantian District, Shenzhen, China
3) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, 35 Lincoln Drive, Bethesda, MD, USA
4) Department of Molecular Neuroscience and Reta Lila Laboratories, Institute of Neurology, UCL, Queen Square House, Queen Square, London WC1N 3BG, United Kingdom
5) Institute of Biomedical and Clinical Sciences, University of Exeter Medical School, Exeter, United Kingdom
6) Clinical Research Branch, National Institute on Aging NIA-ASTRA Unit, Harbor Hospital, MD.

Initial results from whole-genome sequencing, exome sequencing and exome microarray-based experiments suggest that relatively few low-frequency, large-effect variants are associated with common human phenotypes. However, these sequencing experiments have yet to reach sample sizes similar to those required to identify most common variant-phenotype associations. We aimed to test the role of low-frequency variants in common human phenotypes using the same sample sizes at which multiple common associations were detectable. As phenotypes, we used 11,132 cis gene expression profiles from the whole blood of 450 individuals from the population-based InCHIANTI study. To identify low-frequency variants, we performed low-pass (mean 7X) whole-genome sequencing in 680 of the InCHIANTI individuals. We imputed missing genotypes using Beagle, performed extensive QC of variants using the Genome Analysis Toolkit, and assessed the quality of our data by comparing them to 2 Mb of deep sequence data (>100X) from 83 overlapping individuals. To identify associations between variants and cis gene expression (cis-eQTLs), we inverse normalised all phenotypes and tested 9,720,795 SNPs and 2,018,182 indels observed at least 4 times. We used a significance threshold of P < 1x10^-6, which corresponded to a false discovery rate of 5% based on the number of independent cis variants and phenotypes. Where we detected low-frequency variant-phenotype associations, we performed conditional analyses using the strongest common variant as a second variable. Using our deep sequence data, we estimated that we had detected 88% of SNPs and 79% of indels in our low-pass sequencing data, of which 0.6% and 16% respectively were false positives.
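The abstract does not say which inverse normalisation was applied to the expression phenotypes. As an illustration only, the sketch below shows one common choice, a rank-based inverse normal transform with the Blom offset (0.375). The function name, the offset and the tie handling are assumptions, not details taken from the study.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transform (hypothetical helper).

    Each value is replaced by the standard normal quantile of its
    offset-adjusted rank. offset=0.375 is the Blom offset (an assumption;
    the abstract does not specify one).
    """
    n = len(values)
    # Indices sorted by value, so that ranks can be assigned in order.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current run of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


if __name__ == "__main__":
    # Toy expression values for one probe across five individuals (made up).
    print(inverse_normal_transform([7.2, 5.1, 9.8, 5.1, 6.4]))
```

After a transform of this kind, effect sizes are expressed in standard deviation units of the transformed phenotype, which is the scale on which the effect sizes in this abstract are reported.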
We identified 1325 cis-eQTLs (1065 SNPs and 260 indels), of which 87 were low frequency (minor allele frequency <5%) and had an average effect size of 1.36 standard deviations (SDs) (range: 0.80-2.32), compared to the 1238 common variants, where the average effect size was 0.61 SDs (range: 0.32-1.81). Conditional analysis showed that common variants partially accounted for 37 low-frequency signals, but that 50 (4% of all cis-eQTLs) were independent of the strongest common variant signal at the cis locus. Our study shows that, using the same sample sizes, whole-genome sequencing can identify low-frequency variants with larger effect sizes than those observed for common variants, but that these low-frequency, large-effect signals may represent less than 5% of associations.
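The counts reported above can be cross-checked with simple arithmetic. The short sketch below uses only the figures stated in this abstract; the variable names are illustrative.

```python
# Counts taken directly from the abstract.
total_eqtls = 1325
snps, indels = 1065, 260
low_frequency, common = 87, 1238
partially_explained, independent = 37, 50

# The three breakdowns should each sum to their parent total.
assert snps + indels == total_eqtls
assert low_frequency + common == total_eqtls
assert partially_explained + independent == low_frequency

# Share of all cis-eQTLs that are independent low-frequency signals.
independent_share = independent / total_eqtls
# Share of all cis-eQTLs that are low frequency at all.
low_frequency_share = low_frequency / total_eqtls

print(f"Independent low-frequency share: {independent_share:.1%}")
print(f"All low-frequency share: {low_frequency_share:.1%}")
```

The independent share works out to roughly 3.8%, which rounds to the 4% quoted in the title and is below the 5% bound in the final sentence.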