Power studies for the extreme versus control design for exome sequencing studies and identification of a variant of TMC6 as a deleterious modifier for age of onset of chronic Pseudomonas infection in children with cystic fibrosis.
M. J. Emond1, T. Louie1, J. Emerson2,7, R. A. Mathias3, M. R. Knowles4, D. A. Nickerson5, H. K. Tabor2,6, K. C. Barnes3, R. L. Gibson2,7, M. J. Bamshad2,5,8, NHLBI Exome Sequencing Project, Lung GO
1) Department of Biostatistics, University of Washington, Seattle, WA; 2) Department of Pediatrics, University of Washington, Seattle, WA; 3) Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD; 4) Cystic Fibrosis/Pulmonary Research and Treatment Center, University of North Carolina at Chapel Hill, Chapel Hill, NC; 5) Department of Genome Sciences, University of Washington, Seattle, WA; 6) Trueman-Katz Center for Pediatric Bioethics, Seattle Children's Research Institute, Seattle, WA; 7) Division of Pulmonary Medicine, Seattle Children's Hospital, Seattle, WA; 8) Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA.

   As part of the NHLBI Exome Sequencing Project (ESP), we identified rare variants in DCTN4 that modify age of onset of chronic Pseudomonas (Pa) infection in cystic fibrosis (CF) using an extreme phenotypes design, by-gene tests, and a small number of individuals in each extreme (n < 50). However, this design and sample size are underpowered for discovery of many variants underlying complex traits. We have devised a more powerful design at the same overall cost as an extreme phenotypes design: exomes from one phenotypic extreme are compared to a large set of extant exomes from individuals suitable as controls. The power of this extreme vs. population control design depends on the extent of enrichment of variants in the extreme. For example, consider a scenario in which extreme A is expected to harbor causal variants at 3 times the frequency of the overall population, the opposite extreme is expected to have zero variants (an ideal situation for an extreme phenotypes design), and the cumulative derived allele frequency (sum of causal allele frequencies) is 5% in the overall population. An extreme phenotypes design with 50 individuals per extreme (n=100 total) provides 26% power to detect the association, whereas 50 individuals in extreme A compared to 3000 controls provides 46% power, and 100 individuals in extreme A compared to 3000 controls provides 87% power. For a 4-fold enrichment in extreme A, we estimate the powers to be 40%, 91% and 99.9%, respectively. We show power gains of similar magnitude in other scenarios.
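The comparison of designs in the scenario above can be explored with a small Monte Carlo simulation. The sketch below is illustrative only and is not the method used for the figures quoted in this abstract: it assumes a one-sided Fisher exact test on pooled causal-allele counts rather than a by-gene test, independent chromosomes, and a significance threshold of 2.5e-6 (a Bonferroni-style cutoff for roughly 20,000 genes). The function names and the threshold are assumptions introduced here, so the estimated powers should not be expected to match the percentages reported above.

```python
import math
import random


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_one_sided(a, n1, b, n2):
    """One-sided Fisher exact p-value for enrichment in group 1.

    a of n1 chromosomes in group 1 and b of n2 chromosomes in group 2
    carry a causal allele. Returns P(X >= a) under the hypergeometric null.
    """
    total_alt = a + b
    denom = log_comb(n1 + n2, total_alt)
    p = 0.0
    for x in range(a, min(n1, total_alt) + 1):
        p += math.exp(log_comb(n1, x) + log_comb(n2, total_alt - x) - denom)
    return min(p, 1.0)


def draw_alt_alleles(rng, n_chrom, freq):
    """Number of causal alleles among n_chrom independent chromosomes."""
    return sum(1 for _ in range(n_chrom) if rng.random() < freq)


def estimate_power(n_case, case_freq, n_ref, ref_freq,
                   alpha=2.5e-6, n_sim=200, seed=1):
    """Fraction of simulated studies with p < alpha (alpha is an assumption)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        a = draw_alt_alleles(rng, 2 * n_case, case_freq)
        b = draw_alt_alleles(rng, 2 * n_ref, ref_freq)
        if fisher_one_sided(a, 2 * n_case, b, 2 * n_ref) < alpha:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    pop_freq = 0.05        # cumulative causal allele frequency in the population
    enrichment = 3         # fold enrichment in extreme A
    q_a = enrichment * pop_freq
    # Extreme vs. opposite extreme (the opposite extreme carries no causal alleles).
    print("50 vs 50 extremes:   ", estimate_power(50, q_a, 50, 0.0))
    # Extreme A vs. population controls.
    print("50 vs 3000 controls: ", estimate_power(50, q_a, 3000, pop_freq))
    print("100 vs 3000 controls:", estimate_power(100, q_a, 3000, pop_freq))
```

Changing `enrichment` to 4 gives the second scenario described above. Because the controls are drawn at the population frequency rather than at zero, the gain in this sketch comes from the much larger reference sample, which is the trade-off the extreme vs. control design relies on.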
Applying this design to exomes from individuals with CF at the extreme of early chronic Pa infection (n=86), compared to exomes from 3316 ancestry-matched control individuals using by-gene tests (adj-SKAT-O), we identified TMC6 as having one or more variants significantly associated with age of onset of chronic Pa (p=9.6×10⁻⁷) and validated an association between age of onset and a TMC6 variant among 556 individuals from the Early Pseudomonas Infection Control Observational Study (p=0.0005, HR=5.2, 95% CI [1.3-2.8]). This TMC6 variant was also associated with an 8.0 percentile decrease in FEV1 (p=0.01). The design also has high power for by-variant analyses, and preliminary by-variant analyses of 156 CF exomes yield 7 significant variants in 7 genes. This extreme phenotype vs. controls study design can be a low-cost, powerful strategy for discovery of novel variants associated with risk for complex traits.
