G-STRATEGY: Optimal Selection of Individuals to Genotype in Genetic Association Studies with Related Individuals. M. Wang1, J. Jakobsdottir2, A. V. Smith2,3, M. S. McPeek1,4 1) Department of Statistics, University of Chicago, Chicago, IL, USA; 2) Icelandic Heart Association, Holtasmári 1, IS-201 Kópavogur, Iceland; 3) University of Iceland, Reykjavík, Iceland; 4) Department of Human Genetics, University of Chicago, Chicago, IL, USA.

   A common problem in genetic association studies is choosing a fixed-size subset of individuals to genotype, which arises naturally when the genotyping budget is limited. Suppose a cohort of phenotyped individuals is available, some of whom may already be genotyped, and one wants to choose an additional fixed-size subset to genotype so as to maximize the power to detect association. When the phenotyped sample includes related individuals, power can be gained by including partial information, such as phenotype data on ungenotyped relatives, in the association analysis, and it is important to account for this when deciding whom to genotype. We propose G-STRATEGY, a method for selecting individuals for genotyping conditional on phenotypes and kinship. G-STRATEGY uses simulated annealing to maximize the noncentrality parameter of either the MQLS or the MASTOR statistic, both of which gain power in this setting by incorporating phenotype information on ungenotyped relatives. In simulations, G-STRATEGY performs well across a range of complex disease models and outperforms other strategies (selection of maximally unrelated individuals, extreme-phenotype enrichment, and the previously proposed GIGI-pick), in many cases with relative power increases of 20-40% over the next best strategy, while maintaining correct type I error. Importantly, G-STRATEGY is computationally feasible even for large datasets. When we applied it to high-density lipoprotein (HDL) data from the AGES-Reykjavik and REFINE-Reykjavik studies, with over 8000 phenotyped and 3000 genotyped individuals, G-STRATEGY took less than 5 minutes to choose 380 additional individuals for genotyping from among those not already genotyped. To further evaluate performance, we masked the available genotypes during the selection process and selected either 1000 or 2000 individuals for genotyping from among those with masked genotypes.
For the resulting samples, we then ran a targeted association analysis of known HDL genes. For association with SNPs in CETP, based on the 1000 individuals chosen by G-STRATEGY, a p-value of 8×10⁻¹³ is obtained, while the smallest p-value for the maximally-unrelated strategy is 2×10⁻¹⁰. With 2000 individuals, the corresponding p-values are 2×10⁻¹⁹ for G-STRATEGY and 9×10⁻¹³ for the maximally-unrelated strategy, demonstrating the power advantage that G-STRATEGY can provide over other methods.
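The abstract names simulated annealing as the search method but does not spell out the procedure, so what follows is only a minimal Python sketch of simulated annealing over fixed-size subsets, under assumptions not taken from the abstract: a move swaps one selected individual for one unselected candidate, worse moves are accepted with the Metropolis probability exp(Δ/T), and the temperature is cooled geometrically. The functions `anneal_subset` and `toy_objective`, and all parameter values, are hypothetical. In particular, `toy_objective` is an illustrative stand-in that rewards a per-person informativeness weight and penalizes kinship among genotyped individuals; it is not the MQLS or MASTOR noncentrality parameter that G-STRATEGY actually maximizes.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Objective = Callable[[FrozenSet[int]], float]


def anneal_subset(
    candidates: Sequence[int],
    k: int,
    objective: Objective,
    n_iter: int = 5000,
    t0: float = 1.0,
    cooling: float = 0.999,
    seed: int = 0,
) -> Tuple[FrozenSet[int], float]:
    """Pick k of `candidates` to approximately maximize `objective`.

    Simulated annealing with swap moves, the Metropolis acceptance rule and
    geometric cooling; returns the best subset visited and its score.
    """
    if not 0 < k < len(candidates):
        raise ValueError("need 0 < k < len(candidates)")
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    selected, unselected = pool[:k], pool[k:]
    current = objective(frozenset(selected))
    best_set, best_val = frozenset(selected), current
    temp = t0
    for _ in range(n_iter):
        # Propose swapping one chosen individual for one unchosen candidate.
        i, j = rng.randrange(k), rng.randrange(len(unselected))
        selected[i], unselected[j] = unselected[j], selected[i]
        proposal = objective(frozenset(selected))
        delta = proposal - current
        # Always accept improvements; accept worse moves with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposal
            if current > best_val:
                best_set, best_val = frozenset(selected), current
        else:
            # Rejected: undo the swap.
            selected[i], unselected[j] = unselected[j], selected[i]
        # Geometric cooling, floored so the temperature never reaches zero.
        temp = max(temp * cooling, 1e-12)
    return best_set, best_val


def toy_objective(
    weights: Dict[int, float],
    kinship: Dict[Tuple[int, int], float],
    already: FrozenSet[int],
    lam: float = 2.0,
) -> Objective:
    """HYPOTHETICAL stand-in for the MQLS/MASTOR noncentrality parameter:
    reward each chosen person's weight, penalize kinship among all genotyped."""
    def f(chosen: FrozenSet[int]) -> float:
        genotyped = chosen | already
        gain = sum(weights[i] for i in chosen)
        penalty = sum(
            phi for (a, b), phi in kinship.items()
            if a in genotyped and b in genotyped
        )
        return gain - lam * penalty
    return f


# Toy example: individual 5 is already genotyped; choose 2 more from 0-4.
weights = {0: 2.0, 1: 1.9, 2: 1.5, 3: 0.2, 4: 1.0, 5: 0.5}
kinship = {(0, 1): 0.25, (2, 3): 0.25, (4, 5): 0.25}  # e.g. sibling pairs
f = toy_objective(weights, kinship, frozenset({5}))
chosen, value = anneal_subset([0, 1, 2, 3, 4], k=2, objective=f)
print(sorted(chosen), value)  # prints [0, 2] 3.5
```

In the toy example the top-weighted pair (0, 1) loses to (0, 2) because 0 and 1 are related, which mimics, very loosely, why a kinship-aware criterion can prefer a different set than naive extreme-phenotype selection. Returning the best subset visited, rather than the final state, guards against the chain drifting away from a good solution while the temperature is still high. For cohorts of the size described above, an incremental update of the objective after each swap would likely be needed for speed; that detail is an assumption here, not something the abstract states.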
