A New Approach to finding Association with Complex, Longitudinal Phenotypes using Population Data. A. M. Musolf1, D. Londono1, A. Q. Nato, Jr.2, P. Vuistiner3, J. Brandon4, J. A. Herring5, C. A. Wise4,6, H. Zou7, M. Jin7,8, L. Yu1,9, S. J. Finch10, P. Bovet11, M. Bochud3, T. C. Matise1, D. Gordon1 1) Department of Genetics, Rutgers University, Piscataway, NJ; 2) Statistical Genetics Lab, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA; 3) Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; 4) Seay Center for Musculoskeletal Research, Texas Scottish Rite Hospital for Children, Dallas, TX 75219, USA; 5) Department of Orthopedic Surgery, Texas Scottish Rite Hospital for Children, Dallas TX 75219, USA; 6) Department of Orthopedic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75219, USA; 7) Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; 8) ShanghaiBio China, Pudong, Shanghai 201203, China; 9) Center of Alcohol Studies, Rutgers University, Piscataway, NJ 08854, USA; 10) Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11790, USA; 11) Ministry of Health, P.O. Box 52, Mont Fleuri, Republic of Seychelles.

   Previously, we detailed a new method for testing for association between longitudinal phenotypes and causal genotypes. The method uses growth mixture models to determine longitudinal trajectory curves. The Bayesian posterior probability (BPP) of belonging to a specific curve is then used as a quantitative phenotype in association analyses. To identify association for multiple SNPs, we did not perform association analyses on individual SNPs; instead, the genome was divided into blocks of 50 SNPs, and a significance value was obtained for each block via the program TDT-HET. This method displayed greater than 80% empirical power in most simulation scenarios, but it used family-based data exclusively. Here, we extend the method to population-based data sets. The extension retains many ideas from the family-based method; however, the program SumStat, rather than TDT-HET, is used to obtain significance levels for each block. Multiple scenarios were tested, including four causal variants located within a single locus and eight causal variants spread between two loci on different chromosomes. Reduced models using environmental covariates were also considered. Our data set was highly stratified so that robustness in the presence of population stratification could be assessed. To correct for population stratification, the BPPs are regressed on ancestry fractions from the program ADMIXTURE, and the residuals are used as the phenotype for association analyses. Our method also used three distinct data sets, representing a discovery data set and two confirmatory data sets. The final p-values of the association analyses were combined via Fisher's method and corrected for multiple testing using the false discovery rate (FDR). We report that our simulations: 1) maintain the proper type I error in the presence of population stratification and 2) have greater than 90% power in all simulations.
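The stratification correction described above, in which the residuals of the BPPs after adjusting for ancestry fractions become the phenotype, can be sketched as an ordinary least-squares fit. This is a minimal illustration only, not the authors' implementation: the function names, the toy values, and the choice to drop one ancestry column (ADMIXTURE fractions sum to 1, so all K columns plus an intercept would be collinear) are assumptions made here.

```python
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design; drop a redundant ancestry column")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residualize(bpp: Sequence[float],
                ancestry: Sequence[Sequence[float]]) -> List[float]:
    """Residuals of an OLS regression of bpp on an intercept plus ancestry.

    `ancestry` holds K-1 of the K ancestry fractions per individual
    (an assumption of this sketch, to avoid collinearity with the intercept).
    """
    design = [[1.0] + list(row) for row in ancestry]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, bpp)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(design, bpp)]


# Toy example: two ancestral populations, so one fraction column is kept.
toy_ancestry = [[0.1], [0.4], [0.7], [0.9]]
toy_bpp = [0.3, 0.2, 0.8, 0.6]
toy_residuals = residualize(toy_bpp, toy_ancestry)
```

By construction the residuals have zero mean and are uncorrelated with the ancestry columns in the sample, which is why they can stand in for the BPP as an ancestry-adjusted quantitative phenotype.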
We conclude that our method can detect multiple causal SNPs located in multiple loci in population data sets. We believe that this method will be of use to researchers who are studying complex diseases that display longitudinal phenotypes. It allows causal loci (and the causal variants within them) to be detected with high power in both population and family studies, even in the presence of confounding elements such as population stratification and environmental variables.
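The final step of the abstract, combining per-block p-values across the discovery and confirmatory data sets with Fisher's method and then applying an FDR correction, can be sketched as follows. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, as are the function names and the toy p-values. The closed-form chi-square tail is exact here because Fisher's statistic always has an even number of degrees of freedom (2k for k p-values).

```python
import math
from typing import List, Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Combine independent p-values (each > 0) with Fisher's method.

    The statistic -2 * sum(ln p_i) is chi-square with 2k degrees of freedom;
    for even degrees of freedom the upper tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-block p-values: (discovery, confirmatory 1, confirmatory 2).
block_pvalues = [
    (0.001, 0.020, 0.010),
    (0.400, 0.700, 0.250),
    (0.030, 0.050, 0.200),
]
combined = [fisher_combine(ps) for ps in block_pvalues]
fdr_adjusted = benjamini_hochberg(combined)
```

Fisher's method assumes the combined p-values are independent, which is plausible when the three data sets contain different individuals.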