A generalized sparse regression model with adjustment of pedigree structure for variant detection from next generation sequencing data.
S. Cao (1,2), H. Qin (2,3), H. Deng (2,3), Y. Wang (1,2,3)
1) Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA; 2) Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; 3) Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA.
Complex diseases and traits are likely to be explained by both genotypes (e.g., common and rare genetic variants or SNPs) and environmental factors. Many association methods have been developed for detecting rare or common variants, but they usually treat family-based designs and unrelated-individual designs separately. To overcome this limitation, we develop a sparse regression model that adjusts for pedigree structure and incorporates prior information. To account for the impact of pedigrees on continuous phenotypes, we propose a modified kinship matrix to adjust for the correlation between pedigrees. To incorporate prior knowledge, we regularize the model with weighted penalty terms. To obtain a sparse solution, we evaluate and implement a fast thresholding algorithm for solving the regression model with L1/2 norm regularization. We also use a smooth gradient algorithm to solve the sparse model penalized with an Lp (0 < p < 1) regularization term. After obtaining the solution path, we use a hybrid of the Akaike information criterion (AIC) and stability selection to determine the sparsity level. To evaluate our approach, we compare it with the single-marker test (χ² test), Elastic-net, OMP (Orthogonal Matching Pursuit) and FOCUSS (FOcal Underdetermined System Solver). To validate the results, we use Encyclopedia of DNA Elements (ENCODE) data to simulate different pedigree structures, and we test our methods on the Genetic Analysis Workshop 17 and 18 data. The results on both simulated and real data show that our proposed sparse regression models can discover more true causal variants while maintaining a lower false discovery rate. In addition, the models tend to detect common and rare variants evenly; the detection of true causal rare variants is not overwhelmed by unrelated common variants.
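The abstract does not state the objective function. One form consistent with the description above, written here purely as an assumption for illustration, is a generalized least squares loss weighted by the modified kinship matrix plus a weighted Lp penalty. The symbols y (phenotype vector), X (genotype matrix), K (modified kinship matrix), β (variant effects), w_j (prior weights) and λ (regularization strength) are all introduced here and are not taken from the original text.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  (y - X\beta)^{\top} K^{-1} (y - X\beta)
  \;+\; \lambda \sum_{j=1}^{m} w_j \, |\beta_j|^{p},
  \qquad 0 < p < 1 .
```

Under this reading, a smaller weight w_j would penalize variant j less, which is one way prior knowledge could favor its selection; setting p = 1/2 gives the L1/2 case.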
In conclusion, our proposed approach has the following advantages: (i) the model can adjust for pedigree structure; (ii) the Lp (0 < p < 1) norm regularization model can yield a higher true positive rate and a lower false discovery rate than the other methods; (iii) the weighted regularization term provides a flexible way to incorporate prior knowledge; (iv) the model can be easily extended to accommodate environmental covariates.
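As a rough illustration of the L1/2 thresholding step, the following Python sketch applies the iterative half-thresholding scheme from the L1/2 regularization literature to a regression whose rows have been decorrelated with a kinship matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `half_threshold` and `fit_half_sparse`, the Cholesky whitening by K, and the way the weights scale the threshold are all hypothetical choices made for this example.

```python
import numpy as np


def half_threshold(z, lam):
    """Componentwise half-thresholding operator for an L1/2 penalty.

    Entries with |z| at or below the cutoff are set to zero; the rest
    are shrunk toward zero by a closed-form factor.
    """
    z = np.asarray(z, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), z.shape)
    out = np.zeros_like(z)
    cutoff = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    keep = np.abs(z) > cutoff
    phi = np.arccos((lam[keep] / 8.0) * (np.abs(z[keep]) / 3.0) ** -1.5)
    out[keep] = (2.0 / 3.0) * z[keep] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0)
    )
    return out


def fit_half_sparse(X, y, K, lam, weights=None, n_iter=500):
    """Iterative half thresholding on kinship-whitened data.

    ASSUMPTION: K is a symmetric positive definite kinship-like matrix
    and the phenotype residuals have covariance proportional to K.
    ASSUMPTION: prior weights rescale the per-variant threshold.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = np.linalg.cholesky(np.asarray(K, dtype=float))
    Xw = np.linalg.solve(L, X)  # whitened genotypes
    yw = np.linalg.solve(L, y)  # whitened phenotype
    m = X.shape[1]
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    # Step size slightly below 1 / ||Xw||_2^2 (squared spectral norm).
    mu = 0.99 / np.linalg.norm(Xw, 2) ** 2
    beta = np.zeros(m)
    for _ in range(n_iter):
        z = beta + mu * Xw.T @ (yw - Xw @ beta)  # gradient step
        beta = half_threshold(z, lam * mu * w)   # sparsifying step
    return beta
```

Running `fit_half_sparse` over a decreasing grid of `lam` values would give a solution path, from which a sparsity level could then be chosen by a criterion such as AIC or stability selection.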
You may contact the first author (during and after the meeting) at