The Warped Linear Mixed Model: finding optimal phenotype transformations yields a substantial increase in signal in genetic analyses.
N. Fusi (1), C. Lippert (1), N. Lawrence (2), O. Stegle (3)
1) eScience group, Microsoft Research, Los Angeles, USA; 2) Department of Computer Science, University of Sheffield, Sheffield, UK; 3) European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK.
Linear mixed models are a core statistical approach used in several key areas of genetics. In particular, they provide state-of-the-art solutions for genome-wide association studies (GWAS), heritability estimation and phenotype prediction. However, one of the fundamental assumptions of these models, that the noise is Gaussian distributed, rarely holds in practice. We show that, as a result, standard approaches yield sub-optimal performance, resulting in significant losses in power for GWAS, increased bias in heritability estimation, and reduced accuracy for phenotype predictions. One way to mitigate this problem is to apply an appropriate transformation (e.g., a log transform) to the phenotypic data as a preprocessing step. However, choosing the right transformation is challenging: a set of candidate transformations must be defined manually, and one must be chosen over the others without a clear objective function to guide the decision. Thus, the problem has been only partially and unsatisfactorily solved. Here, we comprehensively address this important problem in genetics by introducing a robust and statistically principled method, the Warped Linear Mixed Model. Our approach automatically learns a suitable phenotype transformation from the observed data (both phenotypic and genotypic). This data-driven approach allows an infinite set of transformations to be searched automatically, using the principles of statistical inference to determine which transformation is most suitable. In extensive synthetic and real experiments, we find up to twofold increases in GWAS power, reductions in heritability estimation bias of up to 30%, and significantly increased accuracy in phenotype prediction. Importantly, our warped linear mixed model is general and can be used in place of standard linear mixed models in a wide range of applications in genetics.
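To make the idea of learning a transformation by statistical inference concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the transformation family, so the sketch substitutes the one-parameter Box-Cox family as a stand-in. It scores each candidate transformation by the likelihood of the raw phenotype under a linear mixed model, including the log-Jacobian of the transformation so that different transformations are comparable. The simulated data, the kinship construction, the grid search and all function names (`boxcox`, `warped_lmm_loglik`) are assumptions made for illustration.

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transform of a positive vector; lam = 0 is the log transform."""
    return np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam


def warped_lmm_loglik(y, K, lam, h2):
    """Log-likelihood of the RAW phenotype y when z = boxcox(y, lam) follows
    z ~ N(mu * 1, s2 * (h2 * K + (1 - h2) * I)), with mu and s2 profiled out."""
    n = y.shape[0]
    z = boxcox(y, lam)
    V = h2 * K + (1.0 - h2) * np.eye(n)
    L = np.linalg.cholesky(V)
    # Whiten the transformed phenotype and the intercept column.
    z_w = np.linalg.solve(L, z)
    one_w = np.linalg.solve(L, np.ones(n))
    mu = (one_w @ z_w) / (one_w @ one_w)      # generalized least-squares mean
    resid = z_w - mu * one_w
    s2 = (resid @ resid) / n                  # maximum-likelihood variance scale
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    gauss = -0.5 * (n * np.log(2.0 * np.pi * s2) + logdet + n)
    # Change of variables: log |dz/dy| = (lam - 1) * sum(log y).
    log_jacobian = (lam - 1.0) * np.sum(np.log(y))
    return gauss + log_jacobian


# Simulated data (assumption): a Gaussian latent trait observed on the exp scale.
rng = np.random.default_rng(0)
n, m = 200, 500
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)      # standardize each SNP
K = G @ G.T / m                               # realized relationship matrix
beta = rng.normal(0.0, np.sqrt(0.5 / m), size=m)
z_true = G @ beta + rng.normal(0.0, np.sqrt(0.5), size=n)
y = np.exp(z_true)                            # observed, non-Gaussian phenotype

# Joint grid search over the transformation parameter and heritability.
lams = np.linspace(-1.0, 1.0, 9)
h2s = np.linspace(0.0, 0.95, 20)
best_ll, best_lam, best_h2 = max(
    (warped_lmm_loglik(y, K, lam, h2), lam, h2) for lam in lams for h2 in h2s
)
print("selected lambda:", best_lam, "selected h2:", best_h2)
```

Because the phenotype here is simulated as the exponential of a Gaussian trait, the search is expected to favour a transformation close to the log (lambda near 0). A richer, more flexible family of monotonic functions and a gradient-based optimizer would replace the single Box-Cox parameter and the grid in a realistic setting.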