Non-targeted metabolite profiling in large human population-based studies: a new data analysis workflow and metabolome-wide association study of C-reactive protein.
A. Ganna1, T. Fall2, W. Lee3, C. D. Broeckling4, J. Kumar2, S. Hägg1,2, P. K. E. Magnusson1, J. E. Prenni4, L. Lind5, Y. Pawitan1, E. Ingelsson2
1) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; 2) Department of Medical Sciences, Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden; 3) Department of Statistics, Inha University, Incheon, Korea; 4) Proteomics and Metabolomics Facility, Colorado State University, Fort Collins, Colorado, U.S.A.; 5) Department of Medical Sciences, Uppsala University, Uppsala, Sweden.

   Background: Recent population-based studies have illustrated the potential of metabolomics in medical and pharmacological research by investigating genotype-metabolite and metabolite-phenotype associations. However, most of these studies have had small sample sizes and/or used a targeted approach, that is, a biased analysis of a predetermined panel of metabolites.

Objectives: Our aims were (1) to illustrate a new data analysis workflow for detecting and annotating metabolites in large human population-based studies, and (2) to apply this workflow to serum samples from 2,380 fasting individuals in a non-targeted metabolome-wide association study of high-sensitivity C-reactive protein (hsCRP) levels.

Methods: Samples were analyzed using ultra-performance liquid chromatography coupled with mass spectrometry (UPLC-MS). The workflow comprises four modules:
(1) Peaks from each chromatogram are detected, aligned and grouped across samples; each peak group is called a feature.
(2) Feature intensities are log-transformed and normalized, outliers are excluded, and factors of unwanted variation are identified and removed.
(3) Features associated with the outcome are identified through univariate statistical analysis, and the false discovery rate (FDR) is controlled to select features for replication in an independent validation study.
(4) MS and MS/MS spectra are generated using an indiscriminate data acquisition workflow coupled with correlational grouping, and are used to annotate significant features by spectral matching against both private and public spectral libraries. Confidence levels are assigned to define the quality of each metabolite annotation.

Results: Using this workflow, we identified 8,000 molecular features in serum samples from two population-based studies including 2,380 participants. In a non-targeted metabolome-wide association analysis of hsCRP, we identified 439 features, corresponding to 101 unique metabolites, whose associations replicated in an external population. Ten metabolites were annotated with high confidence. The results revealed unexpected biological associations; for example, metabolites annotated as monoacylphosphorylcholines (LysoPC) were negatively associated with hsCRP.

Conclusions: The workflow and results presented illustrate the viability and potential of non-targeted metabolite profiling in large population-based studies.
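Modules 2 and 3 of the workflow can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline. Under stated assumptions, it log2-transforms and z-scores each feature, masks values with |z| above a hypothetical threshold as outliers, tests each feature against log hsCRP with a Pearson correlation (p-value from the Fisher z normal approximation, used here as a stand-in for the unspecified univariate model), and applies the Benjamini-Hochberg procedure as one way to control the FDR. Removal of unwanted variation and covariate adjustment are omitted. The function names, the outlier threshold, and the simulated data are all hypothetical.

```python
import math
import random
from statistics import mean, stdev


def log_standardize(intensities, outlier_sd=4.0):
    """Log2-transform one feature, z-score it, and mask outliers as None.

    Non-positive intensities (e.g. undetected peaks) are treated as missing.
    The outlier rule |z| > outlier_sd is an illustrative assumption.
    """
    logged = [math.log2(x) if x > 0 else None for x in intensities]
    observed = [x for x in logged if x is not None]
    mu, sd = mean(observed), stdev(observed)  # needs >= 2 non-constant values
    z = [None if x is None else (x - mu) / sd for x in logged]
    return [None if v is None or abs(v) > outlier_sd else v for v in z]


def correlation_pvalue(x, y):
    """Pearson r over complete pairs, with a two-sided p-value from Fisher's z.

    Uses the normal approximation z = atanh(r) * sqrt(n - 3).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    r = sxy / math.sqrt(sxx * syy)
    r_clamped = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite when |r| == 1
    z = math.atanh(r_clamped) * math.sqrt(n - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank k with p_(k) <= k * q / m; all smaller p-values are rejected too.
        if pvalues[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Simulated data only: features 0-2 depend on log hsCRP, the rest are noise.
    rng = random.Random(1)
    n_samples, n_features = 300, 30
    log_crp = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    features = []
    for f in range(n_features):
        beta = 0.5 if f < 3 else 0.0
        features.append([2 ** (10 + beta * c + rng.gauss(0.0, 1.0)) for c in log_crp])
    pvals = [correlation_pvalue(log_standardize(feat), log_crp)[1] for feat in features]
    print("features passing 5% FDR:", benjamini_hochberg(pvals, q=0.05))
```

The Fisher z approximation is reasonable at the sample sizes described here (thousands of participants); for small samples an exact t-test on r would be the safer choice.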
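Module 4 annotates features by matching their spectra against spectral libraries. The abstract does not specify the scoring function; a common choice in spectral library search is a cosine similarity between peak lists. The Python sketch below assumes centroided spectra given as (m/z, intensity) pairs, a fixed absolute m/z tolerance of 0.01 (an assumption), and greedy one-to-one peak pairing. `cosine_score`, `best_match`, and the toy library are hypothetical illustrations, not the authors' implementation or any particular library's API, and the correlational grouping and confidence-level assignment are not modelled.

```python
import math


def cosine_score(query, reference, tol=0.01):
    """Cosine similarity between two centroided spectra of (mz, intensity) pairs.

    Peaks are paired greedily, largest intensity product first, when their
    m/z values differ by at most `tol`; each peak is used at most once.
    """
    candidates = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tol:
                candidates.append((int_q * int_r, i, j))
    candidates.sort(reverse=True)
    used_q, used_r = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            dot += product
    norm_q = math.sqrt(sum(inten ** 2 for _, inten in query))
    norm_r = math.sqrt(sum(inten ** 2 for _, inten in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def best_match(query, library, tol=0.01):
    """Return (name, score) of the best-scoring spectrum in a {name: spectrum} library."""
    scores = ((name, cosine_score(query, spectrum, tol)) for name, spectrum in library.items())
    return max(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy spectra with illustrative m/z values; not real library entries.
    library = {
        "compound_A": [(100.000, 100.0), (150.000, 60.0)],
        "compound_B": [(90.000, 100.0), (120.000, 30.0)],
    }
    query = [(100.004, 95.0), (149.997, 55.0), (250.000, 5.0)]
    print(best_match(query, library))
```

Greedy pairing keeps the sketch short; a one-to-one assignment that maximizes the total product would be the stricter alternative when many peaks fall within the tolerance window.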
