RT Journal Article
T1 A New Pipeline for the Normalization and Pooling of Metabolomics Data.
A1 Viallon, Vivian
A1 His, Mathilde
A1 Rinaldi, Sabina
A1 Breeur, Marie
A1 Gicquiau, Audrey
A1 Hemon, Bertrand
A1 Overvad, Kim
A1 Tjønneland, Anne
A1 Rostgaard-Hansen, Agnetha Linn
A1 Rothwell, Joseph A
A1 Lecuyer, Lucie
A1 Severi, Gianluca
A1 Kaaks, Rudolf
A1 Johnson, Theron
A1 Schulze, Matthias B
A1 Palli, Domenico
A1 Agnoli, Claudia
A1 Panico, Salvatore
A1 Tumino, Rosario
A1 Ricceri, Fulvio
A1 Verschuren, W M Monique
A1 Engelfriet, Peter
A1 Onland-Moret, Charlotte
A1 Vermeulen, Roel
A1 Nøst, Therese Haugdahl
A1 Urbarova, Ilona
A1 Zamora-Ros, Raul
A1 Rodriguez-Barranco, Miguel
A1 Amiano, Pilar
A1 Huerta, José Maria
A1 Ardanaz, Eva
A1 Melander, Olle
A1 Ottoson, Filip
A1 Vidman, Linda
A1 Rentoft, Matilda
A1 Schmidt, Julie A
A1 Travis, Ruth C
A1 Weiderpass, Elisabete
A1 Johansson, Mattias
A1 Dossus, Laure
A1 Jenab, Mazda
A1 Gunter, Marc J
A1 Lorenzo Bermejo, Justo
A1 Scherer, Dominique
A1 Salek, Reza M
A1 Keski-Rahkonen, Pekka
A1 Ferrari, Pietro
K1 cancer epidemiology
K1 metabolites
K1 metabolomics
K1 normalization
K1 pooling
K1 technical variability
AB Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use variable sample types (e.g., serum versus plasma) collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments.
To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; and (iii) application of linear mixed models to remove unwanted variability, including that due to the samples' study of origin and batch, while preserving biological variation and accounting for potential differences in residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of the metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers and proteomics data, and could be used to pool molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility makes our work of potential interest to molecular epidemiologists.
SN 2218-1989
YR 2021
FD 2021-09-17
LK https://hdl.handle.net/10668/24707
UL https://hdl.handle.net/10668/24707
LA en
DS RISalud
RD Apr 7, 2025
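Step (ii) of the pipeline described in the abstract, PC-PR2 analysis, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (pc_pr2, one_hot), the column standardization, the 80% cumulative-variance cutoff for retaining components, and the OLS-based partial R² definition are all assumptions made for illustration. In this sketch, each retained principal component is regressed on all covariates. The partial R² of a covariate is the reduction in residual sum of squares obtained when that covariate is added to a model that already contains the others. These values are then averaged across components, using the eigenvalues as weights.

```python
import numpy as np


def one_hot(labels):
    """Dummy-code a categorical vector (e.g. study or batch), dropping the first level."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)


def _sse(y, design):
    """Residual sum of squares from an OLS fit of y on an intercept plus design."""
    x = np.column_stack([np.ones(len(y)), design])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    return float(resid @ resid)


def pc_pr2(data, covariates, var_threshold=0.8):
    """Eigenvalue-weighted partial R^2 of each covariate over the leading PCs.

    data: (n_samples, n_features) array, e.g. log-transformed concentrations,
        assumed to have no missing values and no constant columns.
    covariates: dict mapping a name to an (n_samples,) numeric vector or an
        (n_samples, k) dummy-coded block (see one_hot).
    var_threshold: keep the fewest PCs whose cumulative share of variance
        reaches this value (0.8 is an illustrative assumption).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

    # PCA through the SVD of the standardized matrix.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigvals = s**2 / (n - 1)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cumulative, var_threshold)) + 1, len(s))
    scores = u[:, :k] * s[:k]
    weights = eigvals[:k]

    blocks = {name: np.asarray(c, dtype=float).reshape(n, -1)
              for name, c in covariates.items()}
    full = np.column_stack(list(blocks.values()))

    result = {}
    for name in blocks:
        others = [b for other, b in blocks.items() if other != name]
        reduced = np.column_stack(others) if others else np.empty((n, 0))
        partial = []
        for j in range(k):
            sse_full = _sse(scores[:, j], full)
            sse_reduced = _sse(scores[:, j], reduced)
            # Share of the variance left unexplained by the other covariates
            # that this covariate explains for component j.
            partial.append((sse_reduced - sse_full) / sse_reduced)
        result[name] = float(np.dot(weights, partial) / weights.sum())
    return result


if __name__ == "__main__":
    # Simulated example: batch 1 is shifted upward on every feature.
    rng = np.random.default_rng(0)
    batch = np.repeat([0, 1], 30)
    data = rng.normal(size=(60, 10)) + 3.0 * batch[:, None]
    age = rng.normal(size=60)
    print(pc_pr2(data, {"batch": one_hot(batch), "age": age}))
```

In the simulated example, the batch covariate receives a much larger weighted partial R² than the unrelated numeric covariate. Categorical factors such as study or batch are dummy-coded as a single block, so all of their levels are removed together when the reduced model is fitted.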