A New Pipeline for the Normalization and Pooling of Metabolomics Data.

dc.contributor.author: Viallon, Vivian
dc.contributor.author: His, Mathilde
dc.contributor.author: Rinaldi, Sabina
dc.contributor.author: Breeur, Marie
dc.contributor.author: Gicquiau, Audrey
dc.contributor.author: Hemon, Bertrand
dc.contributor.author: Overvad, Kim
dc.contributor.author: Tjønneland, Anne
dc.contributor.author: Rostgaard-Hansen, Agnetha Linn
dc.contributor.author: Rothwell, Joseph A
dc.contributor.author: Lecuyer, Lucie
dc.contributor.author: Severi, Gianluca
dc.contributor.author: Kaaks, Rudolf
dc.contributor.author: Johnson, Theron
dc.contributor.author: Schulze, Matthias B
dc.contributor.author: Palli, Domenico
dc.contributor.author: Agnoli, Claudia
dc.contributor.author: Panico, Salvatore
dc.contributor.author: Tumino, Rosario
dc.contributor.author: Ricceri, Fulvio
dc.contributor.author: Verschuren, W M Monique
dc.contributor.author: Engelfriet, Peter
dc.contributor.author: Onland-Moret, Charlotte
dc.contributor.author: Vermeulen, Roel
dc.contributor.author: Nøst, Therese Haugdahl
dc.contributor.author: Urbarova, Ilona
dc.contributor.author: Zamora-Ros, Raul
dc.contributor.author: Rodriguez-Barranco, Miguel
dc.contributor.author: Amiano, Pilar
dc.contributor.author: Huerta, José Maria
dc.contributor.author: Ardanaz, Eva
dc.contributor.author: Melander, Olle
dc.contributor.author: Ottoson, Filip
dc.contributor.author: Vidman, Linda
dc.contributor.author: Rentoft, Matilda
dc.contributor.author: Schmidt, Julie A
dc.contributor.author: Travis, Ruth C
dc.contributor.author: Weiderpass, Elisabete
dc.contributor.author: Johansson, Mattias
dc.contributor.author: Dossus, Laure
dc.contributor.author: Jenab, Mazda
dc.contributor.author: Gunter, Marc J
dc.contributor.author: Lorenzo Bermejo, Justo
dc.contributor.author: Scherer, Dominique
dc.contributor.author: Salek, Reza M
dc.contributor.author: Keski-Rahkonen, Pekka
dc.contributor.author: Ferrari, Pietro
dc.date.accessioned: 2025-01-07T12:33:12Z
dc.date.available: 2025-01-07T12:33:12Z
dc.date.issued: 2021-09-17
dc.description.abstract: Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use variable sample types (e.g., serum versus plasma) collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusions of the least informative observations and metabolites and removal of outliers; imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples' originating study and batch, and preserve biological variations while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.
dc.identifier.doi: 10.3390/metabo11090631
dc.identifier.issn: 2218-1989
dc.identifier.pmc: PMC8467830
dc.identifier.pmid: 34564446
dc.identifier.pubmedURL: https://pmc.ncbi.nlm.nih.gov/articles/PMC8467830/pdf
dc.identifier.unpaywallURL: https://www.mdpi.com/2218-1989/11/9/631/pdf?version=1631874206
dc.identifier.uri: https://hdl.handle.net/10668/24707
dc.issue.number: 9
dc.journal.title: Metabolites
dc.journal.titleabbreviation: Metabolites
dc.language.iso: en
dc.organization: Escuela Andaluza de Salud Pública
dc.organization: Instituto de Investigación Biosanitaria de Granada (ibs.GRANADA)
dc.pubmedtype: Journal Article
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: cancer epidemiology
dc.subject: metabolites
dc.subject: metabolomics
dc.subject: normalization
dc.subject: pooling
dc.subject: technical variability
dc.title: A New Pipeline for the Normalization and Pooling of Metabolomics Data.
dc.type: research article
dc.type.hasVersion: VoR
dc.volume.number: 11
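
Illustrative sketch (not part of the deposited record): the abstract describes a sequence of exclusion, imputation, and removal of study and batch effects. The Python below is a minimal, simplified illustration of that sequence and is not the authors' implementation. The missingness threshold, the half-minimum imputation rule, and all function and field names are assumptions made for the example. Step (ii), PC-PR2, is omitted, and step (iii) is replaced by plain per-batch mean centering, which stands in for the linear mixed models named in the abstract and does not model study-specific residual variances.

```python
from collections import defaultdict
from statistics import mean


def drop_sparse_metabolites(rows, max_missing=0.4):
    """Return metabolite names whose missing fraction is <= max_missing.

    `rows` is a list of dicts with a "values" mapping (metabolite -> float or None).
    The 0.4 threshold is a hypothetical choice, not taken from the paper.
    """
    kept = []
    for name in rows[0]["values"]:
        n_missing = sum(1 for r in rows if r["values"][name] is None)
        if n_missing / len(rows) <= max_missing:
            kept.append(name)
    return kept


def impute_half_minimum(rows, metabolite):
    """Fill missing values with half the smallest observed value (assumed rule)."""
    observed = [r["values"][metabolite] for r in rows
                if r["values"][metabolite] is not None]
    fill = min(observed) / 2
    return [fill if r["values"][metabolite] is None else r["values"][metabolite]
            for r in rows]


def remove_batch_means(values, batches):
    """Center each batch on the overall mean.

    This is a fixed-effect simplification used only for illustration; the
    abstract describes linear mixed models with study and batch effects.
    """
    overall = mean(values)
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# Toy data: two batches, two metabolites (values are invented for the example).
rows = [
    {"batch": "A", "values": {"m1": 2.0, "m2": None}},
    {"batch": "A", "values": {"m1": 4.0, "m2": None}},
    {"batch": "B", "values": {"m1": None, "m2": 1.0}},
    {"batch": "B", "values": {"m1": 8.0, "m2": None}},
]
kept = drop_sparse_metabolites(rows)          # m2 is dropped (3 of 4 missing)
imputed = impute_half_minimum(rows, "m1")     # the missing m1 value becomes 1.0
adjusted = remove_batch_means(imputed, [r["batch"] for r in rows])
```

After the adjustment both batches share the same mean for `m1`, while within-batch differences between samples are unchanged. In practice, concentrations are often log-transformed before such steps; that choice is left out here.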

Files

Original bundle

Name: PMC8467830.pdf
Size: 3.06 MB
Format: Adobe Portable Document Format