Publication: BLASSO: integration of biological knowledge into a regularized linear model.
dc.contributor.author | Urda, Daniel | |
dc.contributor.author | Aragon, Francisco | |
dc.contributor.author | Bautista, Rocio | |
dc.contributor.author | Franco, Leonardo | |
dc.contributor.author | Veredas, Francisco J | |
dc.contributor.author | Claros, Manuel Gonzalo | |
dc.contributor.author | Jerez, Jose Manuel | |
dc.contributor.funder | MINECO-SPAIN | |
dc.contributor.funder | FEDER | |
dc.contributor.funder | ICE Andalucía TECH (Spain) | |
dc.date.accessioned | 2023-01-25T10:24:47Z | |
dc.date.available | 2023-01-25T10:24:47Z | |
dc.date.issued | 2018-11-20 | |
dc.description.abstract | In RNA-Seq gene expression analysis, a genetic signature or biomarker is defined as a subset of genes that is probably involved in a given complex human trait and usually provides predictive capabilities for that trait. The discovery of new genetic signatures is challenging, as it entails the analysis of complex information encoded at the gene level. Moreover, biomarker selection becomes unstable, since the thousands of genes included in each sample are usually highly correlated, which leads to very low overlap between the genetic signatures proposed by different authors. In this sense, this paper proposes BLASSO, a simple and highly interpretable linear model with l1-regularization that incorporates prior biological knowledge into the prediction of breast cancer outcomes. Two different approaches to integrating biological knowledge in BLASSO, Gene-specific and Gene-disease, are proposed, and their predictive performance and biomarker stability are tested on a public RNA-Seq gene expression dataset for breast cancer. The relevance of the genetic signature for the model is inspected by a functional analysis. BLASSO has been compared with a baseline LASSO model. Using 10-fold cross-validation with 100 repetitions for model assessment, average AUC values of 0.7 and 0.69 were obtained for the Gene-specific and the Gene-disease approaches, respectively. These values exceed the average AUC of 0.65 obtained with the LASSO. With respect to the stability of the genetic signatures found, BLASSO outperformed the baseline model in terms of the robustness index (RI). The Gene-specific approach gave an RI of 0.15±0.03, compared to an RI of 0.09±0.03 given by LASSO, thus being 66% more robust. 
The functional analysis performed on the genetic signature obtained with the Gene-disease approach showed a significant presence of genes related to cancer, as well as one gene (IFNK) and one pseudogene (PCNAP1) which a priori had not been described as related to cancer. BLASSO has been shown to be a good choice both in terms of predictive efficacy and biomarker stability, when compared to other similar approaches. Further functional analyses of the genetic signatures obtained with BLASSO have not only revealed genes with important roles in cancer, but also genes that should play an unknown or collateral role in the studied disease. | |
dc.description.sponsorship | The authors acknowledge support through grants TIN2014-58516-C2-1-R and TIN2017-88728-C2 from MINECO-SPAIN which include FEDER funds. DU was supported by ICE Andalucía TECH (Spain) through a postdoctoral fellowship. | |
dc.description.version | Yes | |
dc.identifier.citation | Urda D, Aragón F, Bautista R, Franco L, Veredas FJ, Claros MG, et al. BLASSO: integration of biological knowledge into a regularized linear model. BMC Syst Biol. 2018 Nov 20;12(Suppl 5):94 | |
dc.identifier.doi | 10.1186/s12918-018-0612-8 | |
dc.identifier.essn | 1752-0509 | |
dc.identifier.pmc | PMC6245593 | |
dc.identifier.pmid | 30458775 | |
dc.identifier.pubmedURL | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245593/pdf | |
dc.identifier.unpaywallURL | https://doi.org/10.1186/s12918-018-0612-8 | |
dc.identifier.uri | http://hdl.handle.net/10668/13214 | |
dc.issue.number | Suppl 5 | |
dc.journal.title | BMC systems biology | |
dc.journal.titleabbreviation | BMC Syst Biol | |
dc.language.iso | en | |
dc.organization | Instituto de Investigación Biomédica de Málaga-IBIMA | |
dc.page.number | 14 | |
dc.provenance | Content curation performed 22/08/2024 | |
dc.publisher | Springer Nature | |
dc.pubmedtype | Journal Article | |
dc.pubmedtype | Research Support, Non-U.S. Gov't | |
dc.pubmedtype | Validation Study | |
dc.relation.projectID | TIN2017-88728-C2 | |
dc.relation.projectID | TIN2014-58516-C2-1-R | |
dc.relation.publisherversion | https://bmcsystbiol.biomedcentral.com/articles/10.1186/s12918-018-0612-8 | |
dc.rights.accessRights | open access | |
dc.subject | Biological knowledge | |
dc.subject | Biomarkers selection | |
dc.subject | Machine learning | |
dc.subject | Precision medicine | |
dc.subject | RNA-Seq | |
dc.subject.decs | Sequence analysis, RNA | |
dc.subject.decs | Biomarkers, tumor | |
dc.subject.decs | Precision medicine | |
dc.subject.decs | Breast neoplasms | |
dc.subject.decs | Gene expression profiling | |
dc.subject.mesh | Biomarkers, tumor | |
dc.subject.mesh | Breast neoplasms | |
dc.subject.mesh | Female | |
dc.subject.mesh | Gene expression profiling | |
dc.subject.mesh | Humans | |
dc.subject.mesh | Linear models | |
dc.subject.mesh | Machine learning | |
dc.subject.mesh | Precision medicine | |
dc.subject.mesh | Sequence analysis, RNA | |
dc.title | BLASSO: integration of biological knowledge into a regularized linear model. | |
dc.type | research article | |
dc.type.hasVersion | VoR | |
dc.volume.number | 12 | |
dspace.entity.type | Publication |