Publication:
Uniform genomic data analysis in the NCI Genomic Data Commons.

dc.contributor.author: Zhang, Zhenyu
dc.contributor.author: Hernandez, Kyle
dc.contributor.author: Savage, Jeremiah
dc.contributor.author: Li, Shenglai
dc.contributor.author: Miller, Dan
dc.contributor.author: Agrawal, Stuti
dc.contributor.author: Ortuno, Francisco
dc.contributor.author: Staudt, Louis M
dc.contributor.author: Heath, Allison
dc.contributor.author: Grossman, Robert L
dc.date.accessioned: 2023-02-09T10:43:01Z
dc.date.available: 2023-02-09T10:43:01Z
dc.date.issued: 2021-02-22
dc.description.abstract: The goal of the National Cancer Institute's (NCI's) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).
dc.identifier.doi: 10.1038/s41467-021-21254-9
dc.identifier.essn: 2041-1723
dc.identifier.pmc: PMC7900240
dc.identifier.pmid: 33619257
dc.identifier.pubmedURL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7900240/pdf
dc.identifier.unpaywallURL: https://www.nature.com/articles/s41467-021-21254-9.pdf
dc.identifier.uri: http://hdl.handle.net/10668/17217
dc.issue.number: 1
dc.journal.title: Nature communications
dc.journal.titleabbreviation: Nat Commun
dc.language.iso: en
dc.organization: Fundación Pública Andaluza Progreso y Salud-FPS
dc.page.number: 1226
dc.pubmedtype: Journal Article
dc.pubmedtype: Research Support, N.I.H., Extramural
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.mesh: Base Sequence
dc.subject.mesh: DNA Copy Number Variations
dc.subject.mesh: DNA Methylation
dc.subject.mesh: Data Analysis
dc.subject.mesh: Databases, Genetic
dc.subject.mesh: Gene Expression Regulation
dc.subject.mesh: Genome, Human
dc.subject.mesh: Genomics
dc.subject.mesh: Humans
dc.subject.mesh: MicroRNAs
dc.subject.mesh: Molecular Sequence Annotation
dc.subject.mesh: Mutation
dc.subject.mesh: National Cancer Institute (U.S.)
dc.subject.mesh: RNA-Seq
dc.subject.mesh: Reproducibility of Results
dc.subject.mesh: United States
dc.subject.mesh: Viruses
dc.title: Uniform genomic data analysis in the NCI Genomic Data Commons.
dc.type: research article
dc.type.hasVersion: VoR
dc.volume.number: 12
dspace.entity.type: Publication

Files

Original bundle

Name: PMC7900240.pdf
Size: 2.13 MB
Format: Adobe Portable Document Format