Uniform genomic data analysis in the NCI Genomic Data Commons.

Zhang, Zhenyu; Hernandez, Kyle; Savage, Jeremiah; Li, Shenglai; Miller, Dan; Agrawal, Stuti; Ortuno, Francisco; Staudt, Louis M; Heath, Allison; Grossman, Robert L

Files

PMC7900240.pdf (2.13 MB)

Identifiers

URI: http://hdl.handle.net/10668/17217

DOI: 10.1038/s41467-021-21254-9

Date

2021-02-22

Abstract

The goal of the National Cancer Institute's (NCI's) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).

MeSH Terms

Base Sequence
DNA Copy Number Variations
DNA Methylation
Data Analysis
Databases, Genetic
Gene Expression Regulation
Genome, Human
Genomics
Humans
MicroRNAs
Molecular Sequence Annotation
Mutation
National Cancer Institute (U.S.)
RNA-Seq
Reproducibility of Results
United States
Viruses

Collections

Fundación Pública Andaluza Progreso y Salud
