A proteomics sample metadata representation for multiomics integration and big data analysis.

Publication Title

Nat Commun

Document Type

Article

Publication Date

10-6-2021

Keywords

washington; seattle; isb; Genomics; Big Data; Data Analysis; Databases, Protein; Humans; Metadata; Proteomics; Reproducibility of Results; Software; Transcriptome

Abstract

The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility of public proteomics datasets and facilitate their reanalysis and integration.
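
As an illustration of the kind of check the abstract alludes to, the following is a minimal sketch of validating a tab-delimited sample table that links each sample to a data file. It is not the validator described in the article: the function name, the example rows, and the set of required columns are assumptions made for this example only, and the actual specification may require different or additional columns.

```python
import csv
import io

# Column names assumed for illustration only; the real specification may differ.
REQUIRED_COLUMNS = (
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
)


def validate_sample_table(text):
    """Return a list of problems found in a tab-delimited sample table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["empty table"]

    # Compare header names case-insensitively and ignore surrounding spaces.
    header = [name.strip().lower() for name in rows[0]]
    problems = [
        f"missing column: {column}"
        for column in REQUIRED_COLUMNS
        if column not in header
    ]
    if problems:
        return problems

    # Data rows start on line 2 of the file; every required cell must be filled.
    for line_no, cells in enumerate(rows[1:], start=2):
        record = dict(zip(header, cells))
        for column in REQUIRED_COLUMNS:
            if not record.get(column, "").strip():
                problems.append(f"line {line_no}: empty value for {column}")
    return problems


# Hypothetical two-sample table; the second row lacks its data file.
EXAMPLE = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tsample1.raw\n"
    "sample 2\tHomo sapiens\trun 2\t\n"
)

if __name__ == "__main__":
    for problem in validate_sample_table(EXAMPLE):
        print(problem)
```

Returning a list of messages rather than raising on the first error lets a curator see every incomplete row in one pass, which suits manual curation of many datasets.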

Specialty/Research Institute

Institute for Systems Biology
