Detection and editing of the updated Arabidopsis plastid- and mitochondrial-encoded proteomes through PeptideAtlas.

Publication Title

Plant Physiology


washington; isb; genomics


Arabidopsis (Arabidopsis thaliana) ecotype Col-0 has plastid and mitochondrial genomes that together encode over 100 proteins. Public databases (e.g., Araport11) contain redundancies and discrepancies in the gene identifiers for these organelle-encoded proteins. RNA editing changes specific amino acid residues or creates start and stop codons in many of these proteins, but its impact at the protein level is largely unexplored because edited proteoforms are difficult to detect. Here, we assembled the non-redundant set of identifiers, their correct protein sequences, and 452 predicted non-synonymous editing sites, of which 56 are edited at lower frequency. We then determined the accumulation of edited and/or unedited proteoforms by searching ∼259 million raw tandem mass spectrometry spectra from ProteomeXchange, which is part of PeptideAtlas. We identified all mitochondrial proteins and all but three plastid-encoded proteins (NdhG/Ndh6, PsbM, Rps16), but none of the proteins predicted from the four open reading frames. We suggest that Rps16 and three of the open reading frames are pseudogenes. Detection frequencies for each edit site and type of edit (e.g., S to L/F) were determined at the protein level, cross-referenced against the metadata (e.g., tissue), and evaluated for technical detection challenges. We detected 167 predicted edit sites at the proteome level. Minor frequency sites were edited at low frequency at the protein level, except for cytochrome C biogenesis 382 at residue 124 (Ccb382-124). Major frequency sites (>50% editing of RNA) accumulated only in edited form (>98-100% edited) at the protein level, with the exception of Rpl5-22. We conclude that RNA editing at major editing sites is required for stable protein accumulation.


Institute for Systems Biology