Concept: Top-down proteomics
There is interest in extending bottom-up proteomics to the smallest possible sample size. We investigated the performance of two modern mass spectrometers for the analysis of samples ranging from 1 ng to 1 µg of RAW 264.7 cell lysate digests.
High-throughput identification of proteins with the latest generation of hybrid high-resolution mass spectrometers is opening new perspectives in microbiology. I present here an overview of tandem mass spectrometry technology and bioinformatics for shotgun proteomics that make 2D-PAGE approaches obsolete. Non-labelling quantitative approaches have become more popular than labelling techniques on most proteomic platforms because they are easier to carry out while their quantitative outcome is rather robust. Parameters for recording mass spectrometry data, however, need to be chosen carefully and statistics to assess the confidence of the results should not be neglected. Interestingly, next-generation sequencing methodologies make any microbial model quickly amenable to proteomics, leading to the documentation of a wide range of organisms from diverse environments. Some recent discoveries made using microbial proteomics have challenged some biological dogma, such as: (i) initiation of translation does not occur predominantly from ATG codons in some microorganisms, (ii) non-canonical initiation codons are used to regulate the production of specific but important proteins and (iii) a gene may code for multiple polypeptide species, heterogeneous in terms of sequences. Microbial diversity and microbial physiology can now be revisited by means of exhaustive comparative proteomic surveys where thousands of proteins are detected and quantified. Proteogenomics, the improved annotation of genomes with the help of proteomic evidence, is paving the way for integrated multi-omic approaches in microbiology. Finally, meta-proteomic tools and approaches are emerging for tackling the high complexity of the microbial world as a whole, opening new perspectives for assessing how microbial communities function.
Protein digestion is an integral part of the ‘shotgun’ proteomics approach and commonly requires overnight incubation prior to mass spectrometry analysis. Quadruplicate ‘shotgun’ proteomic analysis of whole yeast lysate demonstrated that Guanidine-Hydrochloride (Gnd-HCl) protein digestion can be optimally completed within 30 minutes with endoprotease Lys-C. No chemical artifacts were introduced when samples were incubated in Gnd-HCl at 95°C, making Gnd-HCl an appropriate digestion buffer for shotgun proteomics. Current methodologies for investigating protein-protein interactions (PPIs) often require several preparation steps, which prolongs any parallel operation and high-throughput interaction analysis. Gnd-HCl allows the efficient elution and subsequent fast digestion of PPIs to provide a convenient high-throughput methodology for affinity-purification mass spectrometry (AP-MS) experiments. To validate the Gnd-HCl approach, label-free PPI analysis of several GFP-tagged yeast deubiquitinating enzymes was performed. The identification of known interaction partners demonstrates the utility of the optimized Gnd-HCl protocol, which is also scalable to the 96-well plate format.
- Journal of bioinformatics and computational biology
- Published almost 6 years ago
This paper is a self-contained introductory tutorial on the problem in proteomics known as peptide sequencing using tandem mass spectrometry. This tutorial deals specifically with de novo sequencing methods (as opposed to database search methods). We first give an introduction to peptide sequencing, its importance and history and some background on proteins. Next we show the relationship between a peptide and the final spectrum produced from a tandem mass spectrometer, together with a description of the various sources of complications that arise during the process of generating the mass spectrum. From there we model the computational problem of de novo peptide sequencing, which is basically the reverse problem of identifying the peptide which produced the spectrum. We then present several major approaches to solve it (including reviewing some of the current algorithms in each approach), and also discuss related problems and post-processing approaches.
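The relationship between a peptide and its tandem mass spectrum can be sketched as follows. This is a minimal illustration, not the tutorial's own code: it assumes an unmodified peptide, singly charged fragments, and only the b and y ion series, and the function name `fragment_ions` is hypothetical. De novo sequencing is the reverse problem: recovering the residue sequence from the mass differences between such peaks.

```python
# Monoisotopic residue masses in daltons (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of a proton (charge carrier)


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for b1..b(n-1) and y1..y(n-1).

    Hypothetical helper for illustration; ignores modifications, neutral
    losses, and higher charge states.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions: N-terminal prefixes plus one proton.
    b_ions = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)

    # y ions: C-terminal suffixes plus water plus one proton.
    y_ions = []
    suffix = 0.0
    for m in reversed(masses[1:]):
        suffix += m
        y_ions.append(suffix + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

Consecutive peaks in one series differ by exactly one residue mass, which is the property that graph-based de novo methods exploit. Leucine and isoleucine share the same mass, so they cannot be distinguished from these series alone.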
Characteristic mass differences between fragment ions from backbone cleavage of RNA by electron detachment (d, w) and fragment ions from collisionally activated dissociation (c, y) provide extensive sequence information. Structure analysis by this approach should be especially useful for the detailed characterization of synthetic or post-transcriptionally modified RNA.
Proteomics research routinely involves identifying peptides and proteins via tandem mass spectrometry sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years.
This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprises 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data has been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations, which allows for pathway-oriented investigation. We present the data as a freely available community resource. A variety of data reuse options are described for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.
In mass spectrometry-based bottom-up proteomics, data-independent acquisition (DIA) is an emerging technique due to its comprehensive and unbiased sampling of precursor ions. However, current DIA methods use wide precursor isolation windows, resulting in co-fragmentation and complex mixture spectra. Thus, conventional database searching tools that identify peptides by interpreting individual MS/MS spectra are inherently limited in analyzing DIA data. Here we discuss an alternative approach, peptide-centric analysis, which tests directly for the presence and absence of query peptides. We discuss how peptide-centric analysis resolves some limitations of traditional spectrum-centric analysis, and we outline the unique characteristics of peptide-centric analysis in general.
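The idea of testing directly for a query peptide, rather than interpreting each spectrum on its own, can be illustrated with a toy scoring function. This is a sketch under stated assumptions and not the scoring used by any particular tool: the name `matched_fraction`, the 20 ppm default tolerance, and the use of a simple matched-fragment fraction as evidence are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def matched_fraction(query_fragments, spectrum_mz, tol_ppm=20.0):
    """Fraction of a query peptide's fragment m/z values found in a spectrum.

    query_fragments: theoretical fragment m/z values for the peptide being tested.
    spectrum_mz: observed peak m/z values from a (possibly mixed) MS/MS spectrum.
    tol_ppm: mass tolerance in parts per million (assumed default).
    """
    if not query_fragments:
        return 0.0
    peaks = sorted(spectrum_mz)
    hits = 0
    for mz in query_fragments:
        tol = mz * tol_ppm / 1e6
        # First observed peak at or above the lower bound of the window.
        i = bisect_left(peaks, mz - tol)
        if i < len(peaks) and peaks[i] <= mz + tol:
            hits += 1
    return hits / len(query_fragments)


if __name__ == "__main__":
    query = [98.06004, 227.10263, 324.15540]
    observed = [98.0601, 150.0, 227.1026, 500.0]
    print(matched_fraction(query, observed))
```

Because the question asked is "is this peptide present?", co-fragmented peaks from other precursors in a wide isolation window are simply ignored unless they fall inside a query window. A real analysis would also use fragment intensities, retention time, and a statistical model of chance matches.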
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
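Spectrum clustering in general groups spectra whose peak patterns are similar, so that unidentified spectra can inherit context from identified ones in the same cluster. The sketch below is a generic illustration and not the PRIDE Cluster algorithm: it assumes 1 Da m/z bins, cosine similarity, a 0.7 threshold, and a greedy single-pass assignment, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse vector as a dict)."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_cluster(spectra, threshold=0.7):
    """Assign each spectrum to the first cluster whose representative is similar enough.

    spectra: list of spectra, each a list of (m/z, intensity) pairs.
    Returns a list of clusters, each a list of spectrum indices.
    The first member of a cluster serves as its representative (an assumption).
    """
    clusters = []  # list of (representative_vector, member_indices)
    for idx, peaks in enumerate(spectra):
        vec = bin_spectrum(peaks)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    spectra = [
        [(100.1, 10.0), (200.2, 5.0)],
        [(100.3, 9.0), (200.4, 6.0)],
        [(300.0, 10.0), (400.0, 1.0)],
    ]
    print(greedy_cluster(spectra))
```

A cluster that mixes identified and unidentified spectra suggests the unidentified ones may belong to the same peptide, whereas a large cluster with no identifications at all points to a consistently unidentified species worth further study.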
Top-down proteomics, the analysis of intact proteins in their endogenous form, preserves valuable information about post-translational modifications, isoforms and proteolytic processing. The quality of top-down liquid chromatography-tandem MS (LC-MS/MS) data sets is rapidly increasing on account of advances in instrumentation and sample-processing protocols. However, top-down mass spectra are substantially more complex than conventional bottom-up data. New algorithms and software tools for confident proteoform identification and quantification are needed. Here we present Informed-Proteomics, an open-source software suite for top-down proteomics analysis that consists of an LC-MS feature-finding algorithm, a database search algorithm, and an interactive results viewer. We compare our tool with several other popular tools using human-in-mouse xenograft luminal and basal breast tumor samples that are known to have significant differences in protein abundance based on bottom-up analysis.