Concept: Top-down proteomics
Proteomics research routinely involves identifying peptides and proteins via tandem mass spectrometry sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years.
There is interest in extending bottom-up proteomics to the smallest possible sample size. We investigated the performance of two modern mass spectrometers for the analysis of samples ranging from 1 ng to 1 µg of RAW 264.7 cell lysate digests.
High-throughput identification of proteins with the latest generation of hybrid high-resolution mass spectrometers is opening new perspectives in microbiology. Here I present an overview of tandem mass spectrometry technology and bioinformatics for shotgun proteomics that make 2D-PAGE approaches obsolete. Label-free quantitative approaches have become more popular than labelling techniques on most proteomic platforms because they are easier to carry out and their quantitative outcome is rather robust. Parameters for recording mass spectrometry data, however, need to be chosen carefully, and statistics to assess the confidence of the results should not be neglected. Interestingly, next-generation sequencing methodologies make any microbial model quickly amenable to proteomics, leading to the documentation of a wide range of organisms from diverse environments. Some recent discoveries made using microbial proteomics have challenged biological dogmas, such as: (i) initiation of translation does not occur predominantly from ATG codons in some microorganisms, (ii) non-canonical initiation codons are used to regulate the production of specific but important proteins and (iii) a gene may code for multiple polypeptide species that are heterogeneous in sequence. Microbial diversity and microbial physiology can now be revisited by means of exhaustive comparative proteomic surveys where thousands of proteins are detected and quantified. Proteogenomics, which improves genome annotation with the help of proteomic evidence, is paving the way for integrated multi-omic approaches in microbiology. Finally, meta-proteomic tools and approaches are emerging for tackling the high complexity of the microbial world as a whole, opening new perspectives for assessing how microbial communities function.
Protein digestion is an integral part of the ‘shotgun’ proteomics approach and commonly requires overnight incubation prior to mass spectrometry analysis. Quadruplicate ‘shotgun’ proteomic analysis of whole yeast lysate demonstrated that Guanidine-Hydrochloride (Gnd-HCl) protein digestion can be optimally completed within 30 minutes with endoprotease Lys-C. No chemical artifacts were introduced when samples were incubated in Gnd-HCl at 95°C, making Gnd-HCl an appropriate digestion buffer for shotgun proteomics. Current methodologies for investigating protein-protein interactions (PPIs) often require several preparation steps, which slows parallel processing and high-throughput interaction analysis. Gnd-HCl allows the efficient elution and subsequent fast digestion of PPIs, providing a convenient high-throughput methodology for affinity-purification mass spectrometry (AP-MS) experiments. To validate the Gnd-HCl approach, label-free PPI analysis of several GFP-tagged yeast deubiquitinating enzymes was performed. The identification of known interaction partners demonstrates the utility of the optimized Gnd-HCl protocol, which is also scalable to the 96-well plate format.
- Journal of bioinformatics and computational biology
- Published about 7 years ago
This paper is a self-contained introductory tutorial on the problem in proteomics known as peptide sequencing using tandem mass spectrometry. This tutorial deals specifically with de novo sequencing methods (as opposed to database search methods). We first give an introduction to peptide sequencing, its importance and history and some background on proteins. Next we show the relationship between a peptide and the final spectrum produced from a tandem mass spectrometer, together with a description of the various sources of complications that arise during the process of generating the mass spectrum. From there we model the computational problem of de novo peptide sequencing, which is basically the reverse problem of identifying the peptide which produced the spectrum. We then present several major approaches to solve it (including reviewing some of the current algorithms in each approach), and also discuss related problems and post-processing approaches.
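The relationship between a peptide and its fragment spectrum can be illustrated with a minimal sketch. It assumes singly charged b and y ions only and uses standard monoisotopic residue masses; the function name `fragment_ions` is hypothetical and not taken from the tutorial. Real spectra also contain the complications mentioned above (other ion types, neutral losses, noise, missing peaks), which this sketch ignores.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to the C-terminal (y) fragments
PROTON = 1.007276   # charge carrier for a singly charged ion


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[k] is b_(k+1) (N-terminal prefix), y_ions[k] is y_(k+1)
    (C-terminal suffix). Both lists have len(peptide) - 1 entries.
    """
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)

    b_ions = []
    prefix = 0.0
    for i in range(n - 1):
        prefix += masses[i]
        b_ions.append(prefix + PROTON)

    y_ions = []
    suffix = 0.0
    for i in range(n - 1, 0, -1):
        suffix += masses[i]
        y_ions.append(suffix + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for k, (bm, ym) in enumerate(zip(b, y), start=1):
        print(f"b{k} = {bm:.4f}   y{k} = {ym:.4f}")
```

De novo sequencing works in the opposite direction: mass differences between consecutive peaks in one ion series are matched against the residue masses in `MONO` to read off the sequence. Note that L and I share the same mass, so they cannot be distinguished this way.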
Characteristic mass differences between fragment ions from backbone cleavage of RNA by electron detachment (d, w) and fragment ions from collisionally activated dissociation (c, y) provide extensive sequence information. Structure analysis by this approach should be especially useful for the detailed characterization of synthetic or post-transcriptionally modified RNA.
In liquid chromatography-mass spectrometry (LC-MS)-based proteomics, many precursors elute from the column simultaneously. In data-dependent analyses, these precursors are fragmented one at a time, whereas the others are discarded entirely. Here we employ trapped ion mobility spectrometry (TIMS) on an orthogonal quadrupole time-of-flight (QTOF) mass spectrometer to remove this limitation. In TIMS, all precursor ions are accumulated in parallel and released sequentially as a function of their ion mobility. Instead of selecting a single precursor mass with the quadrupole mass filter, we implement synchronized scans in which the quadrupole is positioned, with sub-ms switching times, at the m/z values of appropriate precursors, such as those derived from a topN precursor list. We demonstrate serial selection and fragmentation of multiple precursors in single 50 ms TIMS scans. Parallel accumulation - serial fragmentation (PASEF) enables hundreds of MS/MS events per second at full sensitivity. Modelling the effect of such synchronized scans for shotgun proteomics, we estimate that about a ten-fold gain in sequencing speed should be achievable by PASEF without a decrease in sensitivity.
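The throughput claim can be checked with back-of-the-envelope arithmetic. The sketch below is an idealized estimate, not the authors' model: the function name and the number of precursors per scan are assumptions chosen for illustration, and it ignores MS1 survey scans and any switching overhead.

```python
def msms_events_per_second(scan_ms, precursors_per_scan):
    """Idealized MS/MS rate when several precursors are selected per TIMS scan.

    scan_ms: duration of one TIMS scan in milliseconds (50 ms in the abstract).
    precursors_per_scan: precursors fragmented within one scan (assumed value).
    """
    scans_per_second = 1000.0 / scan_ms
    return precursors_per_scan * scans_per_second


if __name__ == "__main__":
    # One precursor per 50 ms scan gives 20 MS/MS events per second.
    print(msms_events_per_second(50, 1))
    # An assumed 10 precursors per scan gives 200 events per second,
    # i.e. "hundreds of MS/MS events per second" and a ten-fold gain.
    print(msms_events_per_second(50, 10))
```

Under these assumptions, the gain in sequencing speed scales directly with the number of precursors that can be targeted within a single scan.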
This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprises 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data has been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations, which allows pathway-oriented investigation. We present the data as a freely available community resource. A variety of data re-use options are described for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.
In mass spectrometry-based bottom-up proteomics, data-independent acquisition (DIA) is an emerging technique due to its comprehensive and unbiased sampling of precursor ions. However, current DIA methods use wide precursor isolation windows, resulting in co-fragmentation and complex mixture spectra. Thus, conventional database searching tools that identify peptides by interpreting individual MS/MS spectra are inherently limited in analyzing DIA data. Here we discuss an alternative approach, peptide-centric analysis, which tests directly for the presence and absence of query peptides. We discuss how peptide-centric analysis resolves some limitations of traditional spectrum-centric analysis, and we outline the unique characteristics of peptide-centric analysis in general.
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use large-scale spectrum clustering to shed light on these unidentified spectra. The PRoteomics IDEntifications database (PRIDE) Archive is one of the largest public MS proteomics data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
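Spectrum clustering depends on a measure of how similar two spectra are. The sketch below shows one generic choice, cosine similarity over binned peak intensities. It is an illustration only and is not claimed to be the algorithm used by PRIDE Cluster; the function names, the 1.0 m/z bin width and the example peak lists are all assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs.
    Returns a dict mapping bin index -> summed intensity.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    s1 = bin_spectrum([(175.1, 30.0), (262.2, 80.0), (375.3, 50.0)])
    s2 = bin_spectrum([(175.2, 25.0), (262.1, 90.0), (489.4, 10.0)])
    print(round(cosine_similarity(s1, s2), 3))
```

A clustering procedure would group spectra whose pairwise similarity exceeds a chosen threshold; clusters in which identified and unidentified spectra co-occur are what make it possible to propose identifications for the latter.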