Concept: Shotgun proteomics
There is interest in extending bottom-up proteomics to the smallest possible sample size. We investigated the performance of two modern mass spectrometers for the analysis of samples ranging from 1 ng to 1 µg of RAW 264.7 cell lysate digests.
High-throughput identification of proteins with the latest generation of hybrid high-resolution mass spectrometers is opening new perspectives in microbiology. Here I present an overview of tandem mass spectrometry technology and bioinformatics for shotgun proteomics that make 2D-PAGE approaches obsolete. Non-labelling quantitative approaches have become more popular than labelling techniques on most proteomic platforms because they are easier to carry out while their quantitative outcome is rather robust. Parameters for recording mass spectrometry data, however, need to be chosen carefully and statistics to assess the confidence of the results should not be neglected. Interestingly, next-generation sequencing methodologies make any microbial model quickly amenable to proteomics, leading to the documentation of a wide range of organisms from diverse environments. Some recent discoveries made using microbial proteomics have challenged some biological dogmas, such as: (i) translation initiation does not occur predominantly from ATG codons in some microorganisms, (ii) non-canonical initiation codons are used to regulate the production of specific but important proteins and (iii) a gene may code for multiple polypeptide species, heterogeneous in terms of sequences. Microbial diversity and microbial physiology can now be revisited by means of exhaustive comparative proteomic surveys where thousands of proteins are detected and quantified. Proteogenomics, the use of proteomic evidence to improve genome annotation, is paving the way for integrated multi-omic approaches in microbiology. Finally, meta-proteomic tools and approaches are emerging for tackling the high complexity of the microbial world as a whole, opening new perspectives for assessing how microbial communities function.
Protein digestion is an integral part of the ‘shotgun’ proteomics approach and commonly requires overnight incubation prior to mass spectrometry analysis. Quadruplicate ‘shotgun’ proteomic analysis of whole yeast lysate demonstrated that Guanidine-Hydrochloride (Gnd-HCl) protein digestion can be optimally completed within 30 minutes with endoprotease Lys-C. No chemical artifacts were introduced when samples were incubated in Gnd-HCl at 95°C, making Gnd-HCl an appropriate digestion buffer for shotgun proteomics. Current methodologies for investigating protein-protein interactions (PPIs) often require several preparation steps, which slows parallel processing and high-throughput interaction analysis. Gnd-HCl allows the efficient elution and subsequent fast digestion of PPIs, providing a convenient high-throughput methodology for affinity-purification mass spectrometry (AP-MS) experiments. To validate the Gnd-HCl approach, label-free PPI analysis of several GFP-tagged yeast deubiquitinating enzymes was performed. The identification of known interaction partners demonstrates the utility of the optimized Gnd-HCl protocol, which is also scalable to the 96-well plate format.
We present an updated version of the TFold software for pinpointing differentially expressed proteins in shotgun proteomics experiments. Given an FDR bound, the updated approach uses a theoretical FDR estimator to maximize the number of identifications that satisfy both a fold-change cutoff that varies with the t-test P-value as a power law and a stringency criterion that aims to detect low-abundance proteins. The new version has yielded significant improvements in sensitivity over the previous one. AVAILABILITY: Freely available for academic use at http://pcarvalho.com/patternlab.
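The idea of a fold-change cutoff that depends on the P-value can be illustrated with a minimal Python sketch. This is not the TFold implementation: the functional form, the direction of the dependence (smaller P-values tolerating smaller fold changes), the parameters `scale` and `exponent`, and both function names are assumptions made for illustration only, and the FDR estimator and stringency criterion are not modelled.

```python
def required_log2_fold(p_value, scale=4.0, exponent=0.5):
    """Hypothetical power-law cutoff on |log2 fold change|.

    Assumed form: cutoff = scale * p_value ** exponent, so that proteins
    with smaller P-values need a smaller fold change to be retained.
    """
    return scale * p_value ** exponent


def passes_cutoff(log2_fold_change, p_value, scale=4.0, exponent=0.5):
    """Return True if the protein clears the P-value-dependent cutoff."""
    return abs(log2_fold_change) >= required_log2_fold(p_value, scale, exponent)


# Example: the same fold change passes at P = 0.01 but not at P = 0.25.
for p in (0.01, 0.25):
    print(p, round(required_log2_fold(p), 3), passes_cutoff(1.0, p))
```

In a real analysis the two parameters would be tuned so that the estimated FDR stays under the chosen bound; here they are fixed arbitrary values.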
This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprise 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data have been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations, which allows for pathway-oriented investigation. We present the data as a freely available community resource. A variety of data re-use options are described for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.
Bottom-up proteomics relies on the use of proteases and is the method of choice for identifying thousands of protein groups in complex samples. Top-down proteomics has been shown to be robust for direct analysis of small proteins and offers a solution to the “peptide-to-protein” inference problem inherent in bottom-up. Here, we describe the first large-scale integration of genomic, bottom-up and top-down proteomic data for comparative analysis of patient-derived mouse xenograft models of basal and luminal B human breast cancer, WHIM2 and WHIM16, respectively. Using these well-characterized xenograft models established by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium, we compared and contrasted the performance of bottom-up and top-down proteomics to detect cancer-specific aberrations at the peptide and proteoform levels, and to measure differential expression of proteins and proteoforms. Bottom-up proteomics analysis of the tumor xenografts detected almost 10 times as many coding nucleotide polymorphisms and peptides resulting from novel splice junctions as top-down. For proteins in the range of 0-30 kDa, bottom-up proteomics quantified 3,519 protein groups from 49,185 peptides, while top-down proteomics quantified 982 proteoforms mapping to 358 proteins. Examples of both concordant and discordant quantitation were found in an approximately 60:40 ratio, providing a unique opportunity for top-down to fill in missing information. The two techniques showed complementary performance, with bottom-up yielding 8 times more identifications of 0-30 kDa proteins in xenograft proteomes, but failing to detect differences in post-translational modifications, such as phosphorylation pattern changes of alpha-endosulfine. This work illustrates the potency of a combined bottom-up and top-down proteomics approach to deepen our knowledge of cancer biology, especially when genomic data are available.
In mass spectrometry-based bottom-up proteomics, data-independent acquisition (DIA) is an emerging technique due to its comprehensive and unbiased sampling of precursor ions. However, current DIA methods use wide precursor isolation windows, resulting in co-fragmentation and complex mixture spectra. Thus, conventional database searching tools that identify peptides by interpreting individual MS/MS spectra are inherently limited in analyzing DIA data. Here we discuss an alternative approach, peptide-centric analysis, which tests directly for the presence and absence of query peptides. We discuss how peptide-centric analysis resolves some limitations of traditional spectrum-centric analysis, and we outline the unique characteristics of peptide-centric analysis in general.
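As a rough illustration of the peptide-centric idea, the Python sketch below starts from a query peptide, predicts its singly charged b- and y-ion m/z values, and asks what fraction of them appear in a (possibly mixed) fragment spectrum. It is a toy scoring scheme and not the method of any particular DIA tool: the function names, the 0.02 Da tolerance and the example peak list are assumptions, and real tools also use retention time, intensities and statistical validation.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [MONO[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def fraction_matched(peptide, peak_mzs, tol_da=0.02):
    """Fraction of predicted fragments found among the observed peaks.

    Assumes a peptide of at least two residues (otherwise no fragments).
    """
    fragments = fragment_mzs(peptide)
    hits = sum(
        any(abs(mz - frag) <= tol_da for mz in peak_mzs) for frag in fragments
    )
    return hits / len(fragments)


# Hypothetical mixed spectrum containing the b1 and y1 ions of "AG" plus noise.
print(fraction_matched("AG", [72.0444, 76.0393, 500.0]))
```

Because the query peptide drives the search, co-fragmented signals from other precursors in the same isolation window only add unmatched peaks rather than confusing the interpretation of the spectrum as a whole.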
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. For a defined set of input spectra, the resulting sets of high-scoring peptide-spectrum matches differ markedly among search engines; each engine provides unique correct identifications in addition to a core set of identifications shared among engines. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
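The simplest way to combine engines is majority voting per spectrum, sketched below in Python. This is a deliberately naive illustration and not one of the reviewed tools, which typically rescore matches statistically; the engine names, spectrum identifiers, peptide sequences and the `min_votes` parameter are all hypothetical.

```python
from collections import Counter, defaultdict


def combine_psms(engine_results, min_votes=2):
    """Keep, per spectrum, the peptide reported by at least `min_votes` engines.

    engine_results maps engine name -> {spectrum_id: peptide}.
    Returns {spectrum_id: (peptide, number_of_supporting_engines)}.
    """
    votes = defaultdict(Counter)
    for psms in engine_results.values():
        for spectrum_id, peptide in psms.items():
            votes[spectrum_id][peptide] += 1

    consensus = {}
    for spectrum_id, counter in votes.items():
        peptide, n_votes = counter.most_common(1)[0]
        if n_votes >= min_votes:
            consensus[spectrum_id] = (peptide, n_votes)
    return consensus


# Hypothetical results from three engines for three spectra.
results = {
    "engine_a": {"scan1": "PEPTIDEK", "scan2": "LVNELTEFAK", "scan3": "AGLK"},
    "engine_b": {"scan1": "PEPTIDEK", "scan2": "LVNEITEFAK"},
    "engine_c": {"scan1": "PEPTIDEK", "scan2": "LVNELTEFAK"},
}
print(combine_psms(results))
```

Voting of this kind favours the shared core of identifications and discards matches unique to one engine, which is exactly the information that more sophisticated combination methods try to retain.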
Mass spectrometry (MS)-based proteomics workflows can crudely be classified into two distinct regimes, either targeting relatively small peptides (i.e. 0.7 kDa < Mw < 3.0 kDa) or small to medium sized intact proteins (i.e. 10 kDa < Mw < 30 kDa), respectively termed bottom-up and top-down proteomics. Recently, a niche has started to be explored covering the analysis of middle-range peptides (i.e. 3.0 kDa < Mw < 10 kDa), aptly termed middle-down proteomics. Although middle-down proteomics can follow, in principle, a modular workflow similar to that of bottom-up proteomics, we hypothesized that each of these modules would benefit from targeted optimization to improve its overall performance in the analysis of middle-range sized peptides. Hence, to generate middle-range sized peptides from cellular lysates we explored the use of the proteases Asp-N and Glu-C, and a non-enzymatic acid-induced cleavage. To increase the depth of the proteome, an SCX separation, carefully tuned to improve the separation of longer peptides, was used in combination with RP-LC on columns packed with material possessing a larger pore size. Finally, after evaluating the combination of potentially beneficial MS settings, we also assessed the peptide fragmentation techniques HCD, ETD and EThcD for characterization of middle-range sized peptides. Together, these optimizations clearly improve the detection and sequence coverage of middle-range peptides and should guide researchers to explore further how middle-down proteomics may lead to an improved proteome coverage, beneficial for, amongst other things, the enhanced analysis of (co-occurring) post-translational modifications.
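The mass regimes quoted above can be made concrete with a small Python sketch that digests a sequence in silico and labels each product. It rests on several simplifying assumptions that are not from the abstract: Glu-C is taken to cleave only C-terminal to glutamate and Asp-N only N-terminal to aspartate, with no missed cleavages; mass is approximated as 110 Da per residue; boundary values are assigned to the upper regime; and the function names and example sequence are invented.

```python
def digest(sequence, protease):
    """Idealized in silico digestion with no missed cleavages.

    Assumed rules: Glu-C cuts after E, Asp-N cuts before D.
    """
    pieces, start = [], 0
    for i in range(1, len(sequence)):
        cut_glu_c = protease == "Glu-C" and sequence[i - 1] == "E"
        cut_asp_n = protease == "Asp-N" and sequence[i] == "D"
        if cut_glu_c or cut_asp_n:
            pieces.append(sequence[start:i])
            start = i
    pieces.append(sequence[start:])
    return pieces


def approx_mass_kda(peptide):
    """Rule-of-thumb mass estimate: about 110 Da per residue."""
    return len(peptide) * 110.0 / 1000.0


def classify_regime(mass_kda):
    """Map a mass in kDa onto the regimes quoted in the text."""
    if 0.7 < mass_kda < 3.0:
        return "bottom-up"
    if 3.0 <= mass_kda < 10.0:
        return "middle-down"
    if 10.0 <= mass_kda < 30.0:
        return "top-down"
    return "outside quoted ranges"


# Hypothetical sequence, used only to show the two cleavage rules.
for protease in ("Glu-C", "Asp-N"):
    for peptide in digest("MKDAEDLL", protease):
        print(protease, peptide, classify_regime(approx_mass_kda(peptide)))
```

Proteases that cut at a single residue type tend to give longer products than trypsin, which cuts after two, which is the motivation for exploring Asp-N and Glu-C here.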
Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ± 500 Da. In a proteome-wide data set on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies >90%. We conclude that at least one-third of unassigned spectra arise from peptides with substoichiometric modifications.
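The mass-bin bookkeeping behind such an open search can be sketched in Python as follows. This is not the Sequest workflow used in the study: it only shows how the difference between an observed precursor mass and the theoretical mass of the matched unmodified peptide can be accepted within ± 500 Da and then binned. The 0.01 Da bin width, the function name and the example masses are assumptions.

```python
from collections import Counter


def delta_mass_histogram(matches, tolerance_da=500.0, bin_width_da=0.01):
    """Count matches per delta-mass bin.

    matches: iterable of (observed_mass, theoretical_mass) pairs in Da.
    Returns a Counter keyed by integer bin index; multiply the index by
    bin_width_da to recover the approximate delta mass.
    """
    histogram = Counter()
    for observed, theoretical in matches:
        delta = observed - theoretical
        if abs(delta) <= tolerance_da:
            histogram[round(delta / bin_width_da)] += 1
    return histogram


# Hypothetical matches: two near +79.966 Da (phosphorylation), one near
# +14.016 Da (methylation), and one outside the 500 Da window.
matches = [
    (1079.9663, 1000.0),
    (1079.9664, 1000.0),
    (1014.0157, 1000.0),
    (1600.0, 1000.0),
]
for bin_index, count in sorted(delta_mass_histogram(matches).items()):
    print(round(bin_index * 0.01, 2), count)
```

Well-populated bins can then be compared against known modification masses, while bins with no known assignment point to uncharacterized modifications.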