Journal: Molecular & cellular proteomics : MCP
Proteins endogenously secreted by human embryonic stem cells (hESCs) and those present in hESC culture medium are critical regulators of hESC self-renewal and differentiation. Current MS-based approaches for identifying secreted proteins rely predominantly on MS analysis of cell culture supernatants. Here we show that targeted proteomics of secretory pathway organelles is a powerful alternative approach for interrogating the cellular secretome. We have developed procedures to obtain subcellular fractions from mouse embryonic fibroblasts (MEFs) and hESCs that are enriched in secretory pathway organelles while retaining the secretory cargo. MS analysis of these fractions from hESCs cultured in MEF-conditioned medium (MEF-CM) or from MEFs exposed to hESC medium revealed 99 and 129 proteins putatively secreted by hESCs and MEFs, respectively. Of these, 53 (MEFs) and 62 (hESCs) have been previously identified in cell culture supernatants, thus establishing the validity of our approach. Furthermore, 76 (MEFs) and 37 (hESCs) putatively secreted proteins identified in this study have not been reported in previous MS analyses. Identifying low-abundance secreted proteins by MS analysis of cell culture supernatants typically requires altered culture conditions, such as serum-free medium. However, an altered medium formulation might directly influence the cellular secretome. Indeed, we observed significant differences in the abundances of several secreted proteins in subcellular fractions isolated from hESCs cultured in MEF-CM versus those exposed to unconditioned hESC medium for 24 h. In contrast, targeted proteomics of secretory pathway organelles does not require customized media.
We expect that our approach will be particularly valuable in two contexts highly relevant to hESC biology: obtaining a temporal snapshot of proteins secreted in response to a differentiation trigger, and identifying proteins secreted by cells that are isolated from a heterogeneous population.
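The comparison with earlier supernatant-based studies is set arithmetic: every putatively secreted protein is either previously reported or new, so the stated counts should add up (53 + 76 = 129 for MEFs, 62 + 37 = 99 for hESCs). A minimal sketch of that partition follows; the helper name and the protein identifiers are hypothetical placeholders, not data from the study.

```python
def partition_identifications(identified, previously_reported):
    """Split identified proteins into previously reported and novel sets.

    Hypothetical helper: inputs are any iterables of protein identifiers.
    """
    identified = set(identified)
    known = identified & set(previously_reported)
    novel = identified - known
    return known, novel


# Counts stated in the abstract: (previously reported, not previously reported)
reported_counts = {"MEF": (53, 76), "hESC": (62, 37)}
totals = {cell: known + novel for cell, (known, novel) in reported_counts.items()}
print(totals)  # {'MEF': 129, 'hESC': 99}

# Toy identifiers only; these are placeholders, not proteins from the study
known, novel = partition_identifications(
    ["P1", "P2", "P3", "P4"], previously_reported=["P2", "P4", "P9"]
)
print(sorted(known), sorted(novel))  # ['P2', 'P4'] ['P1', 'P3']
```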
The primary structure of proteins used as biotherapeutics is essential for understanding their structure-function relationships, for the rational design of new therapeutics, and for quality control. Given the large size (around 150 kDa) and structural complexity of intact immunoglobulin G (IgG), which includes a variable number of disulfide bridges, its extensive fragmentation and subsequent sequence determination by tandem mass spectrometry (MS) are challenging. Here, we applied electron transfer dissociation (ETD), implemented on a hybrid Orbitrap Fourier transform mass spectrometer (FTMS), to analyze a commercial recombinant IgG in a top-down liquid chromatography (LC)-tandem mass spectrometry (MS/MS) experiment. The low sensitivity typically observed in top-down MS of large proteins was addressed by averaging time-domain transients recorded in different LC-MS/MS experiments before performing Fourier transform signal processing. The results demonstrate that an improved signal-to-noise ratio, together with the higher resolution and mass accuracy of Orbitrap FTMS (relative to previous applications of top-down ETD-based proteomics to IgG), is essential for comprehensive analysis. Specifically, ETD on Orbitrap FTMS produced about 33% sequence coverage of an intact IgG, an almost 2-fold increase relative to prior ETD-based analyses of intact monoclonal antibodies of a similar subclass. These results suggest that the methodology could be applied to other classes of large proteins and biomolecules.
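Why averaging transients helps can be shown with a toy simulation: when aligned transients are averaged point by point, the coherent signal is unchanged while independent noise shrinks by roughly the square root of the number of transients averaged. The sketch below assumes a single damped cosine with white Gaussian noise; the signal model, noise level and all parameter values are arbitrary illustrations, not Orbitrap acquisition settings.

```python
import math
import random


def damped_cosine(n_points, cycles, decay):
    """Noise-free model transient: one damped cosine (arbitrary units)."""
    return [
        math.exp(-decay * t) * math.cos(2 * math.pi * cycles * t / n_points)
        for t in range(n_points)
    ]


def add_noise(clean, noise_sd, rng):
    """One simulated acquisition: the clean transient plus white Gaussian noise."""
    return [c + rng.gauss(0.0, noise_sd) for c in clean]


def average_transients(transients):
    """Point-by-point mean of aligned, equally long transients (before any FFT)."""
    k = len(transients)
    return [sum(values) / k for values in zip(*transients)]


def noise_rms(signal, clean):
    """RMS deviation of a transient from the noise-free model."""
    return math.sqrt(sum((s - c) ** 2 for s, c in zip(signal, clean)) / len(signal))


rng = random.Random(42)
clean = damped_cosine(n_points=2048, cycles=50, decay=0.001)
single = add_noise(clean, 0.5, rng)
k = 16
averaged = average_transients([add_noise(clean, 0.5, rng) for _ in range(k)])

# Expected to be close to sqrt(k) = 4 for independent noise
gain = noise_rms(single, clean) / noise_rms(averaged, clean)
print(f"noise reduction factor: {gain:.2f}")
```

The gain depends on the transients being phase-aligned; misaligned transients would partially cancel the signal as well as the noise.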
Glioblastoma multiforme (GBM) is a malignant primary brain tumor with a mean survival of 15 months under the current standard of care. Genetic profiling efforts have identified amplification, overexpression, and mutation of the wild-type (wt) epidermal growth factor receptor tyrosine kinase (EGFR) in ∼50% of GBM patients. The genetic aberration of wtEGFR is frequently accompanied by overexpression of a mutant EGFR known as EGFR variant III (EGFRvIII, de2-7EGFR, ΔEGFR), which is expressed in 30% of GBM tumors. The molecular mechanisms of tumorigenesis driven by EGFRvIII overexpression in human tumors have not been fully elucidated. To identify specific therapeutic targets for EGFRvIII-driven tumors, it is important to gain a broad understanding of EGFRvIII-specific signaling. Here, we have characterized signaling through quantitative analysis of protein expression and tyrosine phosphorylation across a panel of glioblastoma tumor xenografts, established from patient surgical specimens, that express wtEGFR or overexpress wtEGFR (wtEGFR+) or EGFRvIII (EGFRvIII+). S100A10 (p11), major vault protein, guanylate-binding protein 1 (GBP1), and carbonic anhydrase III (CAIII) were identified as having significantly increased expression in EGFRvIII-expressing xenograft tumors relative to wtEGFR xenograft tumors. Increased expression of each of these four proteins was correlated with poor survival in patients with GBM, and the combination of the four represents a prognostic signature for poor survival in gliomas. Integration of protein expression and phosphorylation data uncovered significant heterogeneity among the tumors and highlighted several novel EGFR trafficking-related pathways that are activated in glioblastoma. The pathways and proteins identified in these tumor xenografts represent potential therapeutic targets for this disease.
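The abstract does not state which statistical test was used to call "significantly increased expression". Purely as a generic illustration of such a per-protein comparison, the sketch below computes a log2 fold change of group means and Welch's t statistic between EGFRvIII+ and wtEGFR xenografts. The intensity values are invented placeholders, and this is not the authors' analysis.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group means (group_a relative to group_b)."""
    return math.log2(mean(group_a) / mean(group_b))


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical normalized intensities of one protein per xenograft
egfrviii_plus = [8.0, 9.0, 10.0]
wt_egfr = [2.0, 2.0, 2.5, 1.5]

fc = log2_fold_change(egfrviii_plus, wt_egfr)
t = welch_t(egfrviii_plus, wt_egfr)
print(round(fc, 2), round(t, 2))  # 2.17 11.43
```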
Although bulk protein turnover has been measured with stable isotope-labeled tracers for over half a century, only recently has the same approach become applicable at the proteome level, permitting analysis of the turnover of many individual proteins rather than single proteins or an aggregated protein pool. The optimal experimental design for turnover studies depends on the biological system under study, which dictates the choice of precursor label, protein pool sampling strategy, and data treatment. In this review we discuss different approaches and explore, in particular, how the complexity of experimental design and data processing increases as we move from unicellular to multicellular systems, especially animals.
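For the simplest design, where the precursor pool becomes fully labeled essentially at once, the labeled fraction of a protein follows first-order kinetics, f(t) = 1 - exp(-k t). The rate constant k is then the slope of -ln(1 - f) against time, and the half-life is ln 2 / k. The sketch below assumes that idealized case (in growing cells k lumps degradation with dilution by growth); the time points and fractions are hypothetical. In animals, where precursor enrichment rises gradually, this model would need to be extended.

```python
import math


def fit_rate_constant(times, labeled_fractions):
    """Least-squares slope through the origin of -ln(1 - f) versus t.

    Assumes f(t) = 1 - exp(-k t), i.e. an instantly and fully labeled
    precursor pool; k then lumps degradation with any growth dilution.
    """
    ys = [-math.log(1.0 - f) for f in labeled_fractions]
    return sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)


def half_life(k):
    """Half-life corresponding to a first-order rate constant k."""
    return math.log(2) / k


# Hypothetical labeled fractions generated from k = 0.1 per hour
times_h = [2, 4, 8, 16, 24]
fractions = [1 - math.exp(-0.1 * t) for t in times_h]
k = fit_rate_constant(times_h, fractions)
print(round(k, 3), round(half_life(k), 2))  # 0.1 6.93
```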
Hydrogen-deuterium exchange mass spectrometry (HDX-MS) is an important method for protein structure-function analysis. The bottom-up approach uses protein digestion to localize deuteration at higher resolution, and its essential measurement is centroid mass determination for a very large set of peptides. In the course of evaluating systems for various projects, we established two HDX-MS platforms, one based on an FT-MS and the other on a high-resolution QTOF mass spectrometer, each with matched front-end fluidic systems. Digests of proteins spanning a 20-110 kDa range were deuterated to equilibrium, and figures of merit for a typical bottom-up HDX-MS experiment were compared between the platforms. The Orbitrap Velos identified 64% more peptides than the 5600 QTOF, with a 42% overlap between the two systems, independent of protein size. Precision of deuterium measurements on the Orbitrap marginally exceeded that of the QTOF, depending on the Orbitrap resolution setting. However, the unique nature of FT-MS data gives rise to situations in which deuteration measurements can be inaccurate, owing to destructive interference arising from mismatches in elemental mass defects. This is shown by analysis of the peptides common to both platforms, where deuteration values can be as low as 35% of expected values, depending on FT-MS resolution, peptide length and charge state. These findings are supported by simulations of Orbitrap transients and highlight that caution should be exercised when deriving centroid mass values from FT transients that do not support baseline separation of the full isotopic composition.
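The centroid measurement and the mass-defect problem can both be made concrete. Deuterium uptake is the shift in the intensity-weighted centroid of a peptide's isotopic envelope, multiplied by the charge. The inaccuracy arises because adding a deuteron (about 1.00628 Da) and adding a 13C (about 1.00335 Da) differ by only about 0.0029 Da, so separating these isotopologues at the same nominal mass requires a resolving power of very roughly M / 0.0029. The sketch below uses standard monoisotopic masses; the peak lists are hypothetical.

```python
H1, H2, C13 = 1.00782503, 2.01410178, 13.00335484  # monoisotopic masses (Da)
D_SHIFT = H2 - H1        # mass added per H -> D exchange, about 1.00628 Da
C13_SHIFT = C13 - 12.0   # mass added per 12C -> 13C, about 1.00335 Da


def centroid(peaks):
    """Intensity-weighted mean m/z of a list of (mz, intensity) peaks."""
    total = sum(intensity for _, intensity in peaks)
    return sum(mz * intensity for mz, intensity in peaks) / total


def deuterium_uptake_da(undeuterated, deuterated, charge):
    """Mass increase (Da) from the shift in envelope centroids."""
    return (centroid(deuterated) - centroid(undeuterated)) * charge


def resolving_power_needed(mass):
    """m / delta-m needed to separate a 2H- from a 13C-containing isotopologue."""
    return mass / (D_SHIFT - C13_SHIFT)


# Hypothetical isotopic envelope of a 2+ peptide, and the same envelope shifted by 3 Da
undeut = [(500.0, 100.0), (500.5, 60.0), (501.0, 20.0)]
deut = [(mz + 1.5, i) for mz, i in undeut]

print(round(deuterium_uptake_da(undeut, deut, charge=2), 3))  # 3.0
print(round(resolving_power_needed(1500.0)))  # about 5.1e5 for a 1500 Da peptide
```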
Cone snails produce highly complex venom consisting mostly of small, biologically active peptides known as conotoxins or conopeptides. Early estimates of 50-200 venom peptides per species have recently been increased at least 10-fold using advanced mass spectrometry. To uncover the mechanism(s) responsible for generating this impressive diversity, we used an integrated approach combining second-generation transcriptome sequencing with high-sensitivity proteomics. From the venom gland transcriptome of Conus marmoreus, a total of 105 conopeptide precursor sequences from 13 gene superfamilies were identified. Over 60% of these precursors belonged to three gene superfamilies (O1, T and M), consistent with their high expression levels, suggesting that these conotoxins play an important role in prey capture and/or defense. Seven gene superfamilies not previously identified in C. marmoreus, including five novel superfamilies, were also discovered. To confirm the expression of toxins identified at the transcript level, the injected venom of C. marmoreus was comprehensively analyzed by mass spectrometry, revealing 2710 and 3172 peptides by MALDI and ESI-MS, respectively, and 6254 peptides on an ESI-MS TripleTOF 5600 instrument. All conopeptides derived from transcriptomic sequences could be matched to masses obtained on the TripleTOF within 100 ppm, with 66 (63%) having MS/MS coverage that unambiguously confirmed these matches. Comprehensive integration of transcriptomic and proteomic data revealed for the first time that the vast majority of conopeptide diversity arises from a more limited set of genes through variable peptide processing, which generates conopeptides with alternative cleavage sites, heterogeneous post-translational modifications, and highly variable N- and C-terminal truncations.
Variable peptide processing is expected to contribute to the evolution of venoms, and explains how a limited set of ~100 gene transcripts can generate thousands of conopeptides in a single species of cone snail.
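Two steps above lend themselves to a short sketch. The first is how N- and C-terminal truncation alone multiplies one mature sequence into many peptides. The second is how candidate masses are matched to observed masses within a ppm tolerance (100 ppm in the study), using ppm error = (observed - theoretical) / theoretical × 10^6. The sequence, the observed masses and the minimum length below are hypothetical. Masses ignore disulfides and other post-translational modifications, so this is only an illustration of the matching logic.

```python
# Monoisotopic residue masses (Da) for the residues used in this toy example
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "C": 103.00919, "K": 128.09496, "H": 137.05891,
}
WATER = 18.01056


def peptide_mass(seq):
    """Neutral monoisotopic mass, ignoring disulfides and other PTMs."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def truncation_variants(mature, min_len):
    """The full sequence plus every contiguous N- and/or C-terminal truncation down to min_len."""
    n = len(mature)
    return [mature[i:j] for i in range(n) for j in range(i + min_len, n + 1)]


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed_masses, candidates, tol_ppm=100.0):
    """(observed mass, candidate sequence, ppm error) for all matches within tol_ppm."""
    hits = []
    for obs in observed_masses:
        for seq in candidates:
            err = ppm_error(obs, peptide_mass(seq))
            if abs(err) <= tol_ppm:
                hits.append((obs, seq, round(err, 1)))
    return hits


mature = "GCCSHPACK"  # hypothetical sequence, not a C. marmoreus conopeptide
variants = truncation_variants(mature, min_len=5)
print(len(variants))  # 15

observed = [peptide_mass("CCSHPAC") * (1 + 20e-6), 1000.0]  # hypothetical masses
print(match_masses(observed, variants))
```

Even this nine-residue toy yields 15 candidates, and two of them (CCSHPA and CSHPAC) are isobaric. Mass matching alone cannot separate such variants, which is why MS/MS coverage is needed to confirm matches unambiguously.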
Despite the growing importance of mass spectrometry in the biomedical sciences, the metrics used to evaluate identified proteins need improvement. We present the nonparametric cutout index (npCI), a simple, robust extension of the target-decoy approach that reliably evaluates identifications without requiring parameters. It can be used with multiple existing peptide scores and protein identification methods, to choose non-arbitrary false discovery rate (FDR) thresholds, and to evaluate strategies for merging evidence across replicate experiments.
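The abstract does not give the npCI computation itself, so the sketch below shows only the standard target-decoy estimate that it extends. At a score threshold, the FDR is estimated as the number of decoy hits at or above the threshold divided by the number of target hits at or above it. The scores are hypothetical, and this is not npCI.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate: decoys / targets at or above the threshold."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    return n_decoy / n_target if n_target else 0.0


def lowest_threshold_at_fdr(target_scores, decoy_scores, max_fdr):
    """Smallest observed target score whose estimated FDR is <= max_fdr."""
    for threshold in sorted(set(target_scores)):
        if decoy_fdr(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical search-engine scores (higher is better)
targets = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
decoys = [7.5, 3.5, 2.5, 1.5, 0.5]
print(decoy_fdr(targets, decoys, 6))  # 0.2
print(lowest_threshold_at_fdr(targets, decoys, 0.15))  # 4
print(lowest_threshold_at_fdr(targets, decoys, 0.10))  # 8
```

The estimate is not monotonic in the threshold (4 passes at 0.15 while 5 and 6 do not). This is one way a fixed FDR cutoff can end up being chosen somewhat arbitrarily.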
Co-expression of mRNAs under multiple conditions is commonly used to infer co-functionality of their gene products despite well-known limitations of this "guilt-by-association" (GBA) approach. Recent advances in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for co-expression-based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein co-expression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein co-expression networks. Whereas protein co-expression was driven primarily by functional similarity between co-expressed genes, mRNA co-expression was driven by both co-function and chromosomal co-localization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in the corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. Gene2Net (http://cptac.gene2net.org), a web application built on the three protein co-expression networks, revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for co-expression-based gene function prediction. Proteomics data should be integrated into, if not preferred for, gene function and human disease studies.
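The guilt-by-association idea can be sketched in a few lines. The first step is to connect genes whose expression profiles are positively correlated above a cutoff. The second is to score a gene for a function by the fraction of its network neighbors that already carry that annotation. The profiles, cutoff and annotations below are hypothetical. Pearson correlation with a hard threshold is just one simple choice, not necessarily the network construction used in the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_network(profiles, cutoff):
    """Adjacency sets linking genes whose profiles have Pearson r >= cutoff."""
    adj = {g: set() for g in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def gba_score(gene, adj, annotated):
    """Fraction of a gene's neighbors that carry the annotation."""
    neighbors = adj[gene]
    if not neighbors:
        return 0.0
    return len(neighbors & annotated) / len(neighbors)


# Hypothetical expression profiles across five samples
profiles = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],
    "g3": [1, 2, 3, 4, 6],
    "g4": [5, 4, 3, 2, 1],
    "g5": [1, 3, 2, 5, 4],
}
adj = coexpression_network(profiles, cutoff=0.9)
print(sorted(adj["g1"]))  # ['g2', 'g3']
print(gba_score("g1", adj, annotated={"g2"}))  # 0.5
```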
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its utility in facilitating efficient genome-based sharing, interpretation and integration of proteomics data. First, rich genomic annotation significantly enhances the interpretation of proteomics data. Second, PSMs can be easily re-annotated using user-specified gene annotation schemes and assembled into both protein and gene identifications. Third, by using the genome as a common reference, proBAMsuite facilitates seamless integration of proteomics and proteogenomics data. Finally, proBAM files can be readily visualized in genome browsers, bringing proteomics data analysis to a general audience beyond the proteomics community. Results from this study establish proBAMsuite as a useful bioinformatics framework for proteomics and proteogenomics research.
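Placing a PSM in genomic context means converting the peptide's position in the protein into genomic coordinates and splitting the span wherever it crosses an intron. The sketch below assumes a plus-strand coding sequence whose CDS exons are given as sorted, 1-based inclusive (start, end) genomic intervals. It emits a SAM-style CIGAR string (M for aligned bases, N for skipped intronic bases), the standard way spliced alignments are encoded in BAM. The gene model and function names are hypothetical, this is not the proBAMr implementation, and minus-strand genes are not handled.

```python
def cds_offset_to_genome(offset, exons):
    """Map a 0-based offset within the spliced CDS to a 1-based genomic position.

    exons: plus-strand CDS exons as sorted, 1-based inclusive (start, end) pairs.
    """
    for start, end in exons:
        length = end - start + 1
        if offset < length:
            return start + offset
        offset -= length
    raise ValueError("offset lies beyond the end of the CDS")


def peptide_to_genome(pep_start, pep_len, exons):
    """Genomic start and CIGAR for a peptide at 1-based protein position pep_start."""
    first = 3 * (pep_start - 1)      # first coding nucleotide (0-based CDS offset)
    last = first + 3 * pep_len - 1   # last coding nucleotide (0-based CDS offset)
    positions = [cds_offset_to_genome(o, exons) for o in range(first, last + 1)]
    cigar, run = [], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur == prev + 1:
            run += 1  # still inside the same exon
        else:
            cigar.append(f"{run}M{cur - prev - 1}N")  # close block, skip intron
            run = 1
    cigar.append(f"{run}M")
    return positions[0], "".join(cigar)


# Hypothetical two-exon gene model: 30 + 60 coding nucleotides
exons = [(1000, 1029), (1101, 1160)]
print(peptide_to_genome(8, 5, exons))  # (1021, '9M71N6M')
print(peptide_to_genome(1, 3, exons))  # (1000, '9M')
```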
To gain insight into the response of mulberry to phytoplasma infection, the expression profiles of mRNAs and proteins in mulberry phloem sap were examined. A total of 955 unigenes and 136 proteins were differentially expressed between healthy and infected phloem sap. These differentially expressed mRNAs and proteins are involved in signalling, hormone metabolism, stress responses and other processes. Interestingly, both the mRNA and protein levels of the major latex protein-like 329 (MuMLPL329) gene were increased in infected phloem sap. Expression of MuMLPL329 was induced by pathogen inoculation and was responsive to jasmonic acid. Ectopic expression of MuMLPL329 in Arabidopsis enhanced the resistance of transgenic plants to Botrytis cinerea, Pseudomonas syringae pv. tomato DC3000 (Pst DC3000) and phytoplasma. Further analysis revealed that MuMLPL329 can enhance the expression of some defense genes and might alter flavonoid content, thereby increasing plant resistance to pathogen infection. Finally, the roles of the differentially expressed mRNAs and proteins and the potential molecular mechanisms underlying their changes are discussed. The phytoplasma-responsive mRNAs and proteins in the phloem sap are likely involved in multiple pathways of the mulberry response to phytoplasma infection, and their changes may be partially responsible for some symptoms of phytoplasma-infected plants.