Journal: Molecular & cellular proteomics : MCP
Proteins endogenously secreted by human embryonic stem cells (hESCs) and those present in hESC culture medium are critical regulators of hESC self-renewal and differentiation. Current MS-based approaches for identifying secreted proteins rely predominantly on MS analysis of cell culture supernatants. Here we show that targeted proteomics of secretory pathway organelles is a powerful alternate approach for interrogating the cellular secretome. We have developed procedures to obtain subcellular fractions from mouse embryonic fibroblasts (MEFs) and hESCs that are enriched in secretory pathway organelles while ensuring retention of the secretory cargo. MS analysis of these fractions from hESCs cultured in MEF conditioned medium (MEF-CM) or MEFs exposed to hESC medium revealed 99 and 129 proteins putatively secreted by hESCs and MEFs, respectively. Of these, 53 and 62 proteins have been previously identified in cell culture supernatants of MEFs and hESCs, respectively, thus establishing the validity of our approach. Furthermore, 76 and 37 putatively secreted proteins identified in this study in MEFs and hESCs, respectively, have not been reported in previous MS analyses. The identification of low abundance secreted proteins via MS analysis of cell culture supernatants typically necessitates the use of altered culture conditions such as serum-free medium. However, an altered medium formulation might directly influence the cellular secretome. Indeed, we observed significant differences between the abundances of several secreted proteins in subcellular fractions isolated from hESCs cultured in MEF-CM and those exposed to unconditioned hESC medium for 24 h. In contrast, targeted proteomics of secretory pathway organelles does not require the use of customized media. 
We expect that our approach will be particularly valuable in two contexts highly relevant to hESC biology: obtaining a temporal snapshot of proteins secreted in response to a differentiation trigger, and identifying proteins secreted by cells that are isolated from a heterogeneous population.
The primary structural information of proteins employed as biotherapeutics is essential if one wishes to understand their structure-function relationship, as well as in the rational design of new therapeutics and for quality control. Given both the large size (around 150 kDa) and the structural complexity of intact immunoglobulin G (IgG), which includes a variable number of disulfide bridges, its extensive fragmentation and subsequent sequence determination by means of tandem mass spectrometry (MS) are challenging. Here, we applied electron transfer dissociation (ETD), implemented on a hybrid Orbitrap Fourier transform mass spectrometer (FTMS), to analyze a commercial recombinant IgG in a liquid chromatography (LC)-tandem mass spectrometry (MS/MS) top-down experiment. The lack of sensitivity typically observed during the top-down MS of large proteins was addressed by averaging time-domain transients recorded in different LC-MS/MS experiments before performing Fourier transform signal processing. The results demonstrate that an improved signal-to-noise ratio, along with the higher resolution and mass accuracy provided by Orbitrap FTMS (relative to previous applications of top-down ETD-based proteomics on IgG), is essential for comprehensive analysis. Specifically, ETD on Orbitrap FTMS produced about 33% sequence coverage of an intact IgG, signifying an almost 2-fold increase in IgG sequence coverage relative to prior ETD-based analysis of intact monoclonal antibodies of a similar subclass. These results suggest the potential application of the developed methodology to other classes of large proteins and biomolecules.
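The benefit of averaging time-domain transients before Fourier transformation can be illustrated with a small simulation. This is a minimal sketch, not the authors' processing pipeline: it assumes a single sinusoidal signal that is phase-coherent across runs, uncorrelated Gaussian noise, and hypothetical helper names (`make_transient`, `average_transients`, `residual_noise_sd`). Under those assumptions, averaging N transients should lower the noise by roughly the square root of N.

```python
import math
import random


def make_transient(n_points, rng, noise_sd=1.0, freq=0.05, amp=0.5):
    """One simulated transient: a fixed sinusoid plus Gaussian noise (assumed model)."""
    return [amp * math.sin(2 * math.pi * freq * t) + rng.gauss(0.0, noise_sd)
            for t in range(n_points)]


def average_transients(transients):
    """Point-by-point mean of equally long transients."""
    n = len(transients)
    return [sum(column) / n for column in zip(*transients)]


def residual_noise_sd(transient, freq=0.05, amp=0.5):
    """Standard deviation of what remains after subtracting the known signal."""
    resid = [x - amp * math.sin(2 * math.pi * freq * t)
             for t, x in enumerate(transient)]
    mean = sum(resid) / len(resid)
    return math.sqrt(sum((r - mean) ** 2 for r in resid) / len(resid))


rng = random.Random(0)  # fixed seed so the simulation is repeatable
runs = [make_transient(4096, rng) for _ in range(16)]  # 16 simulated LC-MS/MS runs
single_sd = residual_noise_sd(runs[0])
averaged_sd = residual_noise_sd(average_transients(runs))
print(f"noise reduced by a factor of {single_sd / averaged_sd:.1f}")
```

With 16 simulated runs the reduction factor should come out near 4. Real transients from separate experiments would need to share acquisition parameters and phase for the signal to add coherently.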
Glioblastoma multiforme (GBM) is a malignant primary brain tumor with a mean survival of 15 months with the current standard of care. Genetic profiling efforts have identified the amplification, overexpression, and mutation of the wild-type (wt) epidermal growth factor receptor tyrosine kinase (EGFR) in ∼50% of GBM patients. The genetic aberration of wtEGFR is frequently accompanied by the overexpression of a mutant EGFR known as EGFR variant III (EGFRvIII, de2-7EGFR, ΔEGFR), which is expressed in 30% of GBM tumors. The molecular mechanisms of tumorigenesis driven by EGFRvIII overexpression in human tumors have not been fully elucidated. To identify specific therapeutic targets for EGFRvIII driven tumors, it is important to gather a broad understanding of EGFRvIII specific signaling. Here, we have characterized signaling through the quantitative analysis of protein expression and tyrosine phosphorylation across a panel of glioblastoma tumor xenografts established from patient surgical specimens expressing wtEGFR or overexpressing wtEGFR (wtEGFR+) or EGFRvIII (EGFRvIII+). S100A10 (p11), major vault protein, guanylate-binding protein 1 (GBP1), and carbonic anhydrase III (CAIII) were identified to have significantly increased expression in EGFRvIII expressing xenograft tumors relative to wtEGFR xenograft tumors. Increased expression of these four individual proteins was found to be correlated with poor survival in patients with GBM; the combination of these four proteins represents a prognostic signature for poor survival in gliomas. Integration of protein expression and phosphorylation data has uncovered significant heterogeneity among the various tumors and has highlighted several novel pathways, related to EGFR trafficking, activated in glioblastoma. The pathways and proteins identified in these tumor xenografts represent potential therapeutic targets for this disease.
Although bulk protein turnover has been measured with the use of stable isotope labeled tracers for over half a century, it is only recently that the same approach has become applicable to the level of the proteome, permitting analysis of the turnover of many proteins instead of single proteins or an aggregated protein pool. The optimal experimental design for turnover studies is dependent on the nature of the biological system under study, which dictates the choice of precursor label, protein pool sampling strategy, and treatment of data. In this review we discuss different approaches and, in particular, explore how complexity in experimental design and data processing increases as we shift from unicellular to multicellular systems, especially animals.
Hydrogen-deuterium exchange mass spectrometry (HDX-MS) is an important method for protein structure-function analysis. The bottom-up approach uses protein digestion to localize deuteration to higher resolution, and the essential measurement involves centroid mass determinations on a very large set of peptides. In the course of evaluating systems for various projects, we established two HDX-MS platforms that consisted of an FT-MS and a high-resolution QTOF mass spectrometer, each with matched front-end fluidic systems. Digests of proteins spanning a 20-110 kDa range were deuterated to equilibrium, and figures-of-merit for a typical bottom-up HDX-MS experiment were compared for each platform. The Orbitrap Velos identified 64% more peptides than the 5600 QTOF, with a 42% overlap between the two systems, independent of protein size. Precision in deuterium measurements using the Orbitrap marginally exceeded that of the QTOF, depending on the Orbitrap resolution setting. However, the unique nature of FT-MS data generates situations where deuteration measurements can be inaccurate, due to destructive interference arising from mismatches in elemental mass defects. This is shown through the analysis of the peptides common to both platforms, where deuteration values can be as low as 35% of the expected values, depending on FT-MS resolution, peptide length and charge state. These findings are supported by simulations of Orbitrap transients, and highlight that caution should be exercised in deriving centroid mass values from FT transients that do not support baseline separation of the full isotopic composition.
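The centroid mass determination at the core of a bottom-up HDX-MS measurement can be sketched as an intensity-weighted mean over an isotopic envelope. This is an illustrative sketch only: the function names and the peak lists are hypothetical, and the calculation is the generic centroid definition rather than the processing used on either platform described above.

```python
def centroid_mz(peaks):
    """Intensity-weighted mean m/z of an isotopic envelope [(mz, intensity), ...]."""
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("envelope has no signal")
    return sum(mz * intensity for mz, intensity in peaks) / total


def deuterium_uptake(undeuterated, deuterated, charge):
    """Mass shift in Da between the deuterated and undeuterated envelopes."""
    return (centroid_mz(deuterated) - centroid_mz(undeuterated)) * charge


# Hypothetical doubly charged peptide; values are made up for illustration.
reference = [(500.0, 100.0), (500.5, 50.0)]
labeled = [(501.0, 100.0), (501.5, 50.0)]
uptake = deuterium_uptake(reference, labeled, charge=2)
print(f"uptake: {uptake:.2f} Da")
```

Because the centroid depends on the relative intensities of every isotopic peak, any distortion of those intensities (such as the interference effects the abstract describes) shifts the computed uptake directly.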
Cone snails produce highly complex venom comprising mostly small biologically active peptides known as conotoxins or conopeptides. Early estimates that suggested 50-200 venom peptides are produced per species have been recently increased at least 10-fold using advanced mass spectrometry. To uncover the mechanism(s) responsible for generating this impressive diversity, we used an integrated approach combining second-generation transcriptome sequencing with high sensitivity proteomics. From the venom gland transcriptome of Conus marmoreus, a total of 105 conopeptide precursor sequences from 13 gene superfamilies were identified. Over 60% of these precursors belonged to the three gene superfamilies O1, T and M, consistent with their high levels of expression, which suggests these conotoxins play an important role in prey capture and/or defense. Seven gene superfamilies not previously identified in C. marmoreus, including 5 novel superfamilies, were also discovered. To confirm the expression of toxins identified at the transcript level, the injected venom of C. marmoreus was comprehensively analyzed by mass spectrometry, revealing 2710 and 3172 peptides using MALDI and ESI-MS, respectively, and 6254 peptides using an ESI-MS TripleTOF 5600 instrument. All conopeptides derived from transcriptomic sequences could be matched to masses obtained on the TripleTOF within 100 ppm accuracy, with 66 (63%) providing MS/MS coverage that unambiguously confirmed these matches. Comprehensive integration of transcriptomic and proteomic data revealed for the first time that the vast majority of the conopeptide diversity arises from a more limited set of genes through a process of variable peptide processing, which generates conopeptides with alternative cleavage sites, heterogeneous post-translational modifications, and highly variable N- and C-terminal truncations.
Variable peptide processing is expected to contribute to the evolution of venoms, and explains how a limited set of ~100 gene transcripts can generate thousands of conopeptides in a single species of cone snail.
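Matching transcript-derived peptide masses to observed masses within a parts-per-million tolerance can be sketched as follows. This is a minimal illustration, not the authors' software: the function names, peptide labels, and mass values are hypothetical, and only the 100 ppm tolerance comes from the text above.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(predicted, observed, tol_ppm=100.0):
    """Map each predicted peptide name to the observed masses within tol_ppm."""
    return {
        name: [m for m in observed if abs(ppm_error(m, mass)) <= tol_ppm]
        for name, mass in predicted.items()
    }


# Hypothetical monoisotopic masses in Da, chosen only to exercise the tolerance.
predicted = {"hypothetical_peptide_A": 1500.0, "hypothetical_peptide_B": 2000.0}
observed = [1500.1, 1500.2, 1600.0]
matches = match_masses(predicted, observed)
print(matches)
```

At 1500 Da, a 100 ppm window is 0.15 Da wide on each side, so the 1500.1 Da observation matches while 1500.2 Da does not. A mass match alone is ambiguous, which is why the abstract relies on MS/MS coverage for confirmation.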
Co-expression of mRNAs under multiple conditions is commonly used to infer co-functionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for co-expression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein co-expression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein co-expression networks. Whereas protein co-expression was driven primarily by functional similarity between co-expressed genes, mRNA co-expression was driven by both co-function and chromosomal co-localization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein co-expression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for co-expression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies.
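A co-expression network of the kind compared above can be sketched by linking gene pairs whose abundance profiles correlate strongly across samples. This is a minimal sketch under stated assumptions: the abstract does not name the similarity measure or cutoff, so Pearson correlation and the 0.8 threshold are assumptions, and the gene names and values are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, min_r=0.8):
    """Return (gene1, gene2, r) for every gene pair with correlation >= min_r."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= min_r:
            edges.append((g1, g2, r))
    return edges


# Hypothetical abundance profiles across four samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.5],
    "GENE_C": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(profiles)
print([(g1, g2) for g1, g2, _ in edges])
```

Running the same function on matched mRNA and protein profiles and comparing the resulting edge sets is one simple way to examine how much of one network is preserved in the other.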
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its utility in facilitating efficient genome-based sharing, interpretation and integration of proteomics data. First, the interpretation of proteomics data is significantly enhanced with the rich genomic annotation information. Second, PSMs can be easily re-annotated using user-specified gene annotation schemes and assembled into both protein and gene identifications. Third, using the genome as a common reference, proBAMsuite facilitates seamless proteomics and proteogenomics data integration. Finally, proBAM files can be readily visualized in genome browsers and thus bring proteomics data analysis to a general audience beyond the proteomics community. Results from this study establish proBAMsuite as a useful bioinformatics framework for proteomics and proteogenomics research.
Control of protein homeostasis is fundamental to the health and longevity of all organisms. Because the rate of protein synthesis by ribosomes is a central control point in this process, regulation and maintenance of ribosome function could have amplified importance in the overall regulatory circuit. Indeed, ribosomal defects are commonly associated with loss of protein homeostasis, aging and disease, whereas improved protein homeostasis, implying optimal ribosomal function, is associated with disease resistance and increased lifespan. To maintain a high quality ribosome population within the cell, dysfunctional ribosomes are targeted for autophagic degradation. It is not known if complete degradation is the only mechanism for eukaryotic ribosome maintenance or if they might also be repaired by replacement of defective components. We used stable-isotope feeding and protein mass-spectrometry to measure the kinetics of turnover of ribosomal RNA (rRNA) and 71 ribosomal proteins (r-proteins) in mice. The results indicate that exchange of individual proteins and whole ribosome degradation both contribute to ribosome maintenance in vivo. In general, peripheral r-proteins and those with more direct roles in peptide-bond formation are replaced multiple times during the lifespan of the assembled structure, presumably by exchange with a free cytoplasmic pool, whereas the majority of r-proteins are stably incorporated for the lifetime of the ribosome. Dietary signals impact the rates of both new ribosome assembly and component exchange. Signal-specific modulation of ribosomal repair and degradation could provide a mechanistic link in the frequently observed associations among diminished rates of protein synthesis, increased autophagy, and greater longevity.
An important motivation for the construction of biobanks is to discover biomarkers that identify diseases at early, potentially curable stages. This will require biobanks from large numbers of individuals, preferably sampled repeatedly, where the samples are collected and stored under conditions that preserve potential biomarkers. Dried blood samples are attractive for biobanking because of the ease and low cost of collection and storage. Here we have investigated their suitability for protein measurements. 92 proteins with relevance for oncology were analyzed using multiplex proximity extension assays (PEA) in dried blood spots collected on paper and stored for up to 30 years at either +4°C or -24°C.
Our main findings were that 1) the act of drying only slightly influenced detection of blood proteins (average correlation of 0.970), and in a reproducible manner (correlation of 0.999), 2) detection of some proteins was not significantly affected by storage over the full range of three decades (34% and 76% of the analyzed proteins at +4°C and -24°C, respectively), while levels of others decreased slowly during storage with half-lives in the range of 10 to 50 years, and 3) detectability of proteins was less affected in dried samples stored at -24°C compared to at +4°C, as the median protein abundance had decreased to 80% and 93% of starting levels after 10 years of storage at +4°C or -24°C, respectively. The results of our study are encouraging as they suggest an inexpensive means to collect large numbers of blood samples, even by the donors themselves, and to transport and store biobanked samples as spots of whole blood dried on paper. Combined with emerging means to measure hundreds or thousands of proteins, such biobanks could prove of great medical value by greatly enhancing discovery as well as routine analysis of blood biomarkers.
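The relationship between a remaining fraction and a half-life can be made concrete with a short calculation. This is a sketch that assumes simple first-order (exponential) decay, which the abstract does not state explicitly. The function name is hypothetical, and applying the formula to the reported medians is an illustration rather than a result from the study.

```python
import math


def half_life_years(fraction_remaining, elapsed_years):
    """Half-life implied by a remaining fraction, assuming first-order decay."""
    if not 0.0 < fraction_remaining < 1.0:
        raise ValueError("fraction_remaining must lie strictly between 0 and 1")
    return elapsed_years * math.log(2) / -math.log(fraction_remaining)


# Median abundances after 10 years of storage, as reported in the text.
for label, fraction in (("+4 C", 0.80), ("-24 C", 0.93)):
    print(f"{label}: implied half-life {half_life_years(fraction, 10.0):.0f} years")
```

Under this assumption, the median values correspond to half-lives of roughly 31 years at +4°C and 96 years at -24°C. The 10 to 50 year range in the text applies only to the proteins whose levels decreased, which is why a median that includes stable proteins can imply a longer half-life.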