Concept: Protein Data Bank
Chemical cross-links identified by mass spectrometry generate distance restraints that reveal low-resolution structural information on proteins and protein complexes. The technology to reliably generate such data has become mature and robust enough to shift the focus to the question of how these distance restraints can be best integrated into molecular modeling calculations. Here, we introduce three workflows for incorporating distance restraints generated by chemical cross-linking and mass spectrometry into ROSETTA protocols for comparative and de novo modeling and protein-protein docking. We demonstrate that the cross-link validation and visualization software Xwalk facilitates successful cross-link data integration. Besides the protocols we introduce XLdb, a database of chemical cross-links from 14 different publications with 506 intra-protein and 62 inter-protein cross-links, where each cross-link can be mapped on an experimental structure from the Protein Data Bank. Finally, we demonstrate on a protein-protein docking reference data set the impact of virtual cross-links on protein docking calculations and show that an inter-protein cross-link can reduce on average the RMSD of a docking prediction by 5.0 Å. The methods and results presented here provide guidelines for the effective integration of chemical cross-link data in molecular modeling calculations and should advance the structural analysis of particularly large and transient protein complexes via hybrid structural biology methods.
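The idea of a cross-link acting as a distance restraint can be illustrated with a minimal sketch. It assumes a simplified setting: one representative coordinate per residue (for example a Cα atom) and a straight-line Euclidean distance. The function name `check_crosslinks` and the 30 Å cutoff are hypothetical choices for illustration and are not taken from the abstract. Tools such as Xwalk may use other distance measures, so this is not a description of their method.

```python
import math

def check_crosslinks(coords, crosslinks, max_distance=30.0):
    """Report whether each cross-linked residue pair lies within a cutoff.

    coords: dict mapping a residue id to an (x, y, z) tuple in Angstrom.
    crosslinks: iterable of (residue_i, residue_j) pairs.
    max_distance: hypothetical upper bound in Angstrom (an assumption).
    """
    results = []
    for res_i, res_j in crosslinks:
        # Straight-line distance between the two representative atoms.
        distance = math.dist(coords[res_i], coords[res_j])
        results.append((res_i, res_j, distance, distance <= max_distance))
    return results

# Toy coordinates, not taken from any real structure.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 40.0)}
toy_links = [(1, 2), (1, 3)]
for res_i, res_j, distance, ok in check_crosslinks(toy_coords, toy_links):
    print(res_i, res_j, round(distance, 1), "satisfied" if ok else "violated")
```

In a modeling workflow, a check of this kind could be used to filter or score candidate models against the experimental cross-links.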
Human telomeres play a key role in protecting chromosomal ends from fusion events; they are composed of d(TTAGGG) repeats, ranging in size from 3 to 15 kb. They form G-quadruplex DNA structures, stabilized by G-quartets in the presence of cations, and are involved in several biological processes. In particular, a telomere maintenance mechanism is provided by a specialized enzyme called telomerase, a reverse transcriptase able to add multiple copies of the 5'-GGTTAG-3' motif to the end of the G-strand of the telomere and which is over-expressed in the majority of cancer cells. The central cation has a crucial role in maintaining the stability of the structure. Based on its nature, it can be associated with different topological telomeric quadruplexes, which depend also on the orientation of the DNA strands and the syn/anti conformation of the guanines. Such a polymorphism, confirmed by the different structures deposited in the Protein Data Bank (PDB), prompted us to apply a computational protocol in order to investigate the conformational properties of a set of known G-quadruplex ligands and their molecular recognition against six different experimental models of the human telomeric sequence d[AG3(T2AG3)3]. The average AutoDock correlation between theoretical and experimental data yielded an r² value of 0.882 among all the studied models. This result was better than those obtained for the single folds, with the exception of the parallel structure (r² equal to 0.886), thus suggesting a key role of this G4 conformation in the stacking interaction network. Among the studied binders, a trisubstituted acridine and a dibenzophenanthroline derivative were well recognized by the parallel and the mixed G-quadruplex structures, allowing the identification of specific key contacts with DNA and the further design of more potent or target specific G-quadruplex ligands.
For the past few decades, intensive studies have been carried out in an attempt to understand how the amino acid sequences of proteins encode their three dimensional structures to perform their specific functions. In order to understand the sequence-structure relationship of proteins, several sub-sequence search studies in non-redundant sequence-structure databases have been undertaken which have given some fruitful clues. In our earlier work, we analyzed a set of 3124 non-redundant protein sequences from the Protein Data Bank (PDB) and retrieved 30 identical octapeptides having different secondary structure. These octapeptides were characterized by using different computational procedures. This prompted us to explore the presence of octapeptides with reverse sequences and to analyze whether these octapeptides would adopt similar structures as that of their parent octapeptides. Our identical reverse octapeptide search resulted in the finding of eight octapeptide pairs (octapeptide and reverse octapeptide) with similar secondary structure and 23 octapeptide pairs with different secondary structure. In the present work, the geometrical and biophysical characteristics of identical reverse octapeptides were explored and compared with unrelated octapeptide pairs by using various computational tools. We thus conclude that proteins containing identical reverse octapeptides are not very abundant and residues in the octapeptide pairs do not contribute to the stability of the protein. Furthermore, compared to unrelated octapeptides, identical reverse octapeptides do not show certain biophysical and geometrical properties.
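A search for octapeptide and reverse-octapeptide pairs of the kind described above can be sketched as follows. This is an illustrative sketch only: the function names, the toy sequences, and the choice to skip palindromic octapeptides are assumptions, and the abstract does not describe the authors' actual procedure or how secondary structure was compared.

```python
def octapeptides(sequence, k=8):
    """Return the set of all length-k subsequences (k-mers) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def reverse_pairs(sequences, k=8):
    """Find k-mers whose reversed sequence also occurs in the data set.

    sequences: dict mapping a (hypothetical) chain id to its sequence.
    Palindromic k-mers are skipped, an assumption made for this sketch.
    """
    seen = set()
    for sequence in sequences.values():
        seen |= octapeptides(sequence, k)
    pairs = set()
    for peptide in seen:
        reverse = peptide[::-1]
        # 'peptide < reverse' reports each pair once and drops palindromes.
        if reverse in seen and peptide < reverse:
            pairs.add((peptide, reverse))
    return sorted(pairs)

# Toy sequences, not real PDB entries.
toy = {"chainA": "MACDEFGHIK", "chainB": "QQIHGFEDCAQ"}
print(reverse_pairs(toy))
```

Each reported pair could then be mapped back to its source structures to compare the secondary structure of the two segments.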
Modeling the three-dimensional (3D) structures of proteins assumes great significance because of its manifold applications in biomolecular research. Toward this goal, we present MaxMod, a graphical user interface (GUI) of the MODELLER program that combines profile hidden Markov model (profile HMM) method with Clustal Omega program to significantly improve the selection of homologous templates and target-template alignment for construction of accurate 3D protein models. MaxMod distinguishes itself from other existing GUIs of MODELLER software by implementing effortless modeling of proteins using templates that bear modified residues. Additionally, it provides various features such as loop optimization, express modeling (a feature where protein model can be generated directly from its sequence, without any further user intervention) and automatic update of PDB database, thus enhancing the user-friendly control of computational tasks. We find that HMM-based MaxMod performs better than other modeling packages in terms of execution time and model quality. MaxMod is freely available as a downloadable standalone tool for academic and non-commercial purpose at http://www.immt.res.in/maxmod/ .
We generated an anti-albumin antibody, CA645, to link its Fv domain to an antigen-binding fragment (Fab), thereby extending the serum half-life of the Fab. CA645 was demonstrated to bind human, cynomolgus, and mouse serum albumin with similar affinity (1-7 nM), and to bind human serum albumin (HSA) when it is in complex with common known ligands. Importantly for half-life extension, CA645 binds HSA with similar affinity within the physiologically relevant range of pH 5.0 - pH 7.4, and does not have a deleterious effect on the binding of HSA to neonatal Fc receptor (FcRn). A crystal structure of humanized CA645 Fab in complex with HSA was solved and showed that CA645 Fab binds to domain II of HSA. Superimposition with the crystal structure of FcRn bound to HSA confirmed that CA645 does not block HSA binding to FcRn. In mice, the serum half-life of humanized CA645 Fab is 84.2 h. This is a significant extension in comparison with < 1 h for a non-HSA binding CA645 Fab variant. The Fab-HSA structure was used to design a series of mutants with reduced affinity to investigate the correlation between the affinity for albumin and serum half-life. Reduction in the affinity for MSA by 144-fold from 2.2 nM to 316 nM had no effect on serum half-life. Strikingly, despite a reduction in affinity to 62 µM, an extension in serum half-life of 26.4 h was still obtained. CA645 Fab and the CA645 Fab-HSA complex have been deposited in the Protein Data Bank (PDB) with accession codes, 5FUZ and 5FUO, respectively.
Protein Structure Annotation Tool-plus (ProSAT+) is a new web server for mapping protein sequence annotations onto a protein structure and visualizing them simultaneously with the structure. ProSAT+ incorporates many of the features of the preceding ProSAT and ProSAT2 tools but also provides new options for the visualization and sharing of protein annotations. Data are extracted from the UniProt KnowledgeBase, the RCSB PDB and the PDBe SIFTS resource, and visualization is performed using JSmol. User-defined sequence annotations can be added directly to the URL, thus enabling visualization and easy data sharing. ProSAT+ is available at http://prosat.h-its.org.
Method to generate highly stable D-amino acid analogs of bioactive helical peptides using a mirror image of the entire PDB
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 1 year ago
Biologics are a rapidly growing class of therapeutics with many advantages over traditional small molecule drugs. A major obstacle to their development is that proteins and peptides are easily destroyed by proteases and, thus, typically have prohibitively short half-lives in human gut, plasma, and cells. One of the most effective ways to prevent degradation is to engineer analogs from dextrorotary (D)-amino acids, with up to 10⁵-fold improvements in potency reported. We here propose a general peptide-engineering platform that overcomes limitations of previous methods. By creating a mirror image of every structure in the Protein Data Bank (PDB), we generate a database of ∼2.8 million D-peptides. To obtain a D-analog of a given peptide, we search the (D)-PDB for similar configurations of its critical ("hotspot") residues. As a proof of concept, we apply our method to two peptides that are Food and Drug Administration approved as therapeutics for diabetes and osteoporosis, respectively. We obtain D-analogs that activate the GLP1 and PTH1 receptors with the same efficacy as their natural counterparts and show greatly increased half-life.
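Geometrically, a mirror image of a structure can be obtained by reflecting its atomic coordinates through a plane. The sketch below shows one way to do this and to confirm that handedness is inverted. The choice of the yz-plane, the function names, and the toy coordinates are assumptions for illustration; the abstract does not specify how the mirrored database was built.

```python
def mirror_coordinates(coords):
    """Reflect points through the yz-plane by negating the x coordinate."""
    return [(-x, y, z) for (x, y, z) in coords]

def signed_volume(a, b, c, d):
    """Scalar triple product (b-a) . ((c-a) x (d-a)).

    Its sign encodes the handedness of the four points, so it should flip
    under reflection, as L and D stereocentres do.
    """
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    wx, wy, wz = (d[i] - a[i] for i in range(3))
    return (ux * (vy * wz - vz * wy)
            - uy * (vx * wz - vz * wx)
            + uz * (vx * wy - vy * wx))

# Toy tetrahedron standing in for four substituents around a chiral centre.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirrored = mirror_coordinates(points)
print(signed_volume(*points), signed_volume(*mirrored))
```

Any reflection plane gives an equivalent mirror image up to rotation and translation, so negating a single coordinate is sufficient for this purpose.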
2017 publication guidelines for structural modelling of small-angle scattering data from biomolecules in solution: an update
- Acta crystallographica. Section D, Structural biology
- Published almost 2 years ago
In 2012, preliminary guidelines were published addressing sample quality, data acquisition and reduction, presentation of scattering data and validation, and modelling for biomolecular small-angle scattering (SAS) experiments. Biomolecular SAS has since continued to grow and authors have increasingly adopted the preliminary guidelines. In parallel, integrative/hybrid determination of biomolecular structures is a rapidly growing field that is expanding the scope of structural biology. For SAS to contribute maximally to this field, it is essential to ensure open access to the information required for evaluation of the quality of SAS samples and data, as well as the validity of SAS-based structural models. To this end, the preliminary guidelines for data presentation in a publication are reviewed and updated, and the deposition of data and associated models in a public archive is recommended. These guidelines and recommendations have been prepared in consultation with the members of the International Union of Crystallography (IUCr) Small-Angle Scattering and Journals Commissions, the Worldwide Protein Data Bank (wwPDB) Small-Angle Scattering Validation Task Force and additional experts in the field.
Long, flexible physical filaments are naturally tangled and knotted, from macroscopic string down to long-chain molecules. The existence of knotting in a filament naturally affects its configuration and properties, and may be very stable or disappear rapidly under manipulation and interaction. Knotting has been previously identified in protein backbone chains, for which these mechanical constraints are of fundamental importance to their molecular functionality, despite their being open curves in which the knots are not mathematically well defined; knotting can only be identified by closing the termini of the chain somehow. We introduce a new method for resolving knotting in open curves using virtual knots, which are a wider class of topological objects that do not require a classical closure and so naturally capture the topological ambiguity inherent in open curves. We describe the results of analysing proteins in the Protein Data Bank by this new scheme, recovering and extending previous knotting results, and identifying topological interest in some new cases. The statistics of virtual knots in protein chains are compared with those of open random walks and Hamiltonian subchains on cubic lattices, identifying a regime of open curves in which the virtual knotting description is likely to be important.
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.