Concept: Binding site
Comparison of the binding sites of proteins is an effective means for predicting protein functions based on their structure information. Despite the importance of this problem and much research in the past, it is still very challenging to predict the binding ligands from the atomic structures of protein binding sites. Here, we designed a new algorithm, TIPSA (Triangulation-based Iterative-closest-point for Protein Surface Alignment), based on the iterative closest point (ICP) algorithm. TIPSA aims to find the maximum number of atoms that can be superposed between two protein binding sites, where any pair of superposed atoms has a distance smaller than a given threshold. The search starts from similar tetrahedra between two binding sites obtained from 3D Delaunay triangulation and uses the Hungarian algorithm to find additional matched atoms. We found that, due to the plasticity of protein binding sites, matching the rigid body of point clouds of protein binding sites is not adequate for satisfactory binding ligand prediction. We further incorporated global geometric information, the radius of gyration of binding site atoms, and used nearest neighbor classification for binding site prediction. Tested on benchmark data, our method achieved a performance comparable to the best methods in the literature, while simultaneously providing the common atom set and atom correspondences.
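Two of the quantities mentioned in this abstract are easy to illustrate: the radius of gyration of the binding site atoms, and the count of superposed atom pairs that fall within a distance threshold. The following is a minimal sketch, not the TIPSA implementation. It assumes an unweighted radius of gyration (the abstract does not say whether atoms are mass-weighted) and assumes the atom correspondence and superposition have already been computed; the function names are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of atoms from their centroid.

    Assumption: every atom has equal weight (no mass weighting).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one atom")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq_sum = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                 for x, y, z in coords)
    return math.sqrt(sq_sum / n)


def count_within_threshold(matched_pairs, threshold):
    """Count already-superposed atom pairs closer than `threshold`.

    `matched_pairs` is a list of (atom_a, atom_b) coordinate tuples; finding
    the correspondence itself (e.g. by an assignment algorithm) is out of scope.
    """
    return sum(1 for a, b in matched_pairs if math.dist(a, b) < threshold)


if __name__ == "__main__":
    site = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
    print(radius_of_gyration(site))
    pairs = [((0.0, 0.0, 0.0), (0.0, 0.0, 0.5)), ((1.0, 0.0, 0.0), (3.0, 0.0, 0.0))]
    print(count_within_threshold(pairs, threshold=1.0))
```

A global descriptor such as the radius of gyration is insensitive to the exact atom pairing, which is presumably why it complements a rigid point-cloud match when the site is flexible.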
DNA is an important target for the treatment of multiple pathologies, most notably cancer. In particular, DNA intercalators have often been used as anti-cancer drugs. However, despite their relevance to drug discovery, only a few systematic computational studies have been performed on DNA-intercalator complexes. In this work we analyzed ligand binding-site preferences in 63 high-resolution DNA-intercalator complexes available in the PDB and found that ligands bind preferentially between G and C and between C and A base pairs (70% and 11%, respectively). Next, we examined the ability of AUTODOCK to accurately dock ligands into pre-formed intercalation sites. Following optimization of the docking protocol, AUTODOCK was able to generate conformations with RMSD values < 2.00 Å with respect to crystal structures in ~80% of the cases, whether focusing on the pre-formed binding site (small grid box) or on the entire DNA structure (large grid box). In addition, a top-ranked conformation with an RMSD < 2.00 Å was identified in 75% and 60% of the cases using small and large docking boxes, respectively. Moreover, under the large docking box setting, AUTODOCK was able to successfully distinguish between the intercalation site and the minor groove site. However, in all cases the crystal structure, and the poses tightly clustered around it, had a lower score than the best-scoring poses, suggesting a potential scoring problem with AUTODOCK. A close examination of all cases where the top-ranked pose had an RMSD value > 2.00 Å suggests that AUTODOCK may overemphasize the hydrogen bonding term. A decision tree was built to identify ligands that are likely to be accurately docked based on their characteristics. This analysis revealed that AUTODOCK performs best for intercalators characterized by a large number of aromatic rings, low flexibility, high molecular weight and a small number of hydrogen bond acceptors.
Finally, for canonical B-DNA structures (where pre-formed sites are unavailable), we demonstrated that intercalation sites could be formed by inserting an anthracene moiety between the (anticipated) site-flanking base pairs and by relaxing the structure using either energy minimization or preferably molecular dynamics simulations. Such sites were suitable for the docking of different intercalators by AUTODOCK.
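The 2.00 Å success criterion used in this abstract can be illustrated with a short RMSD calculation. This is a minimal sketch under stated assumptions: both poses list the same ligand atoms in the same order, coordinates are in Å in the receptor's frame, and no superposition is performed before comparison. The function names are hypothetical and the abstract does not describe how its RMSD values were computed.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered atom lists.

    Assumption: no fitting or symmetry correction is applied.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and of equal length")
    total = sum((a - b) ** 2
                for p, q in zip(pose, reference)
                for a, b in zip(p, q))
    return math.sqrt(total / len(pose))


def is_successful(pose, reference, cutoff=2.0):
    """True if the docked pose lies within `cutoff` Å RMSD of the reference."""
    return rmsd(pose, reference) < cutoff


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(x + 1.0, y, z) for x, y, z in crystal]  # rigid 1 Å shift
    print(rmsd(docked, crystal), is_successful(docked, crystal))
```

Ligands with internal symmetry would need a symmetry-aware atom mapping, which this sketch omits.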
FireDB (http://firedb.bioinfo.cnio.es) is a curated inventory of catalytic and biologically relevant small ligand-binding residues culled from the protein structures in the Protein Data Bank. Here we present the important new additions since the publication of FireDB in 2007. The database now contains an extensive list of manually curated biologically relevant compounds. Biologically relevant compounds are informative because of their role in protein function, but they are only a small fraction of the entire ligand set. For the remaining ligands, FireDB provides cross-references to the annotations from publicly available biological, chemical and pharmacological compound databases. FireDB now has external references for 95% of contacting small ligands, making FireDB a more complete database and providing the scientific community with easy access to the pharmacological annotations of PDB ligands. In addition to the manual curation of ligands, FireDB also provides insights into the biological relevance of individual binding sites. Here, biological relevance is calculated from the multiple sequence alignments of related binding sites that are generated from an all-against-all comparison of the FireDB binding sites. The database can be accessed by RESTful web services and is available for download via MySQL.
Integrin αvβ3 expression is altered in various diseases and has been proposed as a drug target. Here we use a rational design approach to develop a therapeutic protein, which we call ProAgio, that binds to integrin αvβ3 outside the classical ligand-binding site. We show ProAgio induces apoptosis of integrin αvβ3-expressing cells by recruiting and activating caspase 8 to the cytoplasmic domain of integrin αvβ3. ProAgio also has anti-angiogenic activity and strongly inhibits growth of tumour xenografts, but does not affect the established vasculature. Toxicity analyses demonstrate that ProAgio is not toxic to mice. Our study reports a new integrin-targeting agent with a unique mechanism of action, and provides a template for the development of integrin-targeting therapeutics.
The HIV-1 envelope glycoprotein (Env) trimer contains the receptor binding sites and membrane fusion machinery that introduce the viral genome into the host cell. As the only target for broadly neutralizing antibodies (bnAbs), Env is a focus for rational vaccine design. We present a cryo-electron microscopy reconstruction and structural model of a cleaved, soluble SOSIP gp140 trimer in complex with a CD4 binding site (CD4bs) bnAb, PGV04, at 5.8 Å resolution. The structure reveals the spatial arrangement of Env components, including the V1/V2, V3, HR1 and HR2 domains, and shielding glycans. The structure also provides insights into trimer assembly, gp120-gp41 interactions, and the CD4bs epitope cluster for bnAbs, which covers a more extensive area and defines a more complex site of vulnerability than previously described.
Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we have provided a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features (helix twist, minor groove width, propeller twist, and roll). Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement has also been observed using TF flexible models and a machine-learning approach using a binary encoding of nucleotides in lieu of PSSMs. Incorporating DNA shape information is most beneficial for E2F and MADS-domain TF families. Our findings indicate that incorporating DNA sequence and shape information benefits the modeling of TF binding under complex in vivo conditions.
The ability to design proteins whose activities can be switched on and off by unrelated effector molecules would enable applications in various research areas, ranging from biosensing to synthetic biology. We describe here a general method to modulate the activity of a protein in response to the concentration of a specific effector. The approach is based on synthetic ligands that possess two mutually exclusive binding sites, one for the protein of interest and one for the effector. Tethering such a ligand to the protein of interest results in an intramolecular ligand-protein interaction that can be disrupted by the presence of the effector. Specifically, we introduce a luciferase controlled by another protein, a human carbonic anhydrase whose activity can be controlled by proteins or small molecules in vitro and on living cells, and novel fluorescent and bioluminescent biosensors.
Although numerous approaches have been developed to map RNA-binding sites of individual RNA-binding proteins (RBPs), few methods exist that allow assessment of global RBP-RNA interactions. Here, we describe PIP-seq, a universal, high-throughput, ribonuclease-mediated protein footprint sequencing approach that reveals RNA-protein interaction sites throughout a transcriptome of interest. We apply PIP-seq to the HeLa transcriptome and compare binding sites found using different cross-linkers and ribonucleases. From this analysis, we identify numerous putative RBP-binding motifs, reveal novel insights into co-binding by RBPs, and uncover a significant enrichment for disease-associated polymorphisms within RBP interaction sites.
Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult for non-experts to use in practice. Here we present a new regulatory motif model, called dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch, which allow users to automatically perform ‘motif finding’, i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT ‘dilogo’ motifs.
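The independence assumption of the PSWM model described above can be made concrete: the score of a candidate site is simply a sum of per-position terms. The following is a minimal sketch of a conventional log-odds PSWM, not of the DWT model, whose formulation is not given in this abstract. The pseudocount, the uniform background and all function names are illustrative assumptions.

```python
import math

BASES = "ACGT"


def build_pswm(sites, pseudocount=1.0, background=0.25):
    """Build log2-odds columns from aligned, equal-length example sites.

    Assumptions: uniform background and an additive pseudocount per base.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pswm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pswm.append({b: math.log2(((column.count(b) + pseudocount) / total)
                                  / background)
                     for b in BASES})
    return pswm


def score(pswm, site):
    """Additive score: each position contributes independently."""
    return sum(col[base] for col, base in zip(pswm, site))


def best_window(pswm, sequence):
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(pswm)
    if len(sequence) < width:
        raise ValueError("sequence shorter than motif")
    return max((score(pswm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


if __name__ == "__main__":
    pswm = build_pswm(["ACGT", "ACGT", "ACGA", "ACGT"])
    print(best_window(pswm, "GGACGTGG"))
```

Because `score` is a plain sum over columns, it cannot reward or penalize particular combinations of bases at two positions; capturing that requires pairwise terms of the kind the abstract describes.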
DNA-binding proteins control many fundamental biological processes such as transcription, recombination and replication. A major goal is to decipher the role that DNA sequence plays in orchestrating the binding and activity of such regulatory proteins. To address this goal, it is useful to rationally design DNA sequences with desired numbers, affinities and arrangements of protein binding sites. However, removing binding sites from DNA is computationally non-trivial since one risks creating new sites in the process of deleting or moving others. Here we present an online binding site removal tool, SiteOut, that enables users to design arbitrary DNA sequences that entirely lack binding sites for factors of interest. SiteOut can also be used to delete sites from a specific sequence, or to introduce site-free spacers between functional sequences without creating new sites at the junctions. In combination with commercial DNA synthesis services, SiteOut provides a powerful and flexible platform for synthetic projects that interrogate regulatory DNA. Here we describe the algorithm and illustrate the ways in which SiteOut can be used; it is publicly available at https://depace.med.harvard.edu/siteout/.
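The junction problem described above (joining two sequences can create a site that neither contains alone) can be illustrated with a small sketch. This is not the SiteOut algorithm, which the abstract does not detail. It assumes binding sites are exact motif strings matched on either strand, and it uses plain rejection sampling of random spacers; all names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, motifs):
    """Return sorted (position, motif) hits on either strand (exact matches)."""
    hits = set()
    for motif in motifs:
        for pattern in {motif, reverse_complement(motif)}:
            start = seq.find(pattern)
            while start != -1:
                hits.add((start, motif))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


def creates_new_sites(left, spacer, right, motifs):
    """True if left+spacer+right has a site absent from left and right alone."""
    offset = len(left) + len(spacer)
    allowed = set(find_sites(left, motifs))
    allowed |= {(pos + offset, m) for pos, m in find_sites(right, motifs)}
    return any(hit not in allowed
               for hit in find_sites(left + spacer + right, motifs))


def design_spacer(left, right, length, motifs, seed=0, max_tries=10000):
    """Rejection-sample a random spacer that adds no sites, junctions included."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        spacer = "".join(rng.choice("ACGT") for _ in range(length))
        if not creates_new_sites(left, spacer, right, motifs):
            return spacer
    raise RuntimeError("no site-free spacer found within max_tries")


if __name__ == "__main__":
    motifs = ["GATA"]
    # Joining these directly creates GATA across the junction.
    print(creates_new_sites("CCGA", "", "TACC", motifs))
    print(design_spacer("CCGA", "TACC", 6, motifs))
```

Rejection sampling is adequate only when motifs are short and few; with many factors or degenerate motifs, most random candidates would be rejected and a more directed search would be needed.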