Concept: Cdx protein family
Protein-lipid interactions are increasingly recognized as important in numerous biological processes, and bioinformatics is increasingly used as a tool for studying them. Recently developed approaches that recognize lipid-binding regions in proteins are especially applicable. In this study, one such approach, specialized in identifying lipid-binding helical regions in proteins, is expanded with additional features that can easily be obtained manually. Several members of the amphitropic protein family were investigated to demonstrate these additional features. The results seem to indicate interesting characteristics of amphitropic proteins and provide insight into the mechanistic functioning and overall understanding of this class of proteins. They also suggest that the presented approach could serve either as a starting point for studies of protein-lipid interactions or as a tool for selecting new focus points for more detailed experimental research on proteins with known overall lipid-binding abilities.
Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2).
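The 50% criterion above can be illustrated with a minimal sketch. The `DomainHit` record, its field names, and the example hits are hypothetical and are not taken from Pfam or from the study; the only thing assumed from the abstract is that a domain counts as partial when its aligned region covers less than half of the family model length.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """Hypothetical record for one domain match against a family model."""
    family: str
    model_start: int   # first model position covered (1-based, inclusive)
    model_end: int     # last model position covered (inclusive)
    model_length: int  # total length of the family model


def model_coverage(hit: DomainHit) -> float:
    """Fraction of the family model covered by this hit."""
    return (hit.model_end - hit.model_start + 1) / hit.model_length


def is_partial(hit: DomainHit, threshold: float = 0.5) -> bool:
    """A hit is 'partial' if it covers less than `threshold` of the model."""
    return model_coverage(hit) < threshold


# Made-up example hits, for illustration only.
hits = [
    DomainHit("FAM_A", 1, 100, 100),    # full-length match
    DomainHit("FAM_A", 10, 45, 100),    # covers 36 of 100 positions
    DomainHit("FAM_B", 120, 200, 200),  # covers 81 of 200 positions
    DomainHit("FAM_B", 1, 100, 200),    # covers exactly half, so not partial
]

partial_hits = [h for h in hits if is_partial(h)]
partial_fraction = len(partial_hits) / len(hits)
print(f"{len(partial_hits)} of {len(hits)} hits are partial ({partial_fraction:.0%})")
```

The strict inequality follows the abstract's wording ("shorter than 50%"), so a hit covering exactly half of the model is not counted as partial.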
- FASEB journal : official publication of the Federation of American Societies for Experimental Biology
- Published about 1 year ago
From yeast to mammals, autophagy is an important mechanism for sustaining cellular homeostasis through facilitating the degradation and recycling of aged and cytotoxic components. During autophagy, cargo is captured in double-membraned vesicles, the autophagosomes, and degraded through lysosomal fusion. In yeast, autophagy initiation, cargo recognition, cargo engulfment, and vesicle closure are Atg8 dependent. In higher eukaryotes, Atg8 has evolved into the LC3/GABARAP protein family consisting of 7 family proteins [LC3A (2 splice variants), LC3B, LC3C, GABARAP, GABARAPL1, and GABARAPL2]. LC3B, the most studied family protein, is associated with autophagosome development and maturation and is used to monitor autophagic activity. Given the high homology, the other LC3/GABARAP family proteins are often presumed to fulfill similar functions. Nevertheless, substantial evidence shows that the LC3/GABARAP family proteins are unique in function and important in autophagy-independent mechanisms. In this review, we discuss the current knowledge and function(s) of the LC3/GABARAP family proteins. We focus on processing of the individual family proteins and their role in autophagy initiation, cargo recognition, vesicle closure, and trafficking, a complex and tightly regulated process that requires selective presentation and recruitment of these family proteins. In addition, functions unrelated to autophagy of the LC3/GABARAP protein family members are discussed. Schaaf, M. B. E., Keulers, T. G., Vooijs, M. A., Rouschop, K. M. A. LC3/GABARAP family proteins: autophagy-(un)related functions.
HAMAP (High-quality Automated and Manual Annotation of Proteins; available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.
Active molecules among the numerous chemical structures in a chemical database can be searched easily by statistical prediction of compound-protein interactions. However, constructing a simple prediction model against one protein does not aid drug design, because detecting chemical structures that act similarly against multiple proteins is necessary for preventing side effects of the potential drug. To tackle this problem, we propose a new method that visualizes chemical and protein spaces. For simultaneous visualization of both spaces, we employ a counterpropagation neural network (CPNN) and develop a new visualization method named multi-input CPNN (MICPNN). In a case study of the kinase protein family, the MICPNN model accurately predicted the complex relationships between compounds and proteins. The proposed method identified chemical structures with promising activity against kinases. Our proposed method is also applicable to other protein families, such as G-protein coupled receptors, ion channels and transporters.
- Proceedings of the National Academy of Sciences of the United States of America
- Published 5 months ago
The retinoblastoma protein (Rb) and the homologous pocket proteins p107 and p130 negatively regulate cell proliferation by binding and inhibiting members of the E2F transcription factor family. The structural features that distinguish Rb from other pocket proteins have been unclear but are critical for understanding their functional diversity and determining why Rb has unique tumor suppressor activities. We describe here important differences in how the Rb and p107 C-terminal domains (CTDs) associate with the coiled-coil and marked-box domains (CMs) of E2Fs. We find that although CTD-CM binding is conserved across protein families, Rb and p107 CTDs show clear preferences for different E2Fs. A crystal structure of the p107 CTD bound to E2F5 and its dimer partner DP1 reveals the molecular basis for pocket protein-E2F binding specificity and how cyclin-dependent kinases differentially regulate pocket proteins through CTD phosphorylation. Our structural and biochemical data together with phylogenetic analyses of Rb and E2F proteins support the conclusion that Rb evolved specific structural motifs that confer its unique capacity to bind with high affinity those E2Fs that are the most potent activators of the cell cycle.
Nucleotide-binding domain and leucine-rich repeat domain-containing (NLR) proteins are sentinels of plant immunity that monitor host proteins for perturbations induced by pathogenic effector proteins. Here we show that the Arabidopsis ZAR1 NLR protein requires the ZRK3 kinase to recognize the Pseudomonas syringae type III effector (T3E) HopF2a. These results support the hypothesis that ZAR1 associates with an expanded ZRK protein family to broaden its effector recognition spectrum.
In functionally diverse protein families, conservation in short signature regions may outperform full-length sequence comparisons for identifying proteins that belong to a subgroup within which one specific aspect of their function is conserved. The SIMBAL workflow (Sites Inferred by Metabolic Background Assertion Labeling) is a data-mining procedure for finding such signature regions. It begins by using clues from genomic context, such as co-occurrence or conserved gene neighborhoods, to build a useful training set from a large number of uncharacterized but mutually homologous proteins. When training set construction is successful, the YES partition is enriched in proteins that share function with the user’s query sequence, while the NO partition is depleted. A selected query sequence is then mined for short signature regions whose closest matches overwhelmingly favor proteins from the YES partition. High-scoring signature regions typically contain key residues critical to functional specificity, so proteins with the highest sequence similarity across these regions tend to share the same function. The SIMBAL algorithm was described previously, but significant manual effort, expertise, and a supporting software infrastructure were required to prepare the requisite training sets. Here, we describe a new, distributable software suite that speeds up and simplifies the process for using SIMBAL, most notably by providing tools that automate training set construction. These tools have broad utility for comparative genomics, allowing for flexible collection of proteins or protein domains based on genomic context as well as homology, a capability that can greatly assist in protein family construction. Armed with this new software suite, SIMBAL can serve as a fast and powerful in silico alternative to direct experimentation for characterizing proteins and their functional interactions.
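The signature-region idea described above can be sketched in a few lines. This is a deliberately simplified illustration and not the SIMBAL implementation: it assumes the training sequences are already aligned to the query and have the same length, it swaps in ungapped percent identity in place of a real similarity search, and it scores each window by the plain fraction of YES-partition sequences among its closest matches. The function names, the parameters `window` and `top_k`, and the toy sequences are all hypothetical.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_scores(
    query: str,
    training: List[Tuple[str, bool]],
    window: int = 4,
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Score every window of the query by YES enrichment among its best matches.

    `training` holds (sequence, is_yes) pairs; each sequence is assumed to be
    pre-aligned to the query and of equal length (a simplifying assumption).
    Returns (window_start, yes_fraction) pairs.
    """
    scores = []
    for start in range(len(query) - window + 1):
        segment = query[start:start + window]
        # Rank training sequences by identity to the query within this window.
        # Ties keep input order, which is arbitrary in this sketch.
        ranked = sorted(
            training,
            key=lambda item: identity(segment, item[0][start:start + window]),
            reverse=True,
        )
        top = ranked[:top_k]
        yes_fraction = sum(1 for _, is_yes in top if is_yes) / len(top)
        scores.append((start, yes_fraction))
    return scores


# Toy data: YES sequences share the motif "CDEF" with the query; NO sequences do not.
query = "AAAACDEFAAAA"
training = [
    ("AAAAWWWWAAAA", False),
    ("AAAAYYYYAAAA", False),
    ("AAAACDEFAAAA", True),
    ("GAAACDEFAAAG", True),
]

for start, score in window_scores(query, training):
    print(start, query[start:start + 4], score)
```

In this toy case the window covering the shared motif is matched best by YES-partition sequences only, while the flanking windows are matched equally well by NO-partition sequences and therefore carry no signal. A real workflow would need a proper similarity search and a statistical score rather than a raw fraction.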
Solving the structure of Pla l 1 elucidated the preserved fold of Ole e 1-like proteins, while IgE cross-reactivity in this family is limited to molecules with high sequence identity. Diagnostic accuracy using source-specific Ole e 1-like molecules is essential for discriminating plantain from other pollen allergies.
NCBI’s Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI’s Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families. CDD also supports comparative analyses of protein families via conserved domain architectures, and a recent curation effort focuses on providing functional characterizations of distinct subfamily architectures using SPARCLE: Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.