Concept: Cdx protein family


Protein-lipid interactions are increasingly recognized as important in numerous biological processes, and bioinformatics is increasingly used as a tool for studying them. Recently developed approaches that recognize lipid-binding regions in proteins are especially applicable. In this study, one such approach, specialized in identifying lipid-binding helical regions in proteins, is expanded with additional features that can easily be obtained manually. Several members of the amphitropic protein family were investigated to demonstrate these additional features. The results appear to indicate interesting characteristics of amphitropic proteins and provide insight into the mechanistic functioning and overall understanding of this intriguing class of proteins. Additionally, the results demonstrate that the presented approach might serve either as a starting point in protein-lipid interaction studies or as a tool for selecting new focus points for more detailed experimental research on proteins with known overall protein-lipid binding abilities.

Concepts: DNA, Proteins, Protein, Bioinformatics, Molecular biology, Metabolism, Proteome, Cdx protein family


Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2).
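
The 50% cutoff described above can be illustrated with a minimal sketch. It assumes each domain hit records the first and last model positions it covers; the `DomainHit` class, its field names, and the example family names are hypothetical and are not taken from Pfam or from the study.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """Hypothetical record of one domain match (names are illustrative)."""
    family: str
    model_start: int   # first model position covered (1-based, inclusive)
    model_end: int     # last model position covered (1-based, inclusive)
    model_length: int  # full length of the family model


def model_coverage(hit: DomainHit) -> float:
    """Fraction of the family model covered by this hit."""
    return (hit.model_end - hit.model_start + 1) / hit.model_length


def is_partial(hit: DomainHit, threshold: float = 0.5) -> bool:
    """A hit counts as partial if it covers less than `threshold` of the model."""
    return model_coverage(hit) < threshold


hits = [
    DomainHit("example_family_A", 1, 120, 125),   # covers 120 of 125 positions
    DomainHit("example_family_B", 60, 100, 125),  # covers 41 of 125 positions
]
partial = [h.family for h in hits if is_partial(h)]
print(partial)
```

The comparison is strict, so a hit covering exactly half of the model is not counted as partial, which matches the "shorter than 50%" wording.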

Concepts: Protein structure, Bioinformatics, Evolution, Biology, Phylogenetic tree, Protein domain, Cdx protein family, Protein family


The extent of co-sensitization within and between food protein families in an adult population is largely unknown. This study aimed to identify the most frequently recognized components in the PR-10 and storage protein families, as well as patterns in (co-)sensitization, in a birch-endemic area.

Concepts: Proteins, Bioinformatics, Molecular biology, Nutrition, Biology, Allergy, Cdx protein family, Protein family


From yeast to mammals, autophagy is an important mechanism for sustaining cellular homeostasis by facilitating the degradation and recycling of aged and cytotoxic components. During autophagy, cargo is captured in double-membraned vesicles, the autophagosomes, and degraded through lysosomal fusion. In yeast, autophagy initiation, cargo recognition, cargo engulfment, and vesicle closure are Atg8 dependent. In higher eukaryotes, Atg8 has evolved into the LC3/GABARAP protein family consisting of 7 family proteins [LC3A (2 splice variants), LC3B, LC3C, GABARAP, GABARAPL1, and GABARAPL2]. LC3B, the most studied family protein, is associated with autophagosome development and maturation and is used to monitor autophagic activity. Given the high homology, the other LC3/GABARAP family proteins are often presumed to fulfill similar functions. Nevertheless, substantial evidence shows that the LC3/GABARAP family proteins are unique in function and important in autophagy-independent mechanisms. In this review, we discuss the current knowledge and function(s) of the LC3/GABARAP family proteins. We focus on processing of the individual family proteins and their role in autophagy initiation, cargo recognition, vesicle closure, and trafficking, a complex and tightly regulated process that requires selective presentation and recruitment of these family proteins. In addition, functions unrelated to autophagy of the LC3/GABARAP protein family members are discussed. - Schaaf, M. B. E., Keulers, T. G., Vooijs, M. A., Rouschop, K. M. A. LC3/GABARAP family proteins: autophagy-(un)related functions.

Concepts: DNA, Protein, Cell, Bioinformatics, Metabolism, Endoplasmic reticulum, Cdx protein family, Protein family


HAMAP (High-quality Automated and Manual Annotation of Proteins) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.

Concepts: DNA, Proteins, Protein, Protein structure, Bioinformatics, Curator, Cdx protein family, Protein family


PDZ-containing proteins comprise one of the most widely distributed protein families and play a major role in localization and membrane receptor clustering. They are hence important regulators of signal transduction in cellular pathways. Although knowledge of these proteins has increased exponentially, the existing database ‘PDZBase’ is limited to only 339 proteins, as it dates back to 2004, when very little data was available. This lack of dedicated information on the protein family led us to develop PDZscape. ‘PDZscape’ brings together the complete available information on 58,648 PDZ-containing proteins, with their known and putative binding partners, on one platform. It has a user-friendly web interface that can be easily queried with external protein identifiers. Through its integration of prominent databases including NCBI, UniProtKB, Swiss-Prot, Pubmed, PDB, STRING, IntAct, KEGG, Pfam and Protein Mutant Database, it provides detailed information on the PDZ interactome in addition to a customized BLAST option. Most importantly, the database includes the mutations and diseases associated with PDZ-containing proteins, manually curated by our group, making it a comprehensive compilation. It also features tools to query the database by sequence (PDZ-Blast) and to determine whether a protein of interest is a PDZ-binding protein. PDZscape is freely available at .

Concepts: Proteins, Protein, Bioinformatics, Signal transduction, Cell membrane, Hormone, Receptor, Cdx protein family


Efflux proteins play a key role in pumping xenobiotics out of cells; in other words, they protect cells by extruding foreign chemicals. Predicting the efflux family proteins involved in compound transport is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters, but without considering the pump-specific characteristics of efflux protein families. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs. Motif sequences with the same number of residues will have a high degree of residue similarity, which affects the classification process. Consequently, it is challenging but vital to recognize the structures and determine the energy dependencies of efflux protein families. To identify efflux protein families efficiently while taking pump specificity into account, we developed a two-dimensional Convolutional Neural Network (2D CNN) model called DeepEfflux. DeepEfflux attempts to capture the sequence motifs around hidden target residues and use them as hidden features of the families. In addition, the 2D CNN model uses a Position-Specific Scoring Matrix (PSSM) as input. Three different data sets, one for each efflux protein family, were fed into DeepEfflux, and five-fold cross-validation was then used to evaluate training performance.
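
The five-fold cross-validation mentioned above can be sketched in a few lines. This is only an illustration of how samples are partitioned into folds, not the DeepEfflux code: the function name `k_fold_indices`, the sample count, and the seed are assumptions, and model training on each split is left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold serves
    as the validation set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical data set of 23 samples (e.g. one PSSM per protein sequence).
splits = list(k_fold_indices(23, k=5))
for train, validation in splits:
    # A model would be fitted on `train` and scored on `validation` here.
    print(len(train), len(validation))
```

Averaging the five validation scores gives a single performance estimate in which every sample has been held out exactly once.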

Concepts: Family, Proteins, Protein, Gene, Metabolism, 2D computer graphics, Xenobiotic metabolism, Cdx protein family


Protein thermostability engineering is a powerful tool for improving the resistance of proteins to high temperatures and thereby broadening their applications. Efficient protein thermostability engineering requires thermostability-classified data sources, including sequences and 3D structures, for different protein families. However, no data source currently provides such data in an easily accessible form. This is the first release of the ProtDataTherm database for the analysis and engineering of protein thermostability, which contains more than 14 million protein sequences categorized by thermal stability and protein family. The database contains the data needed to better understand protein thermostability and stability engineering. Categorizing protein sequences and structures as psychrophilic, mesophilic or thermophilic makes the database useful for developing new tools for protein stability prediction. This database is available at . As a proof of concept, thermostability-improving mutations were suggested for one sample protein belonging to a protein family with more than 20 mesophilic and thermophilic sequences and with experimentally measured ΔT values of mutations available in the ProTherm database.

Concepts: DNA, Better, Proteins, Protein, Protein structure, Bioinformatics, Cdx protein family, Mesophile


Pentatricopeptide repeat (PPR) proteins are one of the major protein families in flowering plants, containing around 450 members. They participate in RNA editing and are related to plant growth, development and reproduction, as well as to responses to ABA and abiotic stresses. Their characteristics have been described in silico; however, relatively little is known about their biochemical properties. Different types of PPR proteins, with different tasks in RNA editing, have been suggested to interact in an editosome to complete RNA editing. Other non-PPR editing factors, such as the multiple organellar RNA editing factors and the organelle RNA recognition motif-containing protein family, have also been described in plants. However, while evidence of protein interactions between non-PPR RNA editing proteins is accumulating, very few PPR protein interactions have been reported, possibly because of their high instability. In this manuscript, we aimed to optimize the conditions for non-denaturing protein extraction of PPR proteins, allowing in vivo protein analyses such as interaction assays by co-immunoprecipitation. The unusually high protein degradation rate, the aggregation properties and the high pI, as well as the ATP-dependence of some PPR proteins, are key aspects to be considered when extracting PPR proteins in a non-denatured state. During extraction of PPR proteins, the use of proteasome and phosphatase inhibitors is critical. The use of the ATP cofactor considerably reduces the degradation of PPR proteins. A short centrifugation step to discard cell debris is essential to avoid PPR precipitation, while in some cases addition of a reductant is needed, probably because of the pI/pH context. This work provides an easy and rapid optimized non-denaturing total protein extraction protocol from plant tissue, suitable for polypeptides of the PPR family.

Concepts: DNA, Proteins, Protein, Cell nucleus, Cell, Amino acid, Ribosome, Cdx protein family


Active molecules among the numerous chemical structures in a chemical database can be searched easily by statistical prediction of compound-protein interactions. However, constructing a simple prediction model against one protein does not aid drug design, because detecting chemical structures that act similarly against multiple proteins is necessary for preventing side effects of the potential drug. To tackle this problem, we propose a new method that visualizes chemical and protein spaces. For simultaneous visualization of both spaces, we employ a counterpropagation neural network (CPNN) and develop a new visualization method named multi-input CPNN (MICPNN). In a case study of the kinase protein family, the MICPNN model accurately predicted the complex relationships between compounds and proteins. The proposed method identified chemical structures with promising activity against kinases. It is also applicable to other protein families, such as G-protein coupled receptors, ion channels and transporters.

Concepts: Scientific method, Protein, Bioinformatics, Signal transduction, Enzyme, Cell membrane, Receptor, Cdx protein family