Here, we present LNCipedia (http://www.lncipedia.org), a novel database for human long non-coding RNA (lncRNA) transcripts and genes. LncRNAs constitute a large and diverse class of non-coding RNA genes. Although several lncRNAs have been functionally annotated, the majority remains to be characterized. Different high-throughput methods to identify new lncRNAs (including RNA sequencing and annotation of chromatin-state maps) have been applied in various studies resulting in multiple unrelated lncRNA data sets. LNCipedia offers 21 488 annotated human lncRNA transcripts obtained from different sources. In addition to basic transcript information and gene structure, several statistics are determined for each entry in the database, such as secondary structure information, protein coding potential and microRNA binding sites. Our analyses suggest that, much like microRNAs, many lncRNAs have a significant secondary structure, in line with their presumed association with proteins or protein complexes. Available literature on specific lncRNAs is linked, and users or authors can submit articles through a web interface. Protein coding potential is assessed by two different prediction algorithms: Coding Potential Calculator and HMMER. In addition, a novel strategy has been integrated for detecting potentially coding lncRNAs by automatically re-analysing the large body of publicly available mass spectrometry data in the PRIDE database. LNCipedia is publicly available and allows users to query and download lncRNA sequences and structures based on different search criteria. The database may serve as a resource to initiate small- and large-scale lncRNA studies. As an example, the LNCipedia content was used to develop a custom microarray for expression profiling of all available lncRNAs.
DEAD-box proteins are the largest family of nucleic acid helicases, and are crucial to RNA metabolism throughout all domains of life. They contain a conserved ‘helicase core’ of two RecA-like domains (domains (D)1 and D2), which uses ATP to catalyse the unwinding of short RNA duplexes by non-processive, local strand separation. This mode of action differs from that of translocating helicases and allows DEAD-box proteins to remodel large RNAs and RNA-protein complexes without globally disrupting RNA structure. However, the structural basis for this distinctive mode of RNA unwinding remains unclear. Here, structural, biochemical and genetic analyses of the yeast DEAD-box protein Mss116p indicate that the helicase core domains have modular functions that enable a novel mechanism for RNA-duplex recognition and unwinding. By investigating D1 and D2 individually and together, we find that D1 acts as an ATP-binding domain and D2 functions as an RNA-duplex recognition domain. D2 contains a nucleic-acid-binding pocket that is formed by conserved DEAD-box protein sequence motifs and accommodates A-form but not B-form duplexes, providing a basis for RNA substrate specificity. Upon a conformational change in which the two core domains join to form a ‘closed state’ with an ATPase active site, conserved motifs in D1 promote the unwinding of duplex substrates bound to D2 by excluding one RNA strand and bending the other. Our results provide a comprehensive structural model for how DEAD-box proteins recognize and unwind RNA duplexes. This model explains key features of DEAD-box protein function and affords a new perspective on how the evolutionarily related cores of other RNA and DNA helicases diverged to use different mechanisms.
BACKGROUND: RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatics specialists. Here we provide a step-by-step guide and outline a strategy using currently available statistical tools that results in a conservative list of differentially expressed genes. We also discuss potential sources of error in RNA-Seq analysis that could alter interpretation of global changes in gene expression. FINDINGS: When comparing statistical tools, the negative binomial distribution-based methods edgeR and DESeq identified 11,995 and 11,317 differentially expressed genes, respectively, from an RNA-Seq dataset generated from soybean leaf tissue grown in elevated O3. However, the number of genes in common between these two methods was only 10,535, resulting in 2,242 genes determined to be differentially expressed by only one method. Upon analysis of the non-significant genes, several limitations of these analytic tools were revealed, including evidence for overly stringent parameters for determining statistical significance of differentially expressed genes as well as increased type II error for high-abundance transcripts. CONCLUSIONS: Because of the high variability between methods for determining differential expression of RNA-Seq data, we suggest using several bioinformatics tools, as outlined here, to ensure that a conservative list of differentially expressed genes is obtained. We also conclude that despite these analytical limitations, RNA-Seq provides highly accurate transcript abundance quantification that is comparable to qRT-PCR.
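The overlap figures reported in the abstract above can be checked with simple set arithmetic. The following is a minimal sketch that uses only the three counts stated there (11,995 for edgeR, 11,317 for DESeq, 10,535 in common). The function name `exclusive_counts` is hypothetical, and no gene lists from the study are used.

```python
def exclusive_counts(n_a: int, n_b: int, n_common: int) -> dict:
    """Given the sizes of two result lists and of their intersection,
    return how many items each method calls alone, how many are called
    by exactly one method, and the size of the union."""
    if n_common > min(n_a, n_b):
        raise ValueError("overlap cannot exceed either list size")
    only_a = n_a - n_common          # called by method A only
    only_b = n_b - n_common          # called by method B only
    return {
        "only_a": only_a,
        "only_b": only_b,
        "one_method_only": only_a + only_b,
        "union": n_a + n_b - n_common,
    }


# Counts taken from the abstract: edgeR, DESeq, genes in common.
counts = exclusive_counts(11_995, 11_317, 10_535)
print(counts)
```

Under these counts, 1,460 genes are called only by edgeR and 782 only by DESeq, which together give the 2,242 method-specific genes quoted in the abstract. A conservative list in the sense described there would correspond to the intersection of the two result sets.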
Sulforaphane (SFN) is a dietary cancer preventive with incompletely characterized mechanism(s) of cancer prevention. Since prostaglandin E2 (PGE2) promotes cancer progression, we hypothesized that SFN may block PGE2 synthesis in cancer cells. We found that SFN indeed blocked PGE2 production in human A549 cancer cells not by inhibiting COX-2, but rather by suppressing the expression of microsomal prostaglandin E synthase (mPGES-1), the enzyme that directly synthesizes PGE2. We identified the Hypoxia Inducible Factor 1 alpha (HIF-1α) as the target of SFN-mediated mPGES-1 suppression. SFN suppressed HIF-1α protein expression and the presence of HIF-1α at the mPGES-1 promoter, resulting in reduced transcription of mPGES-1. Finally, SFN also reduced expression of mPGES-1 and PGE2 production in A549 xenograft tumors in mice. Together, these results point to the HIF-1α, mPGES-1 and PGE2 axis as a potential mediator of the anti-cancer effects of SFN, and illustrate the potential of SFN for therapeutic control of cancer and inflammation. Harmful side effects in patients taking agents that target the more upstream COX-2 enzyme render the downstream target mPGES-1 a significant target for anti-inflammatory therapy. Thus, SFN could prove to be an important therapeutic approach to both cancer and inflammation.
Monomeric anthocyanins and polymeric proanthocyanidins (condensed tannins) contribute to important plant traits such as flower and fruit pigmentation, fruit astringency, disease resistance and forage quality. Recent advances in our understanding of the transcriptional control mechanisms that regulate anthocyanin and condensed tannin formation in plants suggest new approaches for the engineering of quality traits associated with these molecules. In particular, MYB family transcription factors are emerging as central players in the coordinated activation of sets of genes specific for the anthocyanin and tannin pathways. Mutations in these genes underlie potentially valuable crop traits, and ectopic over- or under-expression of MYB transcription factors provides routes for engineering of these complex pathways.
Thymidine kinase 1 (TK1) is a salvage enzyme involved in DNA precursor synthesis, and its expression is proliferation dependent. A serum form of TK1 has been used as a biomarker in human medicine for many years and more recently to monitor canine lymphoma. Canine TK1 has not been cloned and studied. Therefore, dog and human TK1 cDNA were cloned and expressed, and the recombinant enzymes characterized. The serum and cellular forms of canine and human TK1 were studied by size-exclusion chromatography and the level of TK1 protein was determined using polyclonal and monoclonal anti-TK1 antibodies.
Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants.
G-quadruplex structures, formed from guanine rich sequences, have previously been shown to be involved in various physiological processes including cancer-related gene expression. Furthermore, G-quadruplexes have been found in several oncogene promoter regions, and have been shown to play a role in the regulation of gene expression. The mutagenic properties of oxidative stress on DNA have been widely studied, as has the association with carcinogenesis. Guanine is the most susceptible nucleotide to oxidation, and as such, G-rich sequences that form G-quadruplexes can be viewed as potential “hot-spots” for DNA oxidation. We propose that oxidation may destabilise the G-quadruplex structure, leading to its unfolding into the duplex structure, affecting gene expression. This would imply a possible mechanism by which oxidation may impact on oncogene expression. This work uses 500 ns molecular dynamics simulations to investigate the effect of oxidation on two biologically relevant G-quadruplex structures, those found in the promoter regions of the c-Kit and c-Myc oncogenes. The results show oxidation having a detrimental effect on stability of the structure, substantially destabilising the c-Kit quadruplex, and with a more attenuated effect on the c-Myc quadruplex. Results are suggestive of a novel route for oxidation-mediated oncogenesis and may have wider implications for genome stability.
Numerous transcription factors (TFs) encode information about upstream signals in the dynamics of their activation, but how downstream genes decode these dynamics remains poorly understood. Using microfluidics to control the nucleocytoplasmic translocation dynamics of the budding yeast TF Msn2, we elucidate the principles that govern how different promoters convert dynamical Msn2 input into gene expression output in single cells. Combining modeling and experiments, we classify promoters according to their signal-processing behavior and reveal that multiple, distinct gene expression programs can be encoded in the dynamics of Msn2. We show that both oscillatory TF dynamics and slow promoter kinetics lead to higher noise in gene expression. Furthermore, we show that the promoter activation timescale is related to nucleosome remodeling. Our findings imply a fundamental trade-off: although the cell can exploit different promoter classes to differentially control gene expression using TF dynamics, gene expression noise fundamentally limits how much information can be encoded in the dynamics of a single TF and reliably decoded by promoters.
Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom-ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy.