Concept: RNA splicing
Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable.
Eukaryotes have two types of spliceosomes, comprised of either major (U1, U2, U4, U5, U6) or minor (U11, U12, U4atac, U6atac; <1%) snRNPs. The high conservation of minor introns, typically one amidst many major introns in several hundred genes, despite their poor splicing, has been a long-standing enigma. Here, we discovered that the low abundance minor spliceosome's catalytic snRNP, U6atac, is strikingly unstable (t½<2 hr). We show that U6atac level depends on both RNA polymerases II and III and can be rapidly increased by cell stress-activated kinase p38MAPK, which stabilizes it, enhancing mRNA expression of hundreds of minor intron-containing genes that are otherwise suppressed by limiting U6atac. Furthermore, p38MAPK-dependent U6atac modulation can control minor intron-containing tumor suppressor PTEN expression and cytokine production. We propose that minor introns are embedded molecular switches regulated by U6atac abundance, providing a novel post-transcriptional gene expression mechanism and a rationale for the minor spliceosome's evolutionary conservation. DOI:http://dx.doi.org/10.7554/eLife.00780.001.
RNA-seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types, or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-seq data. DEXSeq uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects with high sensitivity genes, and in many cases exons, that are subject to differential exon usage. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.
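To make the idea of "differential exon usage" concrete, the following is a minimal Python sketch and not DEXSeq itself: it only computes each exon's share of a gene's reads in two conditions and the log2 ratio of those shares. The function name, the pseudocount, and the example counts are hypothetical, and the sketch omits the generalized linear model, dispersion estimation, and false discovery control that the abstract describes.

```python
import math


def exon_usage_log2_change(counts_a, counts_b, pseudocount=0.5):
    """Per-exon log2 ratio of usage fraction (condition B over condition A).

    counts_a, counts_b: read counts per exon of ONE gene, same exon order.
    The pseudocount (an illustrative choice) avoids division by zero.
    """
    if len(counts_a) != len(counts_b) or not counts_a:
        raise ValueError("need two non-empty count lists of equal length")
    n_exons = len(counts_a)
    total_a = sum(counts_a) + pseudocount * n_exons
    total_b = sum(counts_b) + pseudocount * n_exons
    changes = []
    for a, b in zip(counts_a, counts_b):
        frac_a = (a + pseudocount) / total_a  # exon's share of gene reads in A
        frac_b = (b + pseudocount) / total_b  # exon's share of gene reads in B
        changes.append(math.log2(frac_b / frac_a))
    return changes


if __name__ == "__main__":
    # Hypothetical counts: the gene doubles in expression, usage is unchanged.
    print(exon_usage_log2_change([100, 100, 100], [200, 200, 200]))
    # Hypothetical counts: the middle exon is largely skipped in condition B.
    print(exon_usage_log2_change([100, 100, 100], [100, 10, 100]))
```

The first call illustrates why exon usage is distinct from gene-level differential expression: a uniform change in gene expression leaves every exon's share, and hence its log2 ratio, essentially at zero. A real analysis would also need replicates to separate biological variation from a genuine usage change.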
Decoding post-transcriptional regulatory programs in RNA is a critical step towards the larger goal of developing predictive dynamical models of cellular behaviour. Despite recent efforts, the vast landscape of RNA regulatory elements remains largely uncharacterized. A long-standing obstacle is the contribution of local RNA secondary structure to the definition of interaction partners in a variety of regulatory contexts, including, but not limited to, transcript stability, alternative splicing and localization. There are many documented instances where the presence of a structural regulatory element dictates alternative splicing patterns (for example, human cardiac troponin T) or affects other aspects of RNA biology. Thus, a full characterization of post-transcriptional regulatory programs requires capturing information provided by both local secondary structures and the underlying sequence. Here we present a computational framework based on context-free grammars and mutual information that systematically explores the immense space of small structural elements and reveals motifs that are significantly informative of genome-wide measurements of RNA behaviour. By applying this framework to genome-wide human mRNA stability data, we reveal eight highly significant elements with substantial structural information, for the strongest of which we show a major role in global mRNA regulation. Through biochemistry, mass spectrometry and in vivo binding studies, we identified human HNRPA2B1 (heterogeneous nuclear ribonucleoprotein A2/B1, also known as HNRNPA2B1) as the key regulator that binds this element and stabilizes a large number of its target genes. We created a global post-transcriptional regulatory map based on the identity of the discovered linear and structural cis-regulatory elements, their regulatory interactions and their target pathways. This approach could also be used to reveal the structural elements that modulate other aspects of RNA behaviour.
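The mutual-information idea in this abstract can be illustrated with a minimal Python sketch. It assumes a motif is scored as a binary present/absent flag per transcript and that the genome-wide measurement (here, stability) is discretized into equal-population bins; both choices, along with the function names and the toy values, are illustrative assumptions rather than the authors' implementation, and the context-free grammar search over structural motifs is not shown.

```python
import math
from collections import Counter


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equal-population bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical toy data: motif flag and a stability measurement per transcript.
    motif_present = [1, 1, 1, 0, 0, 0]
    stability = [9.1, 8.4, 7.9, 2.2, 3.0, 1.5]
    print(mutual_information(motif_present, quantile_bins(stability, 2)))
```

A motif whose presence perfectly separates the two stability bins carries the maximum one bit of information, while a motif distributed independently of stability carries none. Judging whether an observed value is significant would additionally require a null distribution, for example from shuffled labels, which this sketch leaves out.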
Centenarians exhibit extreme longevity and a remarkable compression of morbidity. They have a unique capacity to maintain homeostatic mechanisms. Since small non-coding RNAs (including microRNAs) are implicated in the regulation of gene expression, we hypothesised that the longevity of centenarians may reflect alterations in small non-coding RNA expression. We report the first comparison of microRNA expression profiles in mononuclear cells from centenarians, octogenarians and young individuals resident near Valencia, Spain. Principal Component Analysis of the expression of 15,644 mature microRNAs and 2,334 snoRNAs and scaRNAs in centenarians revealed a significant overlap with profiles in young individuals but not with octogenarians, and a significant up-regulation of 7 small non-coding RNAs in centenarians compared to young persons and, notably, 102 small non-coding RNAs when compared with octogenarians. We suggest that the small non-coding RNA signature in centenarians may provide insights into the underlying molecular mechanisms endowing centenarians with extreme longevity.
The breast cancer transcriptome acquires a myriad of regulation changes, and splicing is critical for the cell to “tailor-make” specific functional transcripts. We systematically revealed splicing signatures of the three most common types of breast tumors using RNA sequencing: TNBC, non-TNBC and HER2-positive breast cancer. We discovered subtype-specific differentially spliced genes and splice isoforms not previously recognized in the human transcriptome. Further, we showed that exon skipping and intron retention are the predominant splice events in breast cancer. In addition, we found that differential expression of primary transcripts and promoter switching are significantly deregulated in breast cancer compared to normal breast. We validated the presence of novel hybrid isoforms of critical molecules like CDK4, LARP1, ADD3, and PHLPP2. Our study provides the first comprehensive portrait of transcriptional and splicing signatures specific to breast cancer subtypes, as well as previously unknown transcripts that prompt the need for complete annotation of tissue- and disease-specific transcriptomes.
RNA-Seq technology has been used widely in transcriptome studies, and one of its most important applications is to estimate the expression levels of genes and their alternative splicing isoforms. Several algorithms have been published to estimate expression based on different models. Recently, Wu et al. published a method that can accurately estimate isoform-level expression by considering position-related sequencing biases using nonparametric models. The method has advantages in handling different read distributions, but there has not been an efficient program to implement this algorithm.
Aggregation of TAR DNA-binding protein 43 (TDP-43) is a pathological signature of amyotrophic lateral sclerosis (ALS). Although accumulating evidence suggests the involvement of RNA recognition motifs (RRMs) in TDP-43 proteinopathy, it remains unclear how native TDP-43 is converted to pathogenic forms. To elucidate the role of homeostasis of RRM1 structure in ALS pathogenesis, conformations of RRM1 under high pressure were monitored by NMR. We first found that RRM1 was prone to aggregation and had three regions showing stable chemical shifts during misfolding. Moreover, mass-spectrometric analysis of aggregated RRM1 revealed that one of the regions was located on protease-resistant β-strands containing two cysteines (C173 and C175), indicating that this region served as a core assembly interface in RRM1 aggregation. Although a fraction of RRM1 aggregates comprised disulfide-bonded oligomers, the substitution of cysteine(s) to serine(s) (C/S) resulted in unexpected acceleration of amyloid fibril formation by RRM1 and disulfide-independent aggregate formation of full-length TDP-43. Notably, TDP-43 aggregates with RRM1-C/S required the C-terminus, and replicated cytopathologies of ALS, including mislocalization, impaired RNA splicing, ubiquitination, phosphorylation, and motor neuron toxicity. Furthermore, RRM1-C/S accentuated inclusions of familial ALS-linked TDP-43 mutants in the C-terminus. The relevance of RRM1-C/S-induced TDP-43 aggregates in ALS pathogenesis was verified by immunolabeling of inclusions of ALS patients and cultured cells overexpressing the RRM1-C/S TDP-43 with an antibody targeting the misfolding-relevant regions. Our results indicate that cysteines in RRM1 crucially govern the conformation of TDP-43, and aberrant self-assembly of RRM1 at amyloidogenic regions contributes to pathogenic conversion of TDP-43 in ALS.
Although the chemopreventive effects of aspirin have been extensively investigated, the roles of many cell components, such as long non-coding RNAs, in these effects are still not completely understood.
Genome editing with CRISPR/Cas9 is a promising new approach for correcting or mitigating disease-causing mutations. Duchenne muscular dystrophy (DMD) is associated with lethal degeneration of cardiac and skeletal muscle caused by more than 3000 different mutations in the X-linked dystrophin gene (DMD). Most of these mutations are clustered in “hotspots.” There is a fortuitous correspondence between the eukaryotic splice acceptor and splice donor sequences and the protospacer adjacent motif sequences that govern prokaryotic CRISPR/Cas9 target gene recognition and cleavage. Taking advantage of this correspondence, we screened for optimal guide RNAs capable of introducing insertion/deletion (indel) mutations by nonhomologous end joining that abolish conserved RNA splice sites in 12 exons that potentially allow skipping of the most common mutant or out-of-frame DMD exons within or nearby mutational hotspots. We refer to the correction of DMD mutations by exon skipping as myoediting. In proof-of-concept studies, we performed myoediting in representative induced pluripotent stem cells from multiple patients with large deletions, point mutations, or duplications within the DMD gene and efficiently restored dystrophin protein expression in derivative cardiomyocytes. In three-dimensional engineered heart muscle (EHM), myoediting of DMD mutations restored dystrophin expression and the corresponding mechanical force of contraction. Correcting only a subset of cardiomyocytes (30 to 50%) was sufficient to rescue the mutant EHM phenotype to near-normal control levels. We conclude that abolishing conserved RNA splicing acceptor/donor sites and directing the splicing machinery to skip mutant or out-of-frame exons through myoediting allow correction of the cardiac abnormalities associated with DMD by eliminating the underlying genetic basis of the disease.