RNA-seq is a powerful tool for studying alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types, or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-seq data. DEXSeq uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects with high sensitivity the genes, and in many cases the individual exons, that are subject to differential exon usage. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of the regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.
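DEXSeq asks whether an exon's usage changes relative to the rest of its gene. The Python sketch below computes a naive version of that quantity. For each sample it takes the ratio of one exon bin's read count to the reads in the gene's other bins, then averages that ratio per condition. It only illustrates what is being compared. It does not fit DEXSeq's generalized linear model, does not model biological variation, and does not compute p-values. The function names, the tuple layout, and the toy counts are hypothetical.

```python
from statistics import mean

def exon_usage_ratio(exon_count: int, gene_total: int) -> float:
    """Ratio of reads in one exon bin to reads in all other bins of the same gene."""
    others = gene_total - exon_count
    if others <= 0:
        raise ValueError("gene_total must exceed exon_count")
    return exon_count / others

def mean_usage_by_condition(samples):
    """samples: list of (condition, exon_count, gene_total) tuples (hypothetical layout)."""
    by_condition = {}
    for condition, exon_count, gene_total in samples:
        by_condition.setdefault(condition, []).append(exon_usage_ratio(exon_count, gene_total))
    return {cond: mean(ratios) for cond, ratios in by_condition.items()}

# Toy counts: two replicates per condition; gene-level totals differ between samples.
samples = [
    ("control", 50, 550), ("control", 40, 440),
    ("treated", 10, 510), ("treated", 12, 612),
]
usage = mean_usage_by_condition(samples)
print(usage)
```

In the toy data the ratio drops fivefold even though total gene counts are similar across samples. This is the kind of change in relative exon usage DEXSeq is designed to detect. The real method also decides whether such a shift exceeds what replicate-to-replicate variation alone would produce.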
Decoding post-transcriptional regulatory programs in RNA is a critical step towards the larger goal of developing predictive dynamical models of cellular behaviour. Despite recent efforts, the vast landscape of RNA regulatory elements remains largely uncharacterized. A long-standing obstacle is the contribution of local RNA secondary structure to the definition of interaction partners in a variety of regulatory contexts, including, but not limited to, transcript stability, alternative splicing and localization. There are many documented instances where the presence of a structural regulatory element dictates alternative splicing patterns (for example, in human cardiac troponin T) or affects other aspects of RNA biology. Thus, a full characterization of post-transcriptional regulatory programs requires capturing information provided by both local secondary structures and the underlying sequence. Here we present a computational framework based on context-free grammars and mutual information that systematically explores the immense space of small structural elements and reveals motifs that are significantly informative of genome-wide measurements of RNA behaviour. By applying this framework to genome-wide human mRNA stability data, we reveal eight highly significant elements with substantial structural information, and we show that the strongest of these plays a major role in global mRNA regulation. Through biochemistry, mass spectrometry and in vivo binding studies, we identified human HNRPA2B1 (heterogeneous nuclear ribonucleoprotein A2/B1, also known as HNRNPA2B1) as the key regulator that binds this element and stabilizes a large number of its target genes. We created a global post-transcriptional regulatory map based on the identity of the discovered linear and structural cis-regulatory elements, their regulatory interactions and their target pathways. This approach could also be used to reveal the structural elements that modulate other aspects of RNA behaviour.
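The framework scores candidate motifs by how informative they are of a genome-wide measurement, using mutual information. The Python sketch below shows the core calculation in its simplest form. It assumes the measurement (for example, mRNA stability) has been discretized into equally populated bins, which is an assumption made here for illustration. It then computes the plug-in mutual information, in bits, between a binary motif-presence vector and those bin labels. The function names and toy values are hypothetical. The published framework's context-free-grammar search over structural motifs, its estimator details, and its significance testing are not reproduced.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two equal-length discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins equally populated bins by rank (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Toy data: the motif is present (1) exactly in the four most stable transcripts.
stability = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
motif = [0, 0, 0, 0, 1, 1, 1, 1]
bins = equal_frequency_bins(stability, 2)
print(mutual_information(motif, bins))  # 1.0
```

A motif whose presence perfectly separates the two bins carries 1 bit of information about them. A motif spread evenly across the bins scores 0.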
The breast cancer transcriptome acquires a myriad of regulatory changes, and splicing is critical for the cell to "tailor-make" specific functional transcripts. Using RNA sequencing, we systematically characterized the splicing signatures of the three most common types of breast tumor: triple-negative breast cancer (TNBC), non-TNBC, and HER2-positive breast cancer. We discovered subtype-specific differentially spliced genes and splice isoforms not previously recognized in the human transcriptome. Further, we showed that exon skipping and intron retention are the predominant splicing events in breast cancer. In addition, we found that primary transcript expression and promoter switching are significantly deregulated in breast cancer compared with normal breast tissue. We validated the presence of novel hybrid isoforms of critical molecules such as CDK4, LARP1, ADD3, and PHLPP2. Our study provides the first comprehensive portrait of the transcriptional and splicing signatures specific to breast cancer subtypes. It also identifies previously unknown transcripts that underscore the need for complete annotation of tissue- and disease-specific transcriptomes.
The methyltransferase-like 3 (METTL3)-containing methyltransferase complex catalyzes the formation of N6-methyladenosine (m6A), a novel epitranscriptomic mark; however, the nature of this complex remains largely unknown. Here we report two new components of the human m6A methyltransferase complex: Wilms' tumor 1-associating protein (WTAP) and methyltransferase-like 14 (METTL14). WTAP interacts with METTL3 and METTL14 and is required for two things: their localization into nuclear speckles enriched with pre-mRNA processing factors, and the catalytic activity of the m6A methyltransferase in vivo. The majority of RNAs bound by WTAP and METTL3 in vivo are mRNAs containing the consensus m6A motif. In the absence of WTAP, the RNA-binding capability of METTL3 is strongly reduced, suggesting that WTAP may regulate recruitment of the m6A methyltransferase complex to its mRNA targets. Furthermore, transcriptomic analyses combined with photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) show that WTAP and METTL3 regulate the expression and alternative splicing of genes involved in transcription and RNA processing. Morpholino-mediated knockdown of WTAP and/or METTL3 in zebrafish embryos caused tissue differentiation defects and increased apoptosis. These findings provide strong evidence that WTAP may function as a regulatory subunit of the m6A methyltransferase complex and play a critical role in the epitranscriptomic regulation of RNA metabolism. Cell Research advance online publication 10 January 2014; doi:10.1038/cr.2014.3.
The presence of introns in protein-coding regions is one of the most mysterious evolutionary inventions of eukaryotic organisms. It has been proposed that exonic sequences also contribute to intron splicing, even though the sequences involved in intron recognition and splicing are located mainly in introns. The smallest constitutively spliced exon known so far has 6 nucleotides, and the smallest alternatively spliced exon has 3 nucleotides. Here we report that the Anaphase Promoting Complex subunit 11 (APC11) gene in Arabidopsis thaliana carries a constitutive single-nucleotide exon. In vivo transcription and translation assays using APC11-green fluorescent protein (GFP) fusion constructs revealed that the introns flanking the single-nucleotide exon are efficiently spliced in both Arabidopsis and rice. This discovery highlights the need for closer attention to very short exons in future genome annotation.
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9,640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC Genes and RefSeq. It also has the most comprehensive publicly available annotation of long noncoding RNA (lncRNA) loci, whose predominant transcript form consists of two exons. We examined the completeness of the transcript annotation and found that 35% of transcription start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to PeptideAtlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3,689 new loci not currently in GENCODE. Of these, 3,127 consist of two-exon models, suggesting that they may be unannotated lncRNA loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Short-read high-throughput RNA sequencing, though powerful, is limited in its ability to directly measure exon connectivity in mRNAs that contain multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be a powerful method for comprehensive transcriptome characterization.
Alternative splicing (AS) can critically affect gene function and disease, yet mapping splicing variations remains a challenge. Here, we propose a new approach to define and quantify mRNA splicing in units of local splicing variations (LSVs). LSVs capture previously defined types of alternative splicing as well as more complex transcript variations. We built the first genome-wide map of LSVs from twelve mouse tissues. In it, complex LSVs constitute over 30% of tissue-dependent transcript variations and affect specific protein families. We show that the prevalence of complex LSVs is conserved in humans and identify hundreds of LSVs that are specific to brain subregions or altered in Alzheimer's patients. Among these are novel isoforms in the Camk2 family and a novel poison exon in Ptbp1, a key splicing factor in neurogenesis. We anticipate that this approach will advance the ability to relate tissue-specific splice variation to genetic variation, phenotype, and disease.
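An LSV groups the splice junctions that leave from, or arrive at, a common reference exon. Each junction's usage can therefore be expressed as its share of the LSV's junction reads. The Python sketch below computes that naive read-fraction estimate, which is akin to a percent-spliced-in (PSI) value per junction. It is a simplification for illustration, not the authors' quantification model. The junction names and counts are hypothetical.

```python
def junction_fractions(junction_reads):
    """Fraction of an LSV's junction reads that support each junction."""
    total = sum(junction_reads.values())
    if total == 0:
        raise ValueError("LSV has no junction reads")
    return {junction: reads / total for junction, reads in junction_reads.items()}

# A hypothetical LSV: one source exon spliced to three different downstream exons.
lsv = {"exon2->exon3": 60, "exon2->exon4": 30, "exon2->exon5": 10}
fractions = junction_fractions(lsv)
print(fractions)
```

A classic binary event such as a cassette exon is described by two competing junctions from the source exon. The example above involves three, the kind of variation that LSVs are meant to capture in a single unit.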
Circular RNAs (circRNAs) in animals are an enigmatic class of RNA with unknown function. To explore circRNAs systematically, we sequenced and computationally analysed human, mouse and nematode RNA. We detected thousands of well-expressed, stable circRNAs, often showing tissue- or developmental-stage-specific expression. Sequence analysis suggested important regulatory functions for circRNAs. We found that a human circRNA antisense to the cerebellar degeneration-related protein 1 transcript (CDR1as) is densely bound by microRNA (miRNA) effector complexes and harbours 63 conserved binding sites for the ancient miRNA miR-7. Further analyses indicated that CDR1as functions to bind miR-7 in neuronal tissues. Expression of human CDR1as in zebrafish impaired midbrain development, similar to knocking down miR-7. This suggests that CDR1as is a miRNA antagonist with a miRNA-binding capacity ten times higher than that of any other known transcript. Together, our data provide evidence that circRNAs form a large class of post-transcriptional regulators. Numerous circRNAs form by head-to-tail splicing of exons, suggesting previously unrecognized regulatory potential of coding sequences.
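In head-to-tail splicing, the 3' end of a downstream exon is joined to the 5' end of an upstream exon. A read spanning such a back-splice junction therefore aligns in two pieces whose genomic order is reversed relative to their order in the read. The Python sketch below flags that pattern for a read split into two plus-strand segments. It is an illustration of the idea, not the detection pipeline used in the study. The Segment type, the function name, and the coordinates are hypothetical. A practical detector would also need strand handling, splice-signal checks, and filtering of alignment artifacts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (0-based, half-open, plus-strand coordinates)."""
    chrom: str
    start: int
    end: int

def is_back_splice(first: Segment, second: Segment) -> bool:
    """True if the read's second segment lies wholly upstream of its first segment.

    For a plus-strand transcript, a linear splice junction places the second read
    segment downstream of the first; a head-to-tail (back-splice) junction places
    it upstream, which is the signature checked here.
    """
    return first.chrom == second.chrom and second.end <= first.start

# Read runs off the end of a downstream exon into the start of an upstream exon.
print(is_back_splice(Segment("chr1", 5000, 5050), Segment("chr1", 1000, 1050)))  # True
# Ordinary linear junction: upstream exon followed by downstream exon.
print(is_back_splice(Segment("chr1", 1000, 1050), Segment("chr1", 5000, 5050)))  # False
```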
In contrast to transcriptional regulation, the function of alternative splicing (AS) in stem cells is poorly understood. In mammals, MBNL proteins negatively regulate an exon program specific to embryonic stem cells; however, little is known about the in vivo significance of this regulation. We studied AS in a powerful in vivo model for stem cell biology, the planarian Schmidtea mediterranea. We discover a conserved AS program comprising hundreds of alternative exons, microexons and introns that is differentially regulated in planarian stem cells, and we comprehensively identify its regulators. We show that functional antagonism between CELF and MBNL factors directly controls stem cell-specific AS in planarians, placing the origin of this regulatory mechanism at the base of the Bilateria. Knockdown of CELF or MBNL factors leads to abnormal regenerative capacity by affecting sets of self-renewal and differentiation genes, respectively. These results highlight the importance of AS interactions in stem cell regulation across metazoans.