RNA-seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types, or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-seq data. DEXSeq uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects with high sensitivity genes, and in many cases exons, that are subject to differential exon usage. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.
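To make the quantity being tested concrete, the following minimal sketch computes per-sample exon usage (an exon's share of the reads assigned to its gene) and a log2 change in mean usage between two groups of samples. It is not DEXSeq's method: there is no generalized linear model, no dispersion estimate, and no significance test. The function names, exon labels, and counts are hypothetical, and the sketch assumes every sample has a nonzero gene total and every exon has nonzero mean usage in both groups.

```python
import math


def exon_usage(counts):
    """Return each exon's fraction of the gene's reads, per sample.

    counts: dict mapping exon label -> list of read counts, one per sample.
    Assumes every sample has at least one read on the gene.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    return {
        exon: [c[i] / totals[i] for i in range(n_samples)]
        for exon, c in counts.items()
    }


def log2_usage_change(usage, group_a, group_b):
    """log2 ratio of mean exon usage in group_b over group_a (sample indices)."""
    def mean(values, idx):
        return sum(values[i] for i in idx) / len(idx)

    return {
        exon: math.log2(mean(v, group_b) / mean(v, group_a))
        for exon, v in usage.items()
    }


# Hypothetical toy gene: three exons, two control and two treated samples.
toy_counts = {
    "E1": [100, 100, 100, 100],
    "E2": [50, 50, 10, 10],   # usage drops in the treated samples
    "E3": [50, 50, 90, 90],   # usage rises in the treated samples
}
usage = exon_usage(toy_counts)
change = log2_usage_change(usage, group_a=[0, 1], group_b=[2, 3])
for exon, value in change.items():
    print(exon, round(value, 2))
```

With replicates, a statistical method would additionally have to judge such changes against the biological variation between samples, which this sketch ignores.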
Decoding post-transcriptional regulatory programs in RNA is a critical step towards the larger goal of developing predictive dynamical models of cellular behaviour. Despite recent efforts, the vast landscape of RNA regulatory elements remains largely uncharacterized. A long-standing obstacle is the contribution of local RNA secondary structure to the definition of interaction partners in a variety of regulatory contexts, including, but not limited to, transcript stability, alternative splicing and localization. There are many documented instances where the presence of a structural regulatory element dictates alternative splicing patterns (for example, human cardiac troponin T) or affects other aspects of RNA biology. Thus, a full characterization of post-transcriptional regulatory programs requires capturing information provided by both local secondary structures and the underlying sequence. Here we present a computational framework based on context-free grammars and mutual information that systematically explores the immense space of small structural elements and reveals motifs that are significantly informative of genome-wide measurements of RNA behaviour. By applying this framework to genome-wide human mRNA stability data, we reveal eight highly significant elements with substantial structural information, for the strongest of which we show a major role in global mRNA regulation. Through biochemistry, mass spectrometry and in vivo binding studies, we identified human HNRPA2B1 (heterogeneous nuclear ribonucleoprotein A2/B1, also known as HNRNPA2B1) as the key regulator that binds this element and stabilizes a large number of its target genes. We created a global post-transcriptional regulatory map based on the identity of the discovered linear and structural cis-regulatory elements, their regulatory interactions and their target pathways. This approach could also be used to reveal the structural elements that modulate other aspects of RNA behaviour.
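The idea of a motif being "informative" of a genome-wide measurement can be illustrated with a plain mutual information calculation between motif presence and a discretized measurement. This is only a sketch under stated assumptions: the function name, the binary presence vector, and the two stability bins are hypothetical, and the framework's grammar-based motif search and significance testing are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: one entry per transcript.
motif_present = [1, 1, 0, 0]                                # motif found or not
stability_bin = ["stable", "stable", "unstable", "unstable"]  # binned half-life
unrelated_bin = ["a", "b", "a", "b"]                        # independent labels

print(mutual_information(motif_present, stability_bin))  # perfectly informative
print(mutual_information(motif_present, unrelated_bin))  # uninformative
```

In this toy case the motif fully determines the stability bin, so the score reaches its maximum of one bit, while the unrelated labelling scores zero. On real data the score would need to be compared against a null distribution, for example from shuffled labels.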
The breast cancer transcriptome acquires a myriad of regulatory changes, and splicing is critical for the cell to “tailor-make” specific functional transcripts. We systematically revealed splicing signatures of the three most common types of breast tumors using RNA sequencing: TNBC, non-TNBC and HER2-positive breast cancer. We discovered subtype-specific differentially spliced genes and splice isoforms not previously recognized in the human transcriptome. Further, we showed that exon skipping and intron retention are the predominant splice events in breast cancer. In addition, we found that differential expression of primary transcripts and promoter switching are significantly deregulated in breast cancer compared with normal breast tissue. We validated the presence of novel hybrid isoforms of critical molecules such as CDK4, LARP1, ADD3, and PHLPP2. Our study provides the first comprehensive portrait of transcriptional and splicing signatures specific to breast cancer subtypes, as well as previously unknown transcripts that prompt the need for complete annotation of tissue- and disease-specific transcriptomes.
The methyltransferase like 3 (METTL3)-containing methyltransferase complex catalyzes the formation of N6-methyladenosine (m6A), a novel epitranscriptomic marker; however, the nature of this complex remains largely unknown. Here we report two new components of the human m6A methyltransferase complex, Wilms' tumor 1-associating protein (WTAP) and methyltransferase like 14 (METTL14). WTAP interacts with METTL3 and METTL14, and is required for their localization into nuclear speckles enriched with pre-mRNA processing factors and for catalytic activity of the m6A methyltransferase in vivo. The majority of RNAs bound by WTAP and METTL3 in vivo represent mRNAs containing the consensus m6A motif. In the absence of WTAP, the RNA-binding capability of METTL3 is strongly reduced, suggesting that WTAP may function to regulate recruitment of the m6A methyltransferase complex to mRNA targets. Furthermore, transcriptomic analyses in combination with photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) illustrate that WTAP and METTL3 regulate expression and alternative splicing of genes involved in transcription and RNA processing. Morpholino-mediated knockdown targeting WTAP and/or METTL3 in zebrafish embryos caused tissue differentiation defects and increased apoptosis. These findings provide strong evidence that WTAP may function as a regulatory subunit in the m6A methyltransferase complex and play a critical role in epitranscriptomic regulation of RNA metabolism. Cell Research advance online publication 10 January 2014; doi:10.1038/cr.2014.3.
Genome editing with CRISPR/Cas9 is a promising new approach for correcting or mitigating disease-causing mutations. Duchenne muscular dystrophy (DMD) is associated with lethal degeneration of cardiac and skeletal muscle caused by more than 3000 different mutations in the X-linked dystrophin gene (DMD). Most of these mutations are clustered in “hotspots.” There is a fortuitous correspondence between the eukaryotic splice acceptor and splice donor sequences and the protospacer adjacent motif sequences that govern prokaryotic CRISPR/Cas9 target gene recognition and cleavage. Taking advantage of this correspondence, we screened for optimal guide RNAs capable of introducing insertion/deletion (indel) mutations by nonhomologous end joining that abolish conserved RNA splice sites in 12 exons that potentially allow skipping of the most common mutant or out-of-frame DMD exons within or nearby mutational hotspots. We refer to the correction of DMD mutations by exon skipping as myoediting. In proof-of-concept studies, we performed myoediting in representative induced pluripotent stem cells from multiple patients with large deletions, point mutations, or duplications within the DMD gene and efficiently restored dystrophin protein expression in derivative cardiomyocytes. In three-dimensional engineered heart muscle (EHM), myoediting of DMD mutations restored dystrophin expression and the corresponding mechanical force of contraction. Correcting only a subset of cardiomyocytes (30 to 50%) was sufficient to rescue the mutant EHM phenotype to near-normal control levels. We conclude that abolishing conserved RNA splicing acceptor/donor sites and directing the splicing machinery to skip mutant or out-of-frame exons through myoediting allow correction of the cardiac abnormalities associated with DMD by eliminating the underlying genetic basis of the disease.
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9,640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3,689 new loci not currently in GENCODE, of which 3,127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
The presence of introns in gene-coding regions is one of the most mysterious evolutionary inventions in eukaryotic organisms. It has been proposed that, although sequences involved in intron recognition and splicing are mainly located in introns, exonic sequences also contribute to intron splicing. The smallest constitutively spliced exon known so far has 6 nucleotides, and the smallest alternatively spliced exon has 3 nucleotides. Here we report that the Anaphase Promoting Complex subunit 11 (APC11) gene in Arabidopsis thaliana carries a constitutive single-nucleotide exon. In vivo transcription and translation assays performed using APC11-Green Fluorescence Protein (GFP) fusion constructs revealed that intron splicing surrounding the single-nucleotide exon is effective in both Arabidopsis and rice. This discovery warrants attention to genome annotations in the future.
Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Short-read high-throughput RNA sequencing, though powerful, is limited in its ability to directly measure exon connectivity in mRNAs that contain multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be a powerful method for comprehensive transcriptome characterization.
Alternative splicing (AS) can critically affect gene function and disease, yet mapping splicing variations remains a challenge. Here, we propose a new approach to define and quantify mRNA splicing in units of local splicing variations (LSVs). LSVs capture previously defined types of alternative splicing as well as more complex transcript variations. Building the first genome-wide map of LSVs from twelve mouse tissues, we find that complex LSVs constitute over 30% of tissue-dependent transcript variations and affect specific protein families. We show that the prevalence of complex LSVs is conserved in humans and identify hundreds of LSVs that are specific to brain subregions or altered in Alzheimer’s patients. Amongst those are novel isoforms in the Camk2 family and a novel poison exon in Ptbp1, a key splice factor in neurogenesis. We anticipate that the approach presented here will advance the ability to relate tissue-specific splice variation to genetic variation, phenotype, and disease.
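As a rough illustration of quantifying splicing per LSV, the sketch below computes a naive relative-usage value for each junction in a single LSV from junction-spanning read counts. It is a simple point estimate and not the quantification model of the approach described above. The function names, junction labels, and read counts are hypothetical, and treating an LSV with more than two junctions as "complex" is an assumption made for this example.

```python
def lsv_junction_usage(junction_reads):
    """Fraction of an LSV's junction-spanning reads supporting each junction.

    junction_reads: dict mapping junction label -> read count.
    Returns None when the LSV has no reads, since usage is undefined then.
    """
    total = sum(junction_reads.values())
    if total == 0:
        return None
    return {junction: reads / total for junction, reads in junction_reads.items()}


def is_complex(junction_reads):
    """Assumption for this sketch: more than two junctions counts as complex."""
    return len(junction_reads) > 2


# Hypothetical LSV: one source exon spliced to three alternative targets.
toy_lsv = {"e1-e2": 30, "e1-e3": 10, "e1-e4": 60}
toy_usage = lsv_junction_usage(toy_lsv)

for junction, fraction in toy_usage.items():
    print(junction, round(fraction, 2))
print("complex:", is_complex(toy_lsv))
```

A point estimate like this is unreliable at low coverage, so a practical method would also model the uncertainty in each junction's usage.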