Journal: Molecular Systems Biology
Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
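The distinction between factors shared across modalities and factors specific to one modality can be illustrated by the fraction of variance a single factor explains in each data view. The sketch below is not MOFA's inference procedure. It assumes the factor values are already known and centred, fits one least-squares weight per feature, and reports R² per view. The function name `variance_explained` and the toy views are hypothetical.

```python
def variance_explained(factor, view):
    """Fraction of variance in `view` (samples x features) explained by one
    centred factor, using a least-squares weight per feature (illustrative)."""
    n = len(factor)
    zz = sum(z * z for z in factor)
    ss_res = 0.0
    ss_tot = 0.0
    for d in range(len(view[0])):
        col = [row[d] for row in view]
        mean = sum(col) / n
        col = [y - mean for y in col]                      # centre the feature
        w = sum(z * y for z, y in zip(factor, col)) / zz   # least-squares weight
        ss_res += sum((y - w * z) ** 2 for z, y in zip(factor, col))
        ss_tot += sum(y * y for y in col)
    return 1.0 - ss_res / ss_tot


# Toy data: four samples, one factor separating two sample subgroups.
factor = [-1.0, -1.0, 1.0, 1.0]
view_a = [[-2.0, 1.0], [-2.0, -1.0], [2.0, 1.0], [2.0, -1.0]]  # one feature tracks the factor
view_b = [[1.0], [-1.0], [1.0], [-1.0]]                        # unrelated to the factor

r2_a = variance_explained(factor, view_a)
r2_b = variance_explained(factor, view_b)
```

In this toy case the factor is active in the first view only, which is the pattern a modality-specific factor would show. A shared factor would give a non-zero R² in several views.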
Single-cell RNA-seq has enabled gene expression to be studied at an unprecedented resolution. The promise of this technology is attracting a growing user base for single-cell analysis methods. As more analysis tools are becoming available, it is becoming increasingly difficult to navigate this landscape and produce an up-to-date workflow to analyse one’s data. Here, we detail the steps of a typical single-cell RNA-seq analysis, including pre-processing (quality control, normalization, data correction, feature selection, and dimensionality reduction) and cell- and gene-level downstream analysis. We formulate current best-practice recommendations for these steps based on independent comparison studies. We have integrated these best-practice recommendations into a workflow, which we apply to a public dataset to further illustrate how these steps work in practice. Our documented case study can be found at https://www.github.com/theislab/single-cell-tutorial. This review will serve as a workflow tutorial for new entrants into the field, and help established users update their analysis pipelines.
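Three of the pre-processing steps named above (quality control, normalization and feature selection) can be sketched in a few lines. This is a minimal illustration in plain Python, not the workflow from the case study. The thresholds `min_counts`, `target_sum` and `n_top_genes` are hypothetical defaults, and selecting genes by raw variance of log-normalized values is a simplifying assumption.

```python
import math


def preprocess(counts, min_counts=5, target_sum=10_000, n_top_genes=2):
    """counts: list of cells, each a list of raw counts per gene."""
    # Quality control: drop cells with too few total counts.
    kept = [cell for cell in counts if sum(cell) >= min_counts]
    # Normalization: scale each cell to a common total, then log(1 + x).
    normed = [
        [math.log1p(c * target_sum / sum(cell)) for c in cell]
        for cell in kept
    ]
    # Feature selection: keep the genes with the highest variance across cells.
    n_genes = len(normed[0])
    variances = []
    for g in range(n_genes):
        col = [cell[g] for cell in normed]
        mean = sum(col) / len(col)
        variances.append(sum((x - mean) ** 2 for x in col) / len(col))
    top = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    top = sorted(top[:n_top_genes])
    return [[cell[g] for g in top] for cell in normed], top


# Toy matrix: four cells x four genes; the third cell fails quality control.
raw = [[10, 0, 5, 5], [0, 10, 5, 5], [1, 0, 0, 0], [8, 2, 5, 5]]
matrix, selected_genes = preprocess(raw)
```

Data correction and dimensionality reduction are left out here. In practice these steps are usually delegated to a dedicated single-cell analysis library rather than written by hand.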
Purely in vitro ribosome synthesis could provide a critical step towards unraveling the systems biology of ribosome biogenesis, constructing minimal cells from defined components, and engineering ribosomes with new functions. Here, as an initial step towards this goal, we report a method for constructing Escherichia coli ribosomes in crude S150 E. coli extracts. While conventional methods for E. coli ribosome reconstitution are non-physiological, our approach attempts to mimic chemical conditions in the cytoplasm, thus permitting several biological processes to occur simultaneously. Specifically, our integrated synthesis, assembly, and translation (iSAT) technology enables one-step co-activation of rRNA transcription, assembly of transcribed rRNA with native ribosomal proteins into functional ribosomes, and synthesis of active protein by these ribosomes in the same compartment. We show that iSAT makes possible the in vitro construction of modified ribosomes by introducing a 23S rRNA mutation that mediates resistance against clindamycin. We anticipate that iSAT will aid studies of ribosome assembly and open new avenues for making ribosomes with altered properties.
The ecological forces that govern the assembly and stability of the human gut microbiota remain unresolved. We developed a generalizable model-guided framework to predict higher-dimensional consortia from time-resolved measurements of lower-order assemblages. This method was employed to decipher microbial interactions in a diverse human gut microbiome synthetic community. We show that pairwise interactions are major drivers of multi-species community dynamics, as opposed to higher-order interactions. The inferred ecological network exhibits a high proportion of negative and frequent positive interactions. Ecological drivers and responsive recipient species were discovered in the network. Our model demonstrated that a prevalent positive and negative interaction topology enables robust coexistence by implementing a negative feedback loop that balances disparities in monospecies fitness levels. We show that negative interactions could generate history-dependent responses of initial species proportions that frequently do not originate from bistability. Measurements of extracellular metabolites illuminated the metabolic capabilities of monospecies and potential molecular basis of microbial interactions. In sum, these methods defined the ecological roles of major human-associated intestinal species and illuminated design principles of microbial communities.
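A pairwise interaction model of the kind described can be sketched with generalized Lotka-Volterra dynamics, dx_i/dt = x_i (r_i + sum_j a_ij x_j). The abstract does not state the model equations, so this functional form, the forward-Euler integration and all parameter values below are assumptions for illustration. The two-species example pairs a positive interaction (species 0 promotes species 1) with a negative one (species 1 inhibits species 0).

```python
def simulate_glv(x0, growth, interactions, dt=0.01, steps=5000):
    """Forward-Euler integration of generalized Lotka-Volterra dynamics.

    growth[i] is the intrinsic growth rate of species i;
    interactions[i][j] is the effect of species j on species i.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [
            x[i] * (growth[i] + sum(interactions[i][j] * x[j] for j in range(n)))
            for i in range(n)
        ]
        # Abundances cannot become negative.
        x = [max(xi + dt * dxi, 0.0) for xi, dxi in zip(x, dx)]
    return x


# Hypothetical parameters: species 0 grows faster alone; species 1 benefits
# from species 0 and in turn inhibits it (a negative feedback loop).
growth = [1.0, 0.5]
interactions = [[-1.0, -0.5],
                [0.3, -1.0]]

final_a = simulate_glv([0.1, 0.1], growth, interactions)
final_b = simulate_glv([1.0, 0.01], growth, interactions)
```

With these parameters the monospecies carrying capacities are 1.0 and 0.5, whereas the coexistence fixed point, solved analytically, is about 0.65 and 0.70. Both initial conditions approach that point, so this particular parameter set shows no history dependence.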
Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that could not be approached before. In this review, we discuss these innovative new approaches (including ever finer analyses of transcriptome dynamics, genome structure and genomic variation) and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.
Circadian (∼24 h) timekeeping is essential for the lives of many organisms. To understand the biochemical mechanisms of this timekeeping, we have developed a detailed mathematical model of the mammalian circadian clock. Our model can accurately predict diverse experimental data including the phenotypes of mutations or knockdown of clock genes as well as the time courses and relative expression of clock transcripts and proteins. Using this model, we show how a universal motif of circadian timekeeping, where repressors tightly bind activators rather than directly binding to DNA, can generate oscillations when activators and repressors are in stoichiometric balance. Furthermore, we find that an additional slow negative feedback loop preserves this stoichiometric balance and maintains timekeeping with a fixed period. The role of this mechanism in generating robust rhythms is validated by analysis of a simple and general model and a previous model of the Drosophila circadian clock. We propose a double-negative feedback loop design for biological clocks whose period needs to be tightly regulated even with large changes in gene dosage.
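The role of stoichiometric balance in the repressor-activator motif can be illustrated with the standard mass-action expression for the fraction of activator left unbound when a repressor sequesters it in a 1:1 complex. The abstract gives no equations, so the formula, the symbol names (`activator`, `repressor`, `kd`) and the numerical values are assumptions for this sketch. It is not the detailed clock model.

```python
import math


def free_activator_fraction(activator, repressor, kd):
    """Fraction of total activator not bound by repressor.

    Solves A_free * R_free = kd * C with A_free + C = activator and
    R_free + C = repressor (1:1 binding at equilibrium).
    """
    d = activator - repressor - kd
    return (d + math.sqrt(d * d + 4.0 * kd * activator)) / (2.0 * activator)


# Tight binding (small kd): transcriptional activity switches sharply
# around a 1:1 repressor-to-activator ratio.
kd = 1e-4
below = free_activator_fraction(1.0, 0.5, kd)     # repressor in deficit
balanced = free_activator_fraction(1.0, 1.0, kd)  # stoichiometric balance
above = free_activator_fraction(1.0, 1.5, kd)     # repressor in excess
```

With tight binding, roughly half of the activator is free when the repressor is at half the activator level, and almost none is free once the repressor is in excess. The response is therefore most sensitive near the 1:1 ratio, which is the regime the abstract associates with oscillations.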
Numerous transcription factors (TFs) encode information about upstream signals in the dynamics of their activation, but how downstream genes decode these dynamics remains poorly understood. Using microfluidics to control the nucleocytoplasmic translocation dynamics of the budding yeast TF Msn2, we elucidate the principles that govern how different promoters convert dynamical Msn2 input into gene expression output in single cells. Combining modeling and experiments, we classify promoters according to their signal-processing behavior and reveal that multiple, distinct gene expression programs can be encoded in the dynamics of Msn2. We show that both oscillatory TF dynamics and slow promoter kinetics lead to higher noise in gene expression. Furthermore, we show that the promoter activation timescale is related to nucleosome remodeling. Our findings imply a fundamental trade-off: although the cell can exploit different promoter classes to differentially control gene expression using TF dynamics, gene expression noise fundamentally limits how much information can be encoded in the dynamics of a single TF and reliably decoded by promoters.
Human FOXP3+CD25+CD4+ regulatory T cells (Tregs) are essential to the maintenance of immune homeostasis. Several genes are known to be important for murine Tregs, but for human Tregs the genes and underlying molecular networks controlling the suppressor function still largely remain unclear. Here, we describe a strategy to identify the key genes directly from an undirected correlation network which we reconstruct from a very high time-resolution (HTR) transcriptome during the activation of human Tregs/CD4+ T-effector cells. We show that a predicted top-ranked new key gene PLAU (the plasminogen activator urokinase) is important for the suppressor function of both human and murine Tregs. Further analysis unveils that PLAU is particularly important for memory Tregs and that PLAU mediates Treg suppressor function via STAT5 and ERK signaling pathways. Our study demonstrates the potential for identifying novel key genes for complex dynamic biological processes using a network strategy based on HTR data, and reveals a critical role for PLAU in Treg suppressor function.
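An undirected correlation network can be built from time-series expression by linking genes whose profiles are strongly correlated. The sketch below uses Pearson correlation with a fixed cutoff and ranks genes by the number of links (degree). The abstract does not state which correlation measure, cutoff or ranking criterion was used, so all three are assumptions here, and the gene labels and values are placeholders rather than real data.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b) if sd_a and sd_b else 0.0


def rank_by_degree(series, threshold=0.9):
    """Link genes with |r| >= threshold; return (gene, degree) sorted by degree."""
    genes = sorted(series)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(series[g], series[h])) >= threshold:
                degree[g] += 1
                degree[h] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Placeholder time courses for four hypothetical genes.
series = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],
    "g3": [5, 4, 3, 2, 1],
    "g4": [1, 3, 1, 3, 1],
}
ranking = rank_by_degree(series)
```

Taking the absolute value treats strong anti-correlation as a link, which matches an undirected, unsigned network. A signed network would keep the sign of r instead.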
We performed integrative network analyses to identify targets that can be used for effectively treating liver diseases with minimal side effects. We first generated co-expression networks (CNs) for 46 human tissues and liver cancer to explore the functional relationships between genes and examined the overlap between functional and physical interactions. Since increased de novo lipogenesis is a characteristic of nonalcoholic fatty liver disease (NAFLD) and hepatocellular carcinoma (HCC), we investigated the liver-specific genes co-expressed with fatty acid synthase (FASN). CN analyses predicted that inhibition of these liver-specific genes decreases FASN expression. Experiments in human cancer cell lines, mouse liver samples, and primary human hepatocytes validated our predictions by demonstrating functional relationships between these liver genes, and showing that their inhibition decreases cell growth and liver fat content. In conclusion, we identified liver-specific genes linked to NAFLD pathogenesis, such as pyruvate kinase liver and red blood cell (PKLR), or to HCC pathogenesis, such as PKLR, patatin-like phospholipase domain containing 3 (PNPLA3), and proprotein convertase subtilisin/kexin type 9 (PCSK9), all of which are potential targets for drug development.
Copy number alteration (CNA) profiling of human tumors has revealed recurrent patterns of DNA amplifications and deletions across diverse cancer types. These patterns are suggestive of conserved selection pressures during tumor evolution but cannot be fully explained by known oncogenes and tumor suppressor genes. Using a pan-cancer analysis of CNA data from patient tumors and experimental systems, here we show that principal component analysis-defined CNA signatures are predictive of glycolytic phenotypes, including 18F-fluorodeoxyglucose (FDG) avidity of patient tumors, and increased proliferation. The primary CNA signature is enriched for p53 mutations and is associated with glycolysis through coordinate amplification of glycolytic genes and other cancer-linked metabolic enzymes. A pan-cancer and cross-species comparison of CNAs highlighted 26 consistently altered DNA regions, containing 11 enzymes in the glycolysis pathway in addition to known cancer-driving genes. Furthermore, exogenous expression of hexokinase and enolase enzymes in an experimental immortalization system altered the subsequent copy number status of the corresponding endogenous loci, supporting the hypothesis that these metabolic genes act as drivers within the conserved CNA amplification regions. Taken together, these results demonstrate that metabolic stress acts as a selective pressure underlying the recurrent CNAs observed in human tumors, and further cast genomic instability as an enabling event in tumorigenesis and metabolic evolution.
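A principal component analysis-defined signature is the direction in copy-number space along which tumors vary most, and each tumor's score is its projection onto that direction. The sketch below computes the first principal component by power iteration, a simple stand-in chosen here to keep the example dependency-free. It is not necessarily how the analysis was performed, and the toy matrix (tumors in rows, genomic regions in columns) is invented for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: samples x features matrix. Uses power iteration on the centred
    data, which assumes the data are not constant across samples.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Deterministic, non-uniform start vector.
    v = [float(j + 1) for j in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


# Toy CNA matrix: regions 0 and 1 are gained or lost together, region 2 is flat.
cna = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [-1.0, -1.0, 0.0], [-2.0, -2.0, 0.0]]
loadings, scores = first_principal_component(cna)
```

In this toy case the loadings put equal weight on the two co-altered regions and none on the flat one, so a tumor's score summarizes how strongly it carries the coordinated alteration.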