Concept: DNA microarray
Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
Idiopathic chronic diarrhea (ICD) is a leading cause of morbidity amongst rhesus monkeys kept in captivity. Here, we show that exposure of affected animals to the whipworm Trichuris trichiura led to clinical improvement in fecal consistency, accompanied by weight gain, in four out of the five treated monkeys. By flow cytometry analysis of pinch biopsies collected during colonoscopies before and after treatment, we found an induction of a mucosal TH2 response following helminth treatment that was associated with a decrease in activated CD4+ Ki67+ cells. In parallel, expression profiling with oligonucleotide microarrays and real-time PCR analysis revealed reductions in TH1-type inflammatory gene expression and increased expression of genes associated with IgE signaling, mast cell activation, eosinophil recruitment, alternative activation of macrophages, and worm expulsion. By quantifying bacterial 16S rRNA in pinch biopsies using real-time PCR analysis, we found reduced bacterial attachment to the intestinal mucosa post-treatment. Finally, deep sequencing of bacterial 16S rRNA revealed changes to the composition of microbial communities attached to the intestinal mucosa following helminth treatment. Thus, the genus Streptophyta of the phylum Cyanobacteria was vastly increased in abundance in three out of five ICD monkeys relative to healthy controls, but was reduced to control levels post-treatment; by contrast, the phylum Tenericutes was expanded post-treatment. These findings suggest that helminth treatment in primates can ameliorate colitis by restoring mucosal barrier functions and reducing overall bacterial attachment, and also by altering the communities of attached bacteria. These results also define ICD in monkeys as a tractable preclinical model for ulcerative colitis in which these effects can be further investigated.
Technology has contributed to advances in the genomic, transcriptomic, metabolomic and proteomic analysis of the plant-root-knot nematode (RKN) interaction. Holistic approaches to obtaining expression profiles, such as cDNA libraries, differential display, q-PCR, microarray hybridization and massive sequencing, have increased our knowledge of the molecular aspects of the interaction and have triggered the development of biotechnological tools to control this pest. An important limitation, however, has been the difficulty of cross-comparative analysis of these data. The construction of a database, NEMATIC, which compiles the microarray data available in Arabidopsis on the interaction with plant endoparasitic nematodes, has facilitated in silico analysis, but it is not sufficient for handling ‘omic’ information from different plant species. Omics combined with cell isolation techniques have shed some light on the heterogeneous expression signatures of nematode-induced gall tissues; for instance, plant defences are specifically inhibited in giant cells within the gall, aiding the successful establishment of the nematode. Natural resistance against RKNs varies from an early hypersensitive reaction before the establishment of the nematode to the arrest of gall growth. The molecular bases of these mechanisms, which are not yet fully understood, could disclose powerful targets for the development of biotechnology-based tools for nematode control.
Stromal-derived intratumoural heterogeneity (ITH) has been shown to undermine molecular stratification of patients into appropriate prognostic/predictive subgroups. Here, using several clinically relevant colorectal cancer (CRC) gene expression signatures, we assessed the susceptibility of these signatures to the confounding effects of ITH using gene expression microarray data obtained from multiple tumour regions of a cohort of 24 patients, including central tumour, the tumour invasive front and lymph node metastasis. Sample clustering alongside correlative assessment revealed variation in the ability of each signature to cluster samples according to patient-of-origin rather than region-of-origin within the multi-region dataset. Signatures focused on cancer-cell intrinsic gene expression were found to produce more clinically useful, patient-centred classifiers, as exemplified by the CRC intrinsic signature (CRIS), which robustly clustered samples by patient-of-origin rather than region-of-origin. These findings highlight the potential of cancer-cell intrinsic signatures to reliably stratify CRC patients by minimising the confounding effects of stromal-derived ITH.
Changes in gamma oscillations (20-50 Hz) have been observed in several neurological disorders. However, the relationship between gamma oscillations and cellular pathologies is unclear. Here we show reduced, behaviourally driven gamma oscillations before the onset of plaque formation or cognitive decline in a mouse model of Alzheimer’s disease. Optogenetically driving fast-spiking parvalbumin-positive (FS-PV)-interneurons at gamma (40 Hz), but not other frequencies, reduces levels of amyloid-β (Aβ)1-40 and Aβ1-42 isoforms. Gene expression profiling revealed induction of genes associated with morphological transformation of microglia, and histological analysis confirmed increased microglia co-localization with Aβ. Subsequently, we designed a non-invasive 40 Hz light-flickering regime that reduced Aβ1-40 and Aβ1-42 levels in the visual cortex of pre-depositing mice and mitigated plaque load in aged, depositing mice. Our findings uncover a previously unappreciated function of gamma rhythms in recruiting both neuronal and glial responses to attenuate Alzheimer’s-disease-associated pathology.
Multimodal therapy of glioblastoma (GBM) reveals inter-individual variability in terms of treatment outcome. Here, we examined whether a miRNA signature can be defined for the a priori identification of patients with particularly poor prognosis. FFPE sections from 36 GBM patients along with overall survival follow-up were collected retrospectively and subjected to miRNA signature identification from microarray data. A risk score based on the expression of the signature miRNAs and Cox proportional hazards coefficients was calculated for each patient, followed by validation in a matched GBM subset of TCGA. Genes potentially regulated by the signature miRNAs were identified by a correlation approach followed by pathway analysis. A prognostic 4-miRNA signature, independent of MGMT promoter methylation, age, and sex, was identified, and a risk score was assigned to each patient that allowed defining two groups significantly differing in prognosis (p-value: 0.0001, median survival: 10.6 months and 15.1 months, hazard ratio = 3.8). The signature was technically validated by qRT-PCR and independently validated in an age- and sex-matched subset of standard-of-care treated patients of the TCGA GBM cohort (n=58). Pathway analysis suggested tumorigenesis-associated processes such as immune response, extracellular matrix organization, axon guidance, and signalling by NGF, GPCR and Wnt. Here, we describe the identification and independent validation of a 4-miRNA signature that allows stratification of GBM patients into different prognostic groups in combination with one defined threshold and set of coefficients, and that could be utilized as a diagnostic tool to identify GBM patients for improved and/or alternative treatment approaches.
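As an illustration of how a risk score of this kind is typically computed, the sketch below forms a weighted sum of signature miRNA expression values and compares it with a single threshold. The miRNA names, coefficients, threshold, and expression values are all hypothetical placeholders; the abstract does not report the actual values, and the direction of the comparison is an assumption.

```python
# Minimal sketch of a Cox-coefficient-weighted risk score.
# All names and numbers below are hypothetical, not taken from the study.

def risk_score(expression, coefficients):
    """Sum of coefficient * expression over the signature miRNAs."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def stratify(score, threshold):
    """Assign a prognostic group using one fixed threshold (assumed direction)."""
    return "high-risk" if score > threshold else "low-risk"


# Hypothetical 4-miRNA signature with made-up coefficients.
COEFFICIENTS = {"miR-A": 0.8, "miR-B": -0.5, "miR-C": 0.3, "miR-D": 0.6}
THRESHOLD = 1.0  # hypothetical cut-off

# Hypothetical normalised expression values for one patient.
patient = {"miR-A": 2.0, "miR-B": 1.0, "miR-C": 0.0, "miR-D": 0.5}

score = risk_score(patient, COEFFICIENTS)
print(score, stratify(score, THRESHOLD))
```

Because the coefficients and threshold are fixed once, the same function can be applied unchanged to an independent cohort, which is the property the abstract relies on for validation.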
Biclustering is capable of performing simultaneous clustering on two dimensions of a data matrix and has many applications in pattern classification. For example, in microarray experiments, a subset of genes is co-expressed in a subset of conditions, and biclustering algorithms can be used to detect the coherent patterns in the data for further analysis of function. In this paper, we present a graph spectrum based geometric biclustering (GSGBC) algorithm. In the geometrical view, biclusters can be seen as different linear geometrical patterns in high dimensional spaces. Based on this, the modified Hough transform is used to find the Hough vector (HV) corresponding to sub-bicluster patterns in 2D spaces. A graph can be built regarding each HV as a node. The graph spectrum is utilized to identify the eigengroups in which the sub-biclusters are grouped naturally to produce larger biclusters. Through a comparative study, we find that the GSGBC achieves as good a result as GBC and outperforms other kinds of biclustering algorithms. Also, compared with the original geometrical biclustering algorithm, it reduces the computing time complexity significantly. We also show that biologically meaningful biclusters can be identified by our method from real microarray gene expression data.
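To make the notion of a coherent pattern concrete, the sketch below scores a candidate bicluster with the mean squared residue of Cheng and Church, which is zero when every entry equals a row effect plus a column effect. This is a different and much simpler technique than the GSGBC algorithm described above; it does not use the Hough transform or the graph spectrum, and the matrix values are invented for illustration.

```python
# Minimal sketch: mean squared residue (MSR) of a candidate bicluster.
# This is NOT the GSGBC algorithm; it only illustrates what "coherent" means
# for an additive pattern. The data matrix is invented.

def mean_squared_residue(matrix, rows, cols):
    """MSR of the submatrix picked out by the given row and column indices."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Genes (rows) x conditions (columns). Rows 0-2 over columns 0-2 follow an
# additive pattern; the remaining entries are unrelated values.
data = [
    [1.0, 2.0, 4.0, 9.0],
    [2.0, 3.0, 5.0, 0.0],
    [5.0, 6.0, 8.0, 3.0],
    [7.0, 1.0, 0.0, 4.0],
]

coherent = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(data, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

A score near zero for the selected genes and conditions, against a clearly larger score for the full matrix, is the kind of contrast a biclustering method searches for.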
- Statistical applications in genetics and molecular biology
- Published over 6 years ago
Microarray data can be used to identify prognostic signatures based on time-to-event data. The analysis of microarrays is often associated with overfitting, and many papers have dealt with this issue. However, little attention has been paid to incomplete time-to-event data (truncated and censored follow-up). We have adapted the 0.632+ bootstrap estimator for the evaluation of time-dependent ROC curves. The interpretation of ROC-based results is well established in the scientific and medical communities. Moreover, the results do not depend on the incidence of the event, as opposed to many other prognostic statistics. Here, we have tested this methodology by simulations. We have illustrated its utility by analyzing a data set of diffuse large-B-cell lymphoma patients. Our results demonstrate the well-adapted properties of the 0.632+ ROC-based approach to evaluate the true prognostic capacity of a microarray-based signature. This method has been implemented in an R package ROCt632.
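For readers unfamiliar with the 0.632+ idea, the sketch below shows the general form of the estimator of Efron and Tibshirani for an error-type quantity: it blends the optimistic apparent estimate with the pessimistic out-of-bag bootstrap estimate, shifting weight towards the latter as overfitting increases. It is not the ROCt632 implementation and does not handle censoring or time dependence. Applying it to 1 - AUC with a no-information value of 0.5 is an assumption made here for illustration, as are the function name and the example numbers.

```python
# Minimal sketch of the general 0.632+ combination for an error-type
# quantity (lower is better). Not the ROCt632 package's code.

def bootstrap_632_plus(apparent, out_of_bag, no_information):
    """Blend apparent and out-of-bag estimates with an overfitting-aware weight."""
    # The out-of-bag estimate is capped at the no-information level.
    oob = min(out_of_bag, no_information)
    # Relative overfitting rate, between 0 (none) and 1 (no better than chance).
    if no_information > apparent and oob > apparent:
        r = (oob - apparent) / (no_information - apparent)
    else:
        r = 0.0
    # Weight on the out-of-bag estimate: 0.632 with no overfitting, 1 at r = 1.
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * apparent + w * oob


# Hypothetical example: apparent AUC 0.90, out-of-bag AUC 0.70,
# expressed as errors (1 - AUC) with 0.5 assumed as the no-information value.
corrected_error = bootstrap_632_plus(0.10, 0.30, 0.50)
print(1.0 - corrected_error)  # corrected AUC estimate
```

The corrected value always lies between the apparent and the out-of-bag estimate, so it cannot be more optimistic than the resubstitution result.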
Dispersants are commonly used to mitigate the impact of oil spills; however, the ecological cost associated with their use is uncertain. The toxicity of weathered oil, dispersed weathered oil, and the hydrocarbon-based dispersant Slickgone NS® to the diatom Phaeodactylum tricornutum has been examined using standardized toxicity tests. The assumption that most toxicity occurs via narcosis was tested by measuring membrane damage in diatoms after exposure to one of the petroleum products. The mode of toxic action was determined using microarray-based gene expression profiling in diatoms after exposure to one of the petroleum products. The diatoms were found to be much more sensitive to dispersants than to the water accommodated fraction (WAF), and more sensitive to the chemically enhanced WAF (CEWAF) than to either the WAF itself or the dispersants. Exposure to dispersants and CEWAF caused membrane damage, while exposure to WAF did not. The gene expression profiles resulting from exposure to all three petroleum mixtures were highly similar, suggesting a similar mode of action for these compounds. The observed toxicity bore no relationship to PAH concentrations in the water column or to total petroleum hydrocarbon (TPH), suggesting that an undescribed component of the oil was causing toxicity. Taken together, these results suggest that the use of dispersants to clean up oil spills will dramatically increase the toxicity of the oil to diatoms, and may have implications for ecological processes such as the timing of blooms necessary for recruitment.
We have previously shown that whey protein hydrolysate (WPH) causes a greater increase in muscle protein synthesis than does a mixture of amino acids that is identical in amino acid composition. The present study was conducted to investigate the effect of WPH on gene expression. Male Sprague-Dawley rats subjected to a 2 h swimming exercise were administered either a carbohydrate-amino acid diet or a carbohydrate-WPH diet immediately after exercise. At 1 h after exercise, epitrochlearis muscle mRNA was sampled and subjected to DNA microarray analysis. We found that ingestion of WPH altered 189 genes after considering the false discovery rate. Among the up-regulated genes, eight Gene Ontology (GO) terms were enriched, which included key elements such as Cd24, Ccl2, Ccl7 and Cxcl1 involved in muscle repair after exercise. In contrast, nine GO terms were enriched in gene sets that were down-regulated by the ingestion of WPH, and these GO terms fell into two clusters, ‘regulation of ATPase activity’ and ‘immune response’. Furthermore, we found that WPH activated two upstream proteins, extracellular signal-regulated kinase 1/2 (ERK1/2) and hypoxia-inducible factor-1α (HIF-1α), which might act as key factors for regulating gene expression. These results suggest that ingestion of WPH, compared with ingestion of a mixture of amino acids with an identical amino acid composition, induces greater changes in the post-exercise gene expression profile via activation of the proteins ERK1/2 and HIF-1α.