SciCombinator

Discover the latest and most talked-about scientific content & concepts.

Concept: Reproducibility

178

BACKGROUND: Cloud computing provides an infrastructure that facilitates large-scale computational analysis in a scalable, democratized fashion. However, in this context it is difficult to share an analysis environment and associated data in a scalable and precisely reproducible way. RESULTS: CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. CONCLUSIONS: By enabling customization and sharing of instances, CloudMan can serve as a platform for collaboration. The presented solution brings cloud resources, tools, and data within reach of the individual researcher and contributes to the reproducibility and transparency of research solutions.
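
As a concrete illustration, the sketch below scripts a CloudMan cluster launch with the BioBlend library's cloudman module. BioBlend is an assumption here (it is not named in the abstract), and the credentials, machine image ID, and instance type are placeholders; treat this as a minimal sketch of scripted CloudMan deployment, not the paper's own workflow.

```python
# Minimal sketch: launching and scaling a CloudMan cluster via BioBlend.
# BioBlend is an assumption (not named in the abstract); credentials, the
# machine image ID, and the instance type below are placeholders.
from bioblend.cloudman import CloudManConfig, CloudManInstance

cfg = CloudManConfig(
    "<cloud-access-key>",       # placeholder credentials
    "<cloud-secret-key>",
    "My reproducible cluster",  # cluster name, shareable with collaborators
    "ami-<image-id>",           # placeholder CloudMan machine image
    "m1.medium",                # placeholder instance type
    "<cluster-password>",
)

cmi = CloudManInstance.launch_instance(cfg)  # boot the cloud instance
print(cmi.get_status())                      # poll until the cluster is ready
cmi.add_nodes(2)                             # scale out worker nodes on demand
```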

Concepts: Scientific method, Research, Computer, Solution, Reproducibility, Cloud computing, Utility computing, Tool

173

MOTIVATION: Since 2011, files from The Cancer Genome Atlas (TCGA) have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to, and reproduced from, their source data. However, to realize this possibility, a continually updated road map of the files in the TCGA is required. Creating such a road map is a significant data-modeling challenge, owing to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, and the number of available data files doubles approximately every 7 months. RESULTS: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium's (W3C) Resource Description Framework (RDF) and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index can be queried using SPARQL and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web-standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. Developing the TCGA Roadmap engine provided specific clues about how biomedical big-data initiatives should be exposed as public resources for exploratory analysis, data mining, and reproducible research. These design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as caBIG. They also offer a far more interoperable and reproducible alternative to the still-pervasive use of data portals. AVAILABILITY: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial. CONTACT: robbinsd@uab.edu.
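
A hedged sketch of how such a SPARQL-queryable file index might be used from Python with the SPARQLWrapper library. The endpoint URL and predicate names below are hypothetical placeholders (the real endpoint and vocabulary are published via the dashboard linked above); only the SPARQLWrapper calls themselves are standard.

```python
# Minimal sketch: discovering a metadata-defined subset of files in an RDF
# index over SPARQL. Endpoint URL and predicate names are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/tcga/sparql")  # placeholder endpoint
sparql.setQuery("""
PREFIX tcga: <http://example.org/tcga/vocab#>  # hypothetical vocabulary
SELECT ?file ?platform
WHERE {
  ?file tcga:diseaseStudy ?study ;
        tcga:platform     ?platform .
  FILTER regex(?study, "brca", "i")            # e.g. breast-cancer files
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["file"]["value"], row["platform"]["value"])
```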

Concepts: Data, World Wide Web, Semantic Web, Web 2.0, Reproducibility, Resource Description Framework, The Cancer Genome Atlas, World Wide Web Consortium

163

There is a growing movement to encourage reproducibility and transparency practices in the scientific community, including public access to raw data and protocols, the conduct of replication studies, systematic integration of evidence in systematic reviews, and the documentation of funding and potential conflicts of interest. In this survey, we assessed the current status of reproducibility and transparency by examining these indicators in a random sample of 441 biomedical journal articles published in 2000-2014. Only one study provided a full protocol, and none made all raw data directly available. Replication studies were rare (n = 4), and only 16 studies had their data included in a subsequent systematic review or meta-analysis. The majority of studies did not mention anything about funding or conflicts of interest. The percentage of articles with no conflict-of-interest statement decreased substantially between 2000 and 2014 (from 94.4% in 2000 to 34.6% in 2014), while the percentage of articles declaring conflicts (0% in 2000, 15.4% in 2014) or declaring no conflicts (5.6% in 2000, 50.0% in 2014) increased. Compared with other fields, articles published in clinical-medicine journals were almost twice as likely to include no information on funding and to have private funding. This study provides baseline data against which future progress on these indicators in the scientific literature can be measured.

Concepts: Scientific method, Medicine, Evidence-based medicine, Systematic review, Meta-analysis, Falsifiability, Reproducibility, Pseudoscience

39

Recent years have seen an increase in alarming signals regarding the lack of replicability in neuroscience, psychology, and other related fields. To avoid a widespread crisis in neuroimaging research and a consequent loss of credibility in the public eye, we need to improve how we do science. This article aims to be a practical guide for researchers at any stage of their careers, helping them make their research more reproducible and transparent while minimizing the additional effort required. The guide covers three major topics in open science (data, code, and publications) and offers practical advice, highlighting advantages of adopting more open research practices that go beyond improved transparency and reproducibility.

Concepts: Scientific method, Improve, Research, Philosophy of science, Falsifiability, Reproducibility, Pseudoscience, Open research

17

Brain-Derived Neurotrophic Factor (BDNF) has attracted increasing interest as a potential biomarker to support the diagnosis or monitor the efficacy of therapies in brain disorders. Circulating BDNF can be measured in serum, plasma, or whole blood. However, the use of BDNF as a biomarker is limited by the poor reproducibility of results, likely due to the variety of methods used for sample collection and BDNF analysis. To overcome these limitations, using sera from 40 healthy adults, we compared the performance of five ELISA kits (Aviscera-Bioscience, Biosensis, Millipore-ChemiKine™, Promega-Emax®, R&D Systems-Quantikine®) and one multiplexing assay (Millipore-Milliplex®). All kits showed 100% sample recovery and comparable range. However, they exhibited very different inter-assay variations, from 5% to 20%. Inter-assay variations were higher than those declared by the manufacturers, with one exception, which also had the best overall performance. Dot-blot analysis revealed that two kits selectively recognize mature BDNF, while the others reacted with both pro-BDNF and mature BDNF. In conclusion, we identified two assays that yield reliable measurements of human serum BDNF suitable for future clinical applications.
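
For reference, inter-assay variation is typically computed as the coefficient of variation (CV%) of the same control sample measured across independent assay runs. A minimal sketch with made-up numbers (not the study's data):

```python
# Minimal sketch: inter-assay coefficient of variation (CV%) for an ELISA kit.
# Values are illustrative placeholders, not data from the study.
import statistics

# Concentration (ng/mL) of one control serum measured on separate assay runs
runs = [24.1, 26.8, 22.9, 25.4, 23.7]

mean = statistics.mean(runs)
sd = statistics.stdev(runs)    # sample standard deviation across runs
cv_percent = 100 * sd / mean   # inter-assay CV%

print(f"inter-assay CV = {cv_percent:.1f}%")  # kits in the study ranged 5-20%
```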

Concepts: Scientific method, Psychometrics, ELISA, Neurotrophin, Brain-derived neurotrophic factor, Nerve growth factor, Neurotrophins, Reproducibility

16

Neuroimaging has evolved into a widely used method to investigate the functional neuroanatomy, brain-behaviour relationships, and pathophysiology of brain disorders, yielding a literature of more than 30,000 papers. With such an explosion of data, it is increasingly difficult to sift through the literature and distinguish spurious from replicable findings. Furthermore, due to the large number of studies, it is challenging to keep track of the wealth of findings. A variety of meta-analytical methods (coordinate-based and image-based) have been developed to help summarise and integrate the vast amount of data arising from neuroimaging studies. However, the field lacks specific guidelines for the conduct of such meta-analyses. Based on our combined experience, we propose best-practice recommendations that researchers from multiple disciplines may find helpful. In addition, we provide specific guidelines and a checklist that will hopefully improve the transparency, traceability, replicability and reporting of meta-analytical results of neuroimaging data.

Concepts: Scientific method, Brain, Human brain, Psychometrics, Philosophy of science, Meta-analysis, Reproducibility, Multiplication

16

Comprehensive, reproducible, and precise analysis of large sample cohorts is one of the key objectives of quantitative proteomics. Here, we present an implementation of data-independent acquisition (DIA) that exploits its parallel acquisition nature to surpass the serial MS2 limits of data-dependent acquisition on a quadrupole ultra-high-field Orbitrap mass spectrometer. In deep single-shot DIA, we identified and quantified 6,383 proteins in human cell lines using two or more peptides per protein, and over 7,100 proteins when including the 717 proteins identified on the basis of a single peptide sequence. In mouse tissues, 7,739 proteins were identified using two or more peptides per protein, and 8,121 when including the 382 proteins identified on the basis of a single peptide sequence. Missing values for proteins ranged from 0.3% to 2.1%, and median coefficients of variation were 4.7% to 6.2% among technical triplicates. In very complex mixtures, we could quantify 10,780 proteins, or 12,192 when including the 1,412 proteins identified on the basis of a single peptide sequence. Using this optimized DIA, we investigated large protein networks before and after the critical period for whisker experience-induced synaptic strength in the murine somatosensory cortex 1 barrel field. This work shows that parallel mass spectrometry enables proteome profiling for discovery with high coverage, reproducibility, precision, and scalability.
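
The two reproducibility metrics reported here, the missing-value rate and the median CV across technical triplicates, are straightforward to compute from a protein-by-replicate intensity matrix. A minimal sketch with synthetic data, assuming numpy and pandas (not the paper's actual pipeline):

```python
# Minimal sketch: missing-value rate and median CV across technical triplicates
# for a protein x replicate intensity matrix. Data are synthetic placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_proteins = 1000
intensities = pd.DataFrame(
    rng.lognormal(mean=10, sigma=1, size=(n_proteins, 3)),
    columns=["rep1", "rep2", "rep3"],
)
# Randomly blank ~1% of values to mimic missing quantifications
intensities = intensities.mask(rng.random(intensities.shape) < 0.01)

missing_rate = 100 * intensities.isna().sum().sum() / intensities.size
cv = 100 * intensities.std(axis=1) / intensities.mean(axis=1)  # per-protein CV%

print(f"missing values: {missing_rate:.1f}%")  # paper reports 0.3-2.1%
print(f"median CV: {cv.median():.1f}%")        # paper reports 4.7-6.2%
```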

Concepts: Scientific method, Protein, Mass spectrometry, Peptide, Proteomics, In-gel digestion, Reproducibility, Peptide sequence

12

Although several types of architecture combining memory cells and transistors have been used to demonstrate artificial synaptic arrays, they usually present limited scalability and high power consumption. Transistor-free analog switching devices may overcome these limitations, yet the switching process they typically rely on (the formation of filaments in an amorphous medium) is not easily controlled and hence hampers the spatial and temporal reproducibility of their performance. Here, we demonstrate analog resistive switching devices that possess the desired characteristics for neuromorphic computing networks, with minimal performance variations, using a single-crystalline SiGe layer epitaxially grown on Si as the switching medium. Such epitaxial random access memories utilize threading dislocations in SiGe to confine metal filaments in a defined, one-dimensional channel. This confinement results in drastically enhanced switching uniformity and long retention/high endurance with a high analog on/off ratio. Simulations using the MNIST handwritten-digit recognition dataset show that epitaxial random access memories can operate with an online learning accuracy of 95.1%.

Concepts: Scientific method, Memory, Computer, Germanium, Reproducibility, Epitaxy, Pseudoscience, History of computing hardware

7

Computational science has led to exciting new developments, but the nature of the work has exposed limitations in our ability to evaluate published findings. Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

Concepts: Scientific method, Science, Computer science, Nature, Falsifiability, Reproducibility, Pseudoscience, Computational science

6

The past few years have seen the emergence of approaches that leverage temporal changes in whole-brain patterns of functional connectivity (the chronnectome). In this chronnectome study, we investigate the replicability of the human brain's inter-regional coupling dynamics during rest by evaluating two different dynamic functional network connectivity (dFNC) analysis frameworks on 7,500 functional magnetic resonance imaging (fMRI) datasets. To quantify the extent to which the emergent functional connectivity (FC) patterns are reproducible, we characterize the temporal dynamics by deriving several summary measures across multiple large, independent, age-matched samples. Reproducibility was demonstrated through the existence of basic connectivity patterns (FC states) amidst an ensemble of inter-regional connections. Furthermore, applying the methods to conservatively configured (statistically stationary, linear, and Gaussian) surrogate datasets revealed that some of the studied state summary measures were statistically significant, suggesting that this class of null model does not fully explain the fMRI data. This extensive testing of the reproducibility of similarity statistics also suggests that the estimated FC states are robust against variation in data quality, analysis, grouping, and decomposition methods. We conclude that future investigations probing the functional and neurophysiological relevance of time-varying connectivity are of critical importance.
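
One common dFNC framework, though not necessarily either of the two pipelines evaluated in the paper, slides a window along regional fMRI time series, computes a connectivity matrix per window, and clusters the windowed matrices into recurring FC states. A minimal sketch with synthetic data, assuming numpy and scikit-learn:

```python
# Minimal sketch of a sliding-window dFNC pipeline: windowed correlation
# matrices clustered into recurring FC "states". One common approach only;
# the study's two specific frameworks are not reproduced here.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
timeseries = rng.standard_normal((400, 50))  # synthetic: 400 TRs x 50 regions
window, step = 44, 2                         # window length and stride, in TRs
iu = np.triu_indices(50, k=1)                # unique inter-regional connections

windows = []
for start in range(0, timeseries.shape[0] - window + 1, step):
    fc = np.corrcoef(timeseries[start:start + window].T)  # region x region FC
    windows.append(fc[iu])                   # vectorize the upper triangle

states = KMeans(n_clusters=5, n_init=10, random_state=0).fit(np.array(windows))
# states.labels_ gives the FC-state sequence over time; dwell times and
# state-transition counts are typical summary measures of the dynamics.
print(np.bincount(states.labels_))
```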

Concepts: Scientific method, Brain, Statistics, Mathematics, Human brain, Magnetic resonance imaging, Reproducibility, Pseudoscience