Concept: Open source
Open access, open data, open source, and other open scholarship practices are growing in popularity and necessity. However, widespread adoption of these practices has not yet been achieved. One reason is that researchers are uncertain about how sharing their work will affect their careers. We review literature demonstrating that open research is associated with increases in citations, media attention, potential collaborators, job opportunities, and funding opportunities. These findings are evidence that open research practices bring significant benefits to researchers relative to more traditional closed practices.
Heather Joseph, Executive Director of SPARC, contributes to our Tenth Anniversary PLOS Biology Collection by discussing how the Open Access movement has grown up.
Journal policy on research data and code availability is an important part of the ongoing shift toward publishing reproducible computational science. This article extends the literature by studying journal data sharing policies by year (for both 2011 and 2012) for a referent set of 170 journals. We make a further contribution by evaluating code sharing policies, supplemental materials policies, and open access status for these 170 journals for each of 2011 and 2012. We build a predictive model of open data and code policy adoption as a function of impact factor and publisher and find higher impact journals more likely to have open data and code policies and scientific societies more likely to have open data and code policies than commercial publishers. We also find open data policies tend to lead open code policies, and we find no relationship between open data and code policies and either supplemental material policies or open access journal status. Of the journals in this study, 38% had a data policy, 22% had a code policy, and 66% had a supplemental materials policy as of June 2012. This reflects a striking one year increase of 16% in the number of data policies, a 30% increase in code policies, and a 7% increase in the number of supplemental materials policies. We introduce a new dataset to the community that categorizes data and code sharing, supplemental materials, and open access policies in 2011 and 2012 for these 170 journals.
Open source drug discovery offers potential for developing new and inexpensive drugs to combat diseases that disproportionally affect the poor. The concept borrows two principle aspects from open source computing (i.e., collaboration and open access) and applies them to pharmaceutical innovation. By opening a project to external contributors, its research capacity may increase significantly. To date there are only a handful of open source R&D projects focusing on neglected diseases. We wanted to learn from these first movers, their successes and failures, in order to generate a better understanding of how a much-discussed theoretical concept works in practice and may be implemented.
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
BACKGROUND: Experimental datasets are becoming larger and increasingly complex, spanning different data domains, thereby expanding the requirements for respective tool support for their analysis. Networks provide a basis for the integration, analysis and visualization of multi-omics experimental datasets. RESULTS: Here we present VANTED (version 2), a framework for systems biology applications, which comprises a comprehensive set of seven main tasks. These range from network reconstruction, data visualization, integration of various data types, network simulation to data exploration combined with a manifold support of systems biology standards for visualization and data exchange. The offered set of functionalities is instantiated by combining several tasks in order to enable users to view and explore a comprehensive dataset from different perspectives. We describe the system as well as an exemplary workflow. CONCLUSIONS: VANTED is a stand-alone framework which supports scientists during the data analysis and interpretation phase. It is available as a Java open source tool from http://www.vanted.org.
BACKGROUND: Automated image analysis methods are becoming more and more important to extract and quantify image features in microscopy-based biomedical studies and several commercial or open-source tools are available. However, most of the approaches rely on pixel-wise operations, a concept that has limitations when high-level object features and relationships between objects are studied and if user-interactivity on the object-level is desired. RESULTS: In this paper we present an open-source software that facilitates the analysis of content features and object relationships by using objects as basic processing unit instead of individual pixels. Our approach enables also users without programming knowledge to compose “analysis pipelines” that exploit the object-level approach. We demonstrate the design and use of example pipelines for the immunohistochemistry-based cell proliferation quantification in breast cancer and two-photon fluorescence microscopy data about boneosteoclast interaction, which underline the advantages of the object-based concept. CONCLUSIONS: We introduce an open source software system that offers object-based image analysis. The object-based concept allows for a straight-forward development of object-related interactive or fully automated image analysis solutions. The presented software may therefore serve as a basis for various applications in the field of digital image analysis. Virtual Slides The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1392065570891113.
A negative consequence of the rapid growth of scholarly open access publishing funded by article processing charges is the emergence of publishers and journals with highly questionable marketing and peer review practices. These so-called predatory publishers are causing unfounded negative publicity for open access publishing in general. Reports about this branch of e-business have so far mainly concentrated on exposing lacking peer review and scandals involving publishers and journals. There is a lack of comprehensive studies about several aspects of this phenomenon, including extent and regional distribution.
The Internet has transformed scholarly publishing, most notably, by the introduction of open access publishing. Recently, there has been a rise of online journals characterized as ‘predatory’, which actively solicit manuscripts and charge publications fees without providing robust peer review and editorial services. We carried out a cross-sectional comparison of characteristics of potential predatory, legitimate open access, and legitimate subscription-based biomedical journals.