Concept: Free software
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. The first is in genome-wide association studies, where GATE-based work has contributed to the discovery of a head and neck cancer mutation association. The second is medical records analysis, which has significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. The third supports richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
Displaying chemical structures in LaTeX documents currently requires either hand-coding of the structures using one of several LaTeX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LaTeX source code. We here present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TeX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user's computer. The code and accompanying documentation can be found at http://chimpsky.uwaterloo.ca/mol2chemfig.
Existing repositories for experimental datasets typically capture snapshots of data acquired using a single experimental technique and often require manual population and continual curation. We present a storage system for heterogeneous research data that performs dynamic automated indexing to provide powerful search, discovery and collaboration features without the restrictions of a structured repository. ADAM is able to index many commonly used file formats generated by laboratory assays and therefore offers specific advantages to the experimental biology community. However, it is not domain specific and can promote sharing and re-use of working data across scientific disciplines. Availability and implementation: ADAM is implemented using Java and supported on Linux. It is open source under the GNU General Public License v3.0. Installation instructions, binary code, a demo system and a virtual machine image are available at http://www.imperial.ac.uk/bioinfsupport/resources/software/adam. CONTACT: email@example.com.
BACKGROUND: Automated image analysis methods are becoming increasingly important for extracting and quantifying image features in microscopy-based biomedical studies, and several commercial or open-source tools are available. However, most of the approaches rely on pixel-wise operations, a concept that has limitations when high-level object features and relationships between objects are studied and when user interactivity at the object level is desired. RESULTS: In this paper we present open-source software that facilitates the analysis of content features and object relationships by using objects as the basic processing unit instead of individual pixels. Our approach also enables users without programming knowledge to compose “analysis pipelines” that exploit the object-level approach. We demonstrate the design and use of example pipelines for immunohistochemistry-based cell proliferation quantification in breast cancer and for two-photon fluorescence microscopy data on bone-osteoclast interaction, which underline the advantages of the object-based concept. CONCLUSIONS: We introduce an open source software system that offers object-based image analysis. The object-based concept allows for straightforward development of object-related interactive or fully automated image analysis solutions. The presented software may therefore serve as a basis for various applications in the field of digital image analysis. VIRTUAL SLIDES: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1392065570891113.
Background: Reproducibility is the hallmark of good science. Maintaining a high degree of transparency in scientific reporting is essential not just for gaining trust and credibility within the scientific community but also for facilitating the development of new ideas. Sharing data and computer code associated with publications is becoming increasingly common, motivated partly in response to data deposition requirements from journals and mandates from funders. Despite this increase in transparency, it is still difficult to reproduce or build upon the findings of most scientific publications without access to a more complete workflow. Findings: Version control systems (VCS), which have long been used to maintain code repositories in the software industry, are now finding new applications in science. One such open source VCS, git, provides a lightweight yet robust framework that is ideal for managing the full suite of research outputs such as datasets, statistical code, figures, lab notes, and manuscripts. For individual researchers, git provides a powerful way to track and compare versions, retrace errors, explore new approaches in a structured manner, while maintaining a full audit trail. For larger collaborative efforts, git and git hosting services make it possible for everyone to work asynchronously and merge their contributions at any time, all the while maintaining a complete authorship trail. In this paper I provide an overview of git along with use-cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.
Modern neuroscience increasingly relies on custom-developed software, but much of this is not being made available to the wider community. A group of researchers are pledging to make code they produce for data analysis and modeling open source, and are actively encouraging their colleagues to follow suit.
Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, the objectives of these libraries and packages can vary significantly, yet they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from the output files of various proteomics search engines, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and we give information on some of the freely available applications that make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era.
Pyteomics-a Python Framework for Exploratory Data Analysis and Rapid Software Prototyping in Proteomics
- Journal of the American Society for Mass Spectrometry
- Published over 5 years ago
Pyteomics is a cross-platform, open-source Python library providing a rich set of tools for MS-based proteomics. It provides modules for reading LC-MS/MS data, search engine output, protein sequence databases, theoretical prediction of retention times, electrochemical properties of polypeptides, mass and m/z calculations, and sequence parsing. Pyteomics is available under the Apache license; release versions are available at the Python Package Index http://pypi.python.org/pyteomics , the source code repository at http://hg.theorchromo.ru/pyteomics , and documentation at http://packages.python.org/pyteomics . Pyteomics.biolccc documentation is available at http://packages.python.org/pyteomics.biolccc/ . Questions on installation and usage can be addressed to the pyteomics mailing list: firstname.lastname@example.org.
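The mass and m/z calculations mentioned above can be illustrated with a minimal, dependency-free Python sketch. It does not use the Pyteomics API: the names `RESIDUE_MASS`, `peptide_mass` and `mz` are hypothetical and introduced only for illustration, and the sketch assumes unmodified peptides built from the 20 standard amino acids, with standard monoisotopic masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
# Illustrative table; not taken from Pyteomics.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of one proton


def peptide_mass(sequence):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons (charge >= 1)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"PEPTIDE, z={z}: m/z = {mz('PEPTIDE', z):.4f}")
```

A real analysis would also need to handle modifications, isotopes and non-standard residues, which this sketch deliberately leaves out.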
Egan K P, Brennan T A & Pignolo R J (2012) Histopathology Bone histomorphometry using free and commonly available software Aims: Histomorphometric analysis is a widely used technique to assess changes in tissue structure and function. Commercially available programs that measure histomorphometric parameters can be cost-prohibitive. In this study, we compared an inexpensive method of histomorphometry to a current proprietary software program. Methods and results: Image J and Adobe Photoshop® were used to measure static and kinetic bone histomorphometric parameters. Photomicrographs of Goldner’s trichrome-stained femurs were used to generate black-and-white image masks, representing bone and non-bone tissue, respectively, in Adobe Photoshop®. The masks were used to quantify histomorphometric parameters (bone volume, tissue volume, osteoid volume, mineralizing surface and interlabel width) in Image J. The resultant values obtained using Image J and the proprietary software were compared and differences found to be statistically non-significant. Conclusions: The wide-ranging use of histomorphometric analysis for assessing the basic morphology of tissue components makes it important to have affordable and accurate measurement options available for a diverse range of applications. Here we have developed and validated an approach to histomorphometry using commonly and freely available software that is comparable to a much more costly, commercially available software program.
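For area-type parameters, the mask-based quantification described above comes down to counting pixels. The following is a minimal Python sketch of that idea, not the Image J and Photoshop workflow the authors used. The mask encoding (1 for bone, 0 for non-bone), the function name `bone_fraction`, and the assumption that the whole mask counts as tissue are hypothetical choices made for illustration.

```python
def bone_fraction(mask):
    """Return (bone_pixels, tissue_pixels, bone/tissue ratio) for a binary mask.

    `mask` is a list of rows; each pixel is 1 (bone) or 0 (non-bone).
    Assumption: every pixel in the mask lies inside the tissue region.
    """
    tissue = sum(len(row) for row in mask)
    if tissue == 0:
        raise ValueError("mask contains no pixels")
    bone = sum(1 for row in mask for px in row if px == 1)
    return bone, tissue, bone / tissue


# Tiny made-up mask used only to exercise the function.
example_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]

if __name__ == "__main__":
    bone, tissue, ratio = bone_fraction(example_mask)
    print(f"bone pixels: {bone}, tissue pixels: {tissue}, ratio: {ratio:.2f}")
```

Converting pixel counts to physical units would additionally require the image scale, which is not modelled here.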
Hi-C experiments explore the 3D structure of the genome, generating terabases of data to create high-resolution contact maps. Here, we introduce Juicer, an open-source tool for analyzing terabase-scale Hi-C datasets. Juicer allows users without a computational background to transform raw sequence data into normalized contact maps with one click. Juicer produces a hic file containing compressed contact matrices at many resolutions, facilitating visualization and analysis at multiple scales. Structural features, such as loops and domains, are automatically annotated. Juicer is available as open source software at http://aidenlab.org/juicer/.
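To make the idea of contact matrices at several resolutions concrete, here is a minimal Python sketch that bins pairs of genomic positions into a sparse contact count map. It is not Juicer's implementation or its hic file format: the function name `bin_contacts`, the input layout and the example coordinates are hypothetical, and normalization and compression are omitted.

```python
from collections import Counter


def bin_contacts(pairs, resolution):
    """Count contacts per (bin_i, bin_j), with bin_i <= bin_j.

    `pairs` is an iterable of (position_a, position_b) in base pairs on the
    same chromosome; `resolution` is the bin width in base pairs.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    counts = Counter()
    for a, b in pairs:
        # Order the two bins so the map stays upper-triangular.
        i, j = sorted((a // resolution, b // resolution))
        counts[(i, j)] += 1
    return counts


# Made-up read pairs used only to exercise the function.
example_pairs = [(1_200, 8_900), (1_800, 9_100), (15_000, 2_000), (4_000, 4_500)]

if __name__ == "__main__":
    for res in (5_000, 10_000):
        print(res, dict(bin_contacts(example_pairs, res)))
```

Running the same pairs at a coarser resolution merges neighbouring bins, which is why a single dataset can be viewed at multiple scales.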