SciCombinator

Concept: Free software

176

This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

Concepts: Genome-wide association study, C, Open source, Free software, Operating system, Text mining, Source text, Arduino

175

Displaying chemical structures in LaTeX documents currently requires either hand-coding of the structures using one of several LaTeX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LaTeX source code. We here present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TeX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user's computer. The code and accompanying documentation can be found at http://chimpsky.uwaterloo.ca/mol2chemfig.

Concepts: Computer program, Java, Programming language, Source code, Free software, Computer software, Programmer, Latex

170

Existing repositories for experimental datasets typically capture snapshots of data acquired using a single experimental technique and often require manual population and continual curation. We present a storage system for heterogeneous research data that performs dynamic automated indexing to provide powerful search, discovery and collaboration features without the restrictions of a structured repository. ADAM is able to index many commonly used file formats generated by laboratory assays and therefore offers specific advantages to the experimental biology community. However, it is not domain specific and can promote sharing and re-use of working data across scientific disciplines. Availability and implementation: ADAM is implemented using Java and supported on Linux. It is open source under the GNU General Public License v3.0. Installation instructions, binary code, a demo system and a virtual machine image are available at http://www.imperial.ac.uk/bioinfsupport/resources/software/adam. CONTACT: m.woodbridge@imperial.ac.uk.

Concepts: Computer program, Free software, Linux, Linux kernel, GNU General Public License, GNU, GNU Project, Virtual machine

166

BACKGROUND: Automated image analysis methods are becoming more and more important to extract and quantify image features in microscopy-based biomedical studies, and several commercial or open-source tools are available. However, most of the approaches rely on pixel-wise operations, a concept that has limitations when high-level object features and relationships between objects are studied and when user interactivity at the object level is desired. RESULTS: In this paper we present open-source software that facilitates the analysis of content features and object relationships by using objects, instead of individual pixels, as the basic processing unit. Our approach also enables users without programming knowledge to compose “analysis pipelines” that exploit the object-level approach. We demonstrate the design and use of example pipelines for immunohistochemistry-based cell proliferation quantification in breast cancer and for two-photon fluorescence microscopy data about bone-osteoclast interaction, which underline the advantages of the object-based concept. CONCLUSIONS: We introduce an open-source software system that offers object-based image analysis. The object-based concept allows for straightforward development of object-related interactive or fully automated image analysis solutions. The presented software may therefore serve as a basis for various applications in the field of digital image analysis. VIRTUAL SLIDES: The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/1392065570891113.

Concepts: Logic, Two-photon excitation microscopy, Object, Open source, Free software, Open content, Open-source software, Sun Microsystems

96

Background: Reproducibility is the hallmark of good science. Maintaining a high degree of transparency in scientific reporting is essential not just for gaining trust and credibility within the scientific community but also for facilitating the development of new ideas. Sharing data and computer code associated with publications is becoming increasingly common, motivated partly in response to data deposition requirements from journals and mandates from funders. Despite this increase in transparency, it is still difficult to reproduce or build upon the findings of most scientific publications without access to a more complete workflow. Findings: Version control systems (VCS), which have long been used to maintain code repositories in the software industry, are now finding new applications in science. One such open source VCS, git, provides a lightweight yet robust framework that is ideal for managing the full suite of research outputs such as datasets, statistical code, figures, lab notes, and manuscripts. For individual researchers, git provides a powerful way to track and compare versions, retrace errors, explore new approaches in a structured manner, while maintaining a full audit trail. For larger collaborative efforts, git and git hosting services make it possible for everyone to work asynchronously and merge their contributions at any time, all the while maintaining a complete authorship trail. In this paper I provide an overview of git along with use-cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.

Concepts: Scientific method, Science, Research, Source code, Collaboration, Free software, Pseudoscience, Revision control

28

Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, the objectives of these libraries and packages can vary significantly, yet they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of output files from various proteomics search engines, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era.

Concepts: Protein, Bioinformatics, Mass spectrometry, Peptide, Open source, Free software, Open content, Arduino

28

Pyteomics is a cross-platform, open-source Python library providing a rich set of tools for MS-based proteomics. It provides modules for reading LC-MS/MS data, search engine output, protein sequence databases, theoretical prediction of retention times, electrochemical properties of polypeptides, mass and m/z calculations, and sequence parsing. Pyteomics is available under the Apache license; release versions are available at the Python Package Index http://pypi.python.org/pyteomics , the source code repository at http://hg.theorchromo.ru/pyteomics , and documentation at http://packages.python.org/pyteomics . Pyteomics.biolccc documentation is available at http://packages.python.org/pyteomics.biolccc/ . Questions on installation and usage can be addressed to the pyteomics mailing list: pyteomics@googlegroups.com.
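To make the "mass and m/z calculations" mentioned in this abstract concrete, here is a minimal sketch in plain Python. It does not use or reproduce the Pyteomics API; the function names `peptide_mass` and `mz` are hypothetical, and the sketch assumes unmodified peptides built from the 20 standard amino acids, using standard monoisotopic masses.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
# Assumption: unmodified residues only; this is not the Pyteomics API.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one H2O is added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge carrier in positive-ion mode


def peptide_mass(sequence):
    """Monoisotopic mass of a neutral, unmodified peptide (hypothetical helper)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` protons (hypothetical helper)."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


print(f"mass = {peptide_mass('PEPTIDE'):.4f} Da")
print(f"m/z (2+) = {mz('PEPTIDE', 2):.4f}")
```

The neutral mass is the sum of the residue masses plus one water; the m/z value adds one proton mass per charge and divides by the charge. A library such as Pyteomics additionally handles modifications, isotopes and other ion types, which this sketch leaves out.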

Concepts: Scientific method, Protein, Data, Java, C, Source code, Free software, Exploratory data analysis

28

Egan K P, Brennan T A & Pignolo R J (2012) Histopathology. Bone histomorphometry using free and commonly available software. Aims: Histomorphometric analysis is a widely used technique to assess changes in tissue structure and function. Commercially available programs that measure histomorphometric parameters can be cost-prohibitive. In this study, we compared an inexpensive method of histomorphometry to a current proprietary software program. Methods and results: Image J and Adobe Photoshop® were used to measure static and kinetic bone histomorphometric parameters. Photomicrographs of Goldner’s trichrome-stained femurs were used to generate black-and-white image masks, representing bone and non-bone tissue, respectively, in Adobe Photoshop®. The masks were used to quantify histomorphometric parameters (bone volume, tissue volume, osteoid volume, mineralizing surface and interlabel width) in Image J. The resultant values obtained using Image J and the proprietary software were compared, and the differences were found to be statistically non-significant. Conclusions: The wide-ranging use of histomorphometric analysis for assessing the basic morphology of tissue components makes it important to have affordable and accurate measurement options available for a diverse range of applications. Here we have developed and validated an approach to histomorphometry using commonly and freely available software that is comparable to a much more costly, commercially available software program.

Concepts: Measurement, Computer program, Source code, Free software, Computer software, Application software, Freeware, Adobe Flash Player

26

Hi-C experiments explore the 3D structure of the genome, generating terabases of data to create high-resolution contact maps. Here, we introduce Juicer, an open-source tool for analyzing terabase-scale Hi-C datasets. Juicer allows users without a computational background to transform raw sequence data into normalized contact maps with one click. Juicer produces a hic file containing compressed contact matrices at many resolutions, facilitating visualization and analysis at multiple scales. Structural features, such as loops and domains, are automatically annotated. Juicer is available as open source software at http://aidenlab.org/juicer/.
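The idea of storing contact matrices "at many resolutions" can be illustrated with a short sketch. This is not Juicer's implementation and does not reproduce the hic file format, normalization, or compression; the function name `bin_contacts`, the toy contact list and the two bin sizes are assumptions made purely for illustration.

```python
from collections import Counter


def bin_contacts(contacts, resolution):
    """Count contacts per pair of genomic bins of size `resolution` (in bp).

    Hypothetical helper: `contacts` is an iterable of (pos_a, pos_b) pairs
    on a single chromosome. Counts are raw, with no normalization.
    """
    matrix = Counter()
    for pos_a, pos_b in contacts:
        i, j = pos_a // resolution, pos_b // resolution
        # The contact map is symmetric, so store each pair once (i <= j).
        matrix[(min(i, j), max(i, j))] += 1
    return matrix


# Toy read-pair positions (assumed data, not from any real experiment).
contacts = [(1_200, 48_000), (3_500, 52_100), (26_000, 27_500), (49_000, 1_000)]

# Build one sparse matrix per resolution, coarse and fine.
maps = {res: bin_contacts(contacts, res) for res in (5_000, 25_000)}

for res, matrix in maps.items():
    print(res, dict(matrix))
```

Coarser bins merge neighbouring cells of the finer matrix, which is why the same set of contacts can be viewed at multiple scales without re-reading the raw sequence data.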

Concepts: Structure, Analysis, Open source, Free software, Open content, Open-source software, Sun Microsystems, 1-Click

26

A class of random sources producing far-field self-splitting intensity profiles with variable spacing between the x and y directions is introduced. The conditions that ensure the sources generate a beam are derived. Based on the derived analytical expression, the evolution of the beams produced by this family of sources in free space and in a turbulent atmosphere is explored and comparatively analyzed. By changing the modulation parameters n and m, the degree of coherence of the Gaussian Schell-model source in the x and y directions is modulated respectively, so that the number of split beams and the spacing between them can be adjusted. It is illustrated that the self-splitting intensity profile is stable when the beams propagate in free space, but that it eventually transforms into a Gaussian profile when the beam passes through the turbulent atmosphere at sufficiently large distances from its source.

Concepts: Evolution, Optics, Light, Phase, Atmosphere, C, Source code, Free software