Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing data sets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. In order to solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing data sets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig’s scalability over many computing nodes and illustrate its use with example scripts. AVAILABILITY AND IMPLEMENTATION: Available under the open source MIT license at http://sourceforge.net/projects/seqpig/ CONTACT: firstname.lastname@example.org SUPPLEMENTARY INFORMATION: Instructions and examples for SeqPig.
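As a rough illustration of the map, shuffle and reduce pattern that engines such as Hadoop and Pig parallelize, the following single-process Python sketch counts reads per reference sequence. It is not SeqPig code or Pig Latin; the record layout (a dict with a `chrom` field) and all function names are assumptions made for this example.

```python
from collections import defaultdict


def map_read(read):
    """Map step: emit one (key, value) pair per read record.

    The 'chrom' field is a hypothetical record layout, not SeqPig's schema.
    """
    yield read["chrom"], 1


def reduce_counts(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def run_map_reduce(reads):
    """Run map, shuffle (group by key) and reduce in a single process."""
    grouped = defaultdict(list)
    for read in reads:
        for key, value in map_read(read):
            grouped[key].append(value)  # shuffle: group values by key
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    example_reads = [{"chrom": "chr1"}, {"chrom": "chr2"}, {"chrom": "chr1"}]
    print(run_map_reduce(example_reads))
```

In a distributed setting the map and reduce steps would run on different nodes and the grouping would be handled by the framework; the sketch only shows the data flow.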
The Simcyp Simulator provides a framework for mechanistic Physiologically-Based Pharmacokinetic/Pharmacodynamic modelling of potentially interacting drugs. It also provides a scripting facility, using the Lua language, for developing customised pharmacodynamic and toxicity models driven by drug concentrations at the site of action.
The main novelty of this paper is the adaptation of the Gesture Description Language (GDL) methodology to sport and rehabilitation data analysis and classification. We show that the Lua language can be successfully used to adapt the GDL classifier to those tasks. The newly applied scripting language allows easy extension of the classifier and its integration with other software technologies and applications. The obtained execution speed allows the methodology to be used in real-time motion capture data processing, where the capturing frequency ranges from 100 Hz to as high as 500 Hz depending on the number of features or classes to be calculated and recognized. As a result, the proposed methodology can be used with high-end motion capture systems. We anticipate that this novel, efficient and effective method will greatly help both sport trainers and physiotherapists in their practice. The proposed approach can be directly applied to kinematic analysis of motion capture data (evaluation of motion without regard to the forces that cause that motion). The ability to apply pattern recognition methods to GDL descriptions can be utilized in virtual reality environments and used for sport training or rehabilitation treatment.
We have developed BioSmalltalk, a new environment system for pure object-oriented bioinformatics programming. Adaptive end-user programming systems are becoming more important in the discovery of biological knowledge, as the emergence of open-source programming toolkits for bioinformatics has demonstrated in recent years. Our software is intended to bridge the gap between bioscientists and rapid software prototyping while preserving the possibility of scaling to whole systems biology applications. BioSmalltalk performs better in terms of execution time and memory usage than Biopython and BioPerl for some classical situations. BioSmalltalk is cross-platform and freely available (MIT license) through Google Project Hosting at http://code.google.com/p/biosmalltalk.
- IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM
- Published about 7 years ago
The problem of computing the minimum tiling path (MTP) from a set of clones arranged in a physical map is a cornerstone of hierarchical (clone-by-clone) genome sequencing projects. We formulate this problem in a graph theoretical framework, and then solve it by a combination of minimum hitting set and minimum spanning tree algorithms. The tool implementing this strategy, called FMTP, shows improved performance compared to the widely used software FPC. When we execute FMTP and FPC on the same physical map, the MTP produced by FMTP covers a higher portion of the genome and uses a smaller number of clones. For instance, on the rice genome the MTP produced by our tool would reduce the cost of a clone-by-clone sequencing project by about 11%. Source code, benchmark datasets, and documentation of FMTP are freely available at http://code.google.com/p/fingerprint-based-minimal-tiling-path/ under the MIT license.
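To make the hitting set idea concrete, here is a minimal Python sketch of a greedy approximation: given a collection of subsets, it repeatedly picks the element that appears in the most subsets not yet hit. This is not the FMTP implementation, and a greedy choice does not guarantee a minimum solution. Modelling each subset as the clones that cover one region of the map is an assumption made for this example, as are the clone names.

```python
def greedy_hitting_set(subsets):
    """Return a set of elements that intersects every non-empty subset.

    Greedy approximation: not guaranteed to be of minimum size.
    """
    remaining = [set(s) for s in subsets if s]  # empty subsets cannot be hit
    chosen = set()
    while remaining:
        # Count how many still-unhit subsets each element appears in.
        counts = {}
        for subset in remaining:
            for element in subset:
                counts[element] = counts.get(element, 0) + 1
        # Pick the most frequent element; sorting makes tie-breaking deterministic.
        best = max(sorted(counts), key=lambda element: counts[element])
        chosen.add(best)
        remaining = [subset for subset in remaining if best not in subset]
    return chosen


if __name__ == "__main__":
    # Hypothetical input: each set lists the clones covering one region.
    regions = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"D"}]
    print(sorted(greedy_hitting_set(regions)))
```

For the hypothetical regions above, the sketch selects clones B and D, which together touch every region.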
SUMMARY: The execution of a software application or pipeline using various combinations of parameters and inputs is a common task in bioinformatics. In the absence of a specialized tool to organize, streamline, and formalize this process, scientists must write scripts, often complex ones, to perform these tasks. We present nestly, a Python package to facilitate running tools with nested combinations of parameters and inputs. nestly provides three components: first, a module to build nested directory structures corresponding to choices of parameters; second, the nestrun script to run a given command using each set of parameter choices; third, the nestagg script to aggregate results of the individual runs into a CSV file, as well as support for more complex aggregation. We also include a module for easily specifying nested dependencies for the SCons build tool, enabling incremental builds. AVAILABILITY AND IMPLEMENTATION: Source, documentation, and tutorial examples are available at http://github.com/fhcrc/nestly. nestly can be installed from the Python Package Index via pip; it is open source (MIT license). CONTACT: email@example.com or firstname.lastname@example.org.
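The idea of a nested directory structure that mirrors parameter choices can be sketched with the Python standard library alone. The example below does not use nestly's own API; the function name, the parameter names and the `control.json` file name are all assumptions chosen for illustration.

```python
import itertools
import json
import os
import tempfile


def build_nested_runs(base_dir, parameters):
    """Create one directory per parameter combination.

    Each directory level corresponds to one parameter, and each leaf
    directory receives a JSON file recording the chosen values.
    """
    names = list(parameters)
    created = []
    for values in itertools.product(*(parameters[name] for name in names)):
        run_dir = os.path.join(base_dir, *(str(value) for value in values))
        os.makedirs(run_dir, exist_ok=True)
        # Hypothetical file name for the per-run parameter record.
        with open(os.path.join(run_dir, "control.json"), "w") as handle:
            json.dump(dict(zip(names, values)), handle)
        created.append(run_dir)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        runs = build_nested_runs(tmp, {"rate": [0.1, 0.5], "n_iter": [10, 100]})
        for run in runs:
            print(os.path.relpath(run, tmp))
```

A separate step could then walk the leaf directories, run a command in each using the recorded parameters, and collect the outputs into a single table.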