Concept: Source code
Psychologists typically rely on self-report data when quantifying mobile phone usage, despite little evidence of its validity. In this paper we explore the accuracy of using self-reported estimates when compared with actual smartphone use. We also include source code to process and visualise these data. We compared 23 participants' actual smartphone use over a two-week period with self-reported estimates and the Mobile Phone Problem Use Scale. Our results indicate that estimated time spent using a smartphone may be an adequate measure of use, unless a greater resolution of data are required. Estimates concerning the number of times an individual used their phone across a typical day did not correlate with actual smartphone use. Neither estimated duration nor number of uses correlated with the Mobile Phone Problem Use Scale. We conclude that estimated smartphone use should be interpreted with caution in psychological research.
Displaying chemical structures in LATEX documents currently requires either hand-coding of the structures using one of several LATEX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LATEX source code. We here present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TEX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user¿s computer. The code and accompanying documentation can be found at http://chimpsky.uwaterloo.ca/mol2chemfig.
MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
MOTIVATION: BLAST remains one of the most widely used tools in computational biology. The rate at which new sequence data is available continues to grow exponentially, driving the emergence of new fields of biological research. At the same time multicore systems and conventional clusters are more accessible. ScalaBLAST has been designed to run on conventional multiprocessor systems with an eye to extreme parallelism, enabling parallel BLAST calculations using over 16,000 processing cores with a portable, robust, fault-resilient design that introduces little to no overhead with respect to serial BLAST. ScalaBLAST 2.0 source code can be freely downloaded from http://omics.pnl.gov/software/ScalaBLAST.php.
We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.
Prior research suggests that United States governmental sources documenting the number of law-enforcement-related deaths (i.e., fatalities due to injuries inflicted by law enforcement officers) undercount these incidents. The National Vital Statistics System (NVSS), administered by the federal government and based on state death certificate data, identifies such deaths by assigning them diagnostic codes corresponding to “legal intervention” in accordance with the International Classification of Diseases-10th Revision (ICD-10). Newer, nongovernmental databases track law-enforcement-related deaths by compiling news media reports and provide an opportunity to assess the magnitude and determinants of suspected NVSS underreporting. Our a priori hypotheses were that underreporting by the NVSS would exceed that by the news media sources, and that underreporting rates would be higher for decedents of color versus white, decedents in lower versus higher income counties, decedents killed by non-firearm (e.g., Taser) versus firearm mechanisms, and deaths recorded by a medical examiner versus coroner.
We present a web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST Server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA whilst minimising client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor (VEP) tool permitting large-scale programmatic variant analysis independent of any specific programming language. Availability: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.
Sequence alignment is a long standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for speedup of sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST-the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST and its computational speed is much faster than MegaBLAST. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance than MegaBLAST. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.
We present a general purpose, open-source software library for estimation of non-linear parameters by the Levenberg-Marquardt algorithm. The software, Gpufit, runs on a Graphics Processing Unit (GPU) and executes computations in parallel, resulting in a significant gain in performance. We measured a speed increase of up to 42 times when comparing Gpufit with an identical CPU-based algorithm, with no loss of precision or accuracy. Gpufit is designed such that it is easily incorporated into existing applications or adapted for new ones. Multiple software interfaces, including to C, Python, and Matlab, ensure that Gpufit is accessible from most programming environments. The full source code is published as an open source software repository, making its function transparent to the user and facilitating future improvements and extensions. As a demonstration, we used Gpufit to accelerate an existing scientific image analysis package, yielding significantly improved processing times for super-resolution fluorescence microscopy datasets.
Background Reproducibility is the hallmark of good science.Maintaining a high degree of transparency in scientific reporting isessential not just for gaining trust and credibility within thescientific community but also for facilitating the development of newideas. Sharing data and computer code associated with publications isbecoming increasingly common, motivated partly in response to datadeposition requirements from journals and mandates from funders. Despitethis increase in transparency, it is still difficult to reproduce orbuild upon the findings of most scientific publications without accessto a more complete workflow.Findings Version control systems (VCS), which have long beenused to maintain code repositories in the software industry, are nowfinding new applications in science. One such open source VCS, git,provides a lightweight yet robust framework that is ideal for managingthe full suite of research outputs such as datasets, statistical code,figures, lab notes, and manuscripts. For individual researchers, gitprovides a powerful way to track and compare versions, retrace errors,explore new approaches in a structured manner, while maintaining a fullaudit trail. For larger collaborative efforts, git and git hostingservices make it possible for everyone to work asynchronously and mergetheir contributions at any time, all the while maintaining a completeauthorship trail. In this paper I provide an overview of git along withuse-cases that highlight how this tool can be leveraged to make sciencemore reproducible and transparent, foster new collaborations, andsupport novel uses.