Concept: Operating system
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, GATE has been used in genome-wide association studies that contributed to the discovery of a head and neck cancer mutation association. Second, it has supported medical records analysis that significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, it has enabled richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made well defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.
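The idea of filtration with approximate seeds can be illustrated with a small sketch. It is not Masai's implementation: it assumes Hamming distance only (mismatches, no indels), scans the reference by brute force instead of backtracking in an index, and all function names are hypothetical. It relies on the pigeonhole argument that a read with at most k errors, split into s seeds, must contain at least one seed with at most floor(k/s) errors, so fewer and longer seeds can be searched approximately without losing any valid location.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_into_seeds(read, num_seeds):
    """Split a read into num_seeds non-overlapping seeds.

    Returns a list of (offset_in_read, seed) pairs covering the whole read.
    """
    base, extra = divmod(len(read), num_seeds)
    seeds, start = [], 0
    for i in range(num_seeds):
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def map_read(reference, read, max_errors, num_seeds):
    """Return all start positions where read matches with <= max_errors mismatches.

    Filtration step: each seed is searched with up to max_errors // num_seeds
    mismatches (an approximate seed when that value is > 0, an exact seed
    when it is 0). Verification step: each candidate location is checked
    against the full read.
    """
    seed_errors = max_errors // num_seeds
    candidates = set()
    for offset, seed in split_into_seeds(read, num_seeds):
        # Brute-force scan; a real mapper would query an index here (assumption).
        for pos in range(len(reference) - len(seed) + 1):
            if hamming(reference[pos:pos + len(seed)], seed) <= seed_errors:
                start = pos - offset
                if 0 <= start <= len(reference) - len(read):
                    candidates.add(start)
    return sorted(
        s for s in candidates
        if hamming(reference[s:s + len(read)], read) <= max_errors
    )


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCAGTAGGCATCGA"
    read = "TAGACCAGTACG"  # reference[8:20] with two substitutions
    # Two approximate seeds (1 mismatch each) vs. three exact seeds.
    print(map_read(reference, read, max_errors=2, num_seeds=2))
    print(map_read(reference, read, max_errors=2, num_seeds=3))
```

Both calls report the same locations; the difference lies in how many candidates the filtration step passes on to verification, which is where longer approximate seeds tend to be more specific than short exact ones.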
BACKGROUND: High-throughput deep-sequencing technology has generated an unprecedented number of expressed sequence reads that offer the opportunity to gain insight into biological systems. Several databases report the sequences of small regulatory RNAs, which play a prominent role in the control of transposable elements (TE). However, the huge amount of data reported in these databases remains mostly unexplored because the available tools are hard for biologists to use. RESULTS: Here we report NucBase, a new program designed to make an exhaustive search for sequence matches and to align short sequence reads from large nucleic acid databases to genomes or input sequences. NucBase includes a graphical interface which allows biologists to align sequences with ease and immediately visualize matched sequences, their number and their genomic position. NucBase identifies nucleic acid motifs with strict identity to input sequences, and it can also find candidates with one or several mismatches. It offers the opportunity to identify “core sequences” composed of a chosen number of consecutive matching nucleotides. This software can be run locally on any Windows, Linux or Mac OS computer with 32-bit architecture compatibility. CONCLUSIONS: Since this software is easy to use and can detect reads that were undetected by other software, we believe that it will be useful for biologists involved in the field of TE silencing by small non-coding RNAs. We hope NucBase will be useful for a larger community of researchers, since it makes exploration of small nucleic acid sequences in any organism much easier.
BACKGROUND: Theoretically, communication systems have the potential to increase the productivity of anesthesiologists supervising anesthesia providers. We evaluated the maximal potential of communication systems to increase the productivity of anesthesia care by enhancing anesthesiologists' coordination of care (activities) among operating rooms (ORs). METHODS: At hospital A, data for 13,368 pages were obtained from files recorded in the internal alphanumeric text paging system. Pages from the postanesthesia care unit were processed through a numeric paging system and thus not included. At hospital B, in a different US state, 3 of the authors categorized each of 898 calls received using the internal wireless audio system (Vocera®). Lower and upper 95% confidence limits for percentages are the values reported. RESULTS: At least 45% of pages originated from outside the ORs (e.g., 20% from holding area) at hospital A and at least 56% of calls (e.g., 30% administrative) at hospital B. In contrast, requests from ORs for urgent presence of the anesthesiologist were at most 0.2% of pages at hospital A and 1.8% of calls at hospital B. CONCLUSIONS: Approximately half of messages to supervising anesthesiologists are for activity originating outside the ORs being supervised. To use communication tools to increase anesthesia productivity on the day of surgery, their use should include a focus on care coordination outside ORs (e.g., holding area) and among ORs (e.g., at the control desk).
In March 2015, Apple Inc announced ResearchKit, a novel open-source framework intended to help medical researchers to easily create apps for medical studies. With the announcement of this framework, Apple presented 5 apps built in a beta phase based on this framework.
FreeSurfer is a popular software package to measure cortical thickness and volume of neuroanatomical structures. However, little, if anything, is known about measurement reliability across various data processing conditions. Using a set of 30 anatomical T1-weighted 3T MRI scans, we investigated the effects of data processing variables such as FreeSurfer version (v4.3.1, v4.5.0, and v5.0.0), workstation (Macintosh and Hewlett-Packard), and Macintosh operating system version (OSX 10.5 and OSX 10.6). Significant differences were revealed between FreeSurfer version v5.0.0 and the two earlier versions. These differences were on average 8.8 ± 6.6% (range 1.3-64.0%) for volume and 2.8 ± 1.3% (range 1.1-7.7%) for cortical thickness. Differences about a factor of two smaller were detected between Macintosh and Hewlett-Packard workstations and between OSX 10.5 and OSX 10.6. The observed differences are similar in magnitude to effect sizes reported in accuracy evaluations and neurodegenerative studies. The main conclusion is that in the context of an ongoing study, users are discouraged from updating to a new major release of either FreeSurfer or the operating system, or from switching to a different type of workstation, without repeating the analysis; the results thus give quantitative support to successive recommendations stated by FreeSurfer developers over the years. Moreover, in view of the large and significant cross-version differences, it is concluded that formal assessment of the accuracy of FreeSurfer is desirable.
DNA is an attractive medium to store digital information. Here we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 10^6 bytes in DNA oligonucleotides and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
Super-resolved structured illumination microscopy (SR-SIM) is an important tool for fluorescence microscopy. SR-SIM microscopes perform multiple image acquisitions with varying illumination patterns, and reconstruct them into a super-resolved image. In its most frequent, linear implementation, SR-SIM doubles the spatial resolution. The reconstruction is performed numerically on the acquired wide-field image data, and thus relies on a software implementation of specific SR-SIM image reconstruction algorithms. We present fairSIM, an easy-to-use plugin that provides SR-SIM reconstructions for a wide range of SR-SIM platforms directly within ImageJ. For research groups developing their own implementations of super-resolution structured illumination microscopy, fairSIM removes the hurdle of writing yet another implementation of the reconstruction algorithm. For users of commercial microscopes, it offers an additional, in-depth analysis option for their data, independent of specific operating systems. As a modular, open-source solution, fairSIM can easily be adapted, automated and extended as the field of SR-SIM progresses.
Dendroscope 3 is a new program for working with rooted phylogenetic trees and networks. It provides a number of methods for drawing and comparing rooted phylogenetic networks, and for computing them from rooted trees. The program can be used interactively or in command-line mode. The program is written in Java, use of the software is free, and installers for all 3 major operating systems can be downloaded from www.dendroscope.org. [Phylogenetic trees; phylogenetic networks; software.].
DIYABC is a software package for a comprehensive analysis of population history using approximate Bayesian computation (ABC) on DNA polymorphism data. Version 2.0 implements a number of new features and analytical methods. It allows: (i) the analysis of single nucleotide polymorphism (SNP) data at a large number of loci, in addition to microsatellite and DNA sequence data; (ii) efficient Bayesian model choice using linear discriminant analysis on summary statistics; and (iii) the serial launching of multiple post-processing analyses. DIYABC v2.0 also includes a user-friendly graphical interface with various new options. It can be run on three operating systems: GNU/Linux, Microsoft Windows and Apple OS X.
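For readers unfamiliar with ABC, the basic rejection scheme can be sketched on a toy problem. This is not DIYABC's code or its population-history models: the single-locus allele-frequency model, the uniform prior, the tolerance and all function names are assumptions chosen only to show the principle of drawing parameters from a prior, simulating data, and keeping the draws whose summary statistic lies close to the observed one.

```python
import random


def simulate_count(p, n, rng):
    """Simulate how many of n sampled gene copies carry an allele of frequency p."""
    return sum(rng.random() < p for _ in range(n))


def abc_rejection(observed, n, draws, tolerance, seed=0):
    """Toy ABC rejection sampler for an allele frequency.

    observed  : observed allele count (the summary statistic)
    n         : number of sampled gene copies
    draws     : number of parameter values drawn from the prior
    tolerance : maximum accepted |simulated - observed| distance
    Returns the accepted parameter values, an approximate posterior sample.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        p = rng.random()  # uniform prior on [0, 1), an illustrative assumption
        if abs(simulate_count(p, n, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed=30, n=100, draws=20000, tolerance=2)
    mean = sum(posterior) / len(posterior)
    print(f"accepted {len(posterior)} draws, posterior mean ~ {mean:.3f}")
```

A smaller tolerance gives a closer approximation to the true posterior at the cost of fewer accepted draws, which is the basic trade-off in rejection-based ABC.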