Concept: Mac OS
We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, and 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.
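As background for the filtration step described above, the following is a minimal sketch of the simpler exact-seed baseline, not Masai's approximate-seed scheme or its multiple backtracking. It relies on the pigeonhole principle: a read that matches with at most k mismatches, split into k + 1 non-overlapping seeds, must contain at least one seed that matches exactly. The function names and the mismatch-only verification are assumptions made for illustration.

```python
def split_into_seeds(read, n_seeds):
    """Split a read into n_seeds non-overlapping pieces; return (offset, seed) pairs."""
    if len(read) < n_seeds:
        raise ValueError("read is too short for the requested number of seeds")
    base, extra = divmod(len(read), n_seeds)
    seeds, start = [], 0
    for i in range(n_seeds):
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches):
    """Return sorted reference positions where read matches with <= max_mismatches."""
    candidates = set()
    # Filtration: every exact seed hit proposes one candidate alignment start.
    for offset, seed in split_into_seeds(read, max_mismatches + 1):
        pos = reference.find(seed)
        while pos != -1:
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
            pos = reference.find(seed, pos + 1)
    # Verification: keep only candidates within the mismatch budget.
    return sorted(
        s for s in candidates
        if hamming(read, reference[s:s + len(read)]) <= max_mismatches
    )
```

In this sketch, short exact seeds produce many spurious candidates on repetitive genomes, which is the specificity problem that approximate seeds are meant to reduce.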
FreeSurfer is a popular software package to measure cortical thickness and volume of neuroanatomical structures. However, little, if anything, is known about measurement reliability across various data processing conditions. Using a set of 30 anatomical T1-weighted 3T MRI scans, we investigated the effects of data processing variables such as FreeSurfer version (v4.3.1, v4.5.0, and v5.0.0), workstation (Macintosh and Hewlett-Packard), and Macintosh operating system version (OSX 10.5 and OSX 10.6). Significant differences were revealed between FreeSurfer version v5.0.0 and the two earlier versions. These differences were on average 8.8 ± 6.6% (range 1.3-64.0%) (volume) and 2.8 ± 1.3% (1.1-7.7%) (cortical thickness). Differences smaller by about a factor of two were detected between Macintosh and Hewlett-Packard workstations and between OSX 10.5 and OSX 10.6. The observed differences are similar in magnitude to effect sizes reported in accuracy evaluations and neurodegenerative studies. The main conclusion is that in the context of an ongoing study, users are discouraged from updating to a new major release of either FreeSurfer or operating system or from switching to a different type of workstation without repeating the analysis; the results thus give quantitative support to successive recommendations stated by FreeSurfer developers over the years. Moreover, in view of the large and significant cross-version differences, it is concluded that formal assessment of the accuracy of FreeSurfer is desirable.
The number of metagenomes is increasing rapidly. However, current methods for metagenomic analysis are limited in their capability for in-depth data mining among a large number of microbiomes, each of which carries a complex community structure. Moreover, the complexity of configuring and operating computational pipelines also hinders efficient data processing for end users. In this work we introduce Parallel-META 3, a comprehensive and fully automatic computational toolkit for rapid data mining among metagenomic datasets, with advanced features including 16S rRNA extraction for shotgun sequences, 16S rRNA copy number calibration, 16S rRNA based functional prediction, diversity statistics, bio-marker selection, interaction network construction, vector-graph-based visualization and parallel computing. Application of Parallel-META 3 on 5,337 samples with 1,117,555,208 sequences from diverse studies and platforms showed that it could produce results similar to those of QIIME and PICRUSt with much faster speed and lower memory usage, which demonstrates its ability to unravel the taxonomical and functional dynamics patterns across large datasets and elucidate ecological links between microbiomes and the environment. Parallel-META 3 is implemented in C/C++ and R, and integrated into an executive package for rapid installation and easy access under Linux and Mac OS X. Both binary and source code packages are available at http://bioinfo.single-cell.cn/parallel-meta.html.
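The general idea behind 16S rRNA copy number calibration can be sketched as follows: read counts per taxon are divided by that taxon's 16S gene copy number and then renormalized, so that taxa carrying many copies are not over-represented. This is an illustrative sketch only, not Parallel-META 3's implementation; the function name, taxon labels and copy numbers are hypothetical.

```python
def calibrate_by_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts into copy-number-corrected relative abundances.

    read_counts:  dict mapping taxon -> number of 16S reads assigned to it
    copy_numbers: dict mapping taxon -> 16S gene copies per genome (assumed known)
    """
    corrected = {
        taxon: count / copy_numbers[taxon]
        for taxon, count in read_counts.items()
    }
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no reads to calibrate")
    return {taxon: value / total for taxon, value in corrected.items()}


if __name__ == "__main__":
    # Hypothetical counts and copy numbers, for illustration only.
    counts = {"taxon_A": 70, "taxon_B": 20}
    copies = {"taxon_A": 7, "taxon_B": 1}
    print(calibrate_by_copy_number(counts, copies))
```

With these made-up inputs, taxon_A dominates the raw reads (70 of 90) but accounts for only one third of the corrected abundance, because each of its genomes contributes seven 16S copies.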
Motivation: Storing, transmitting, and archiving data produced by next-generation sequencing is a significant computational burden. New compression techniques tailored to short-read sequence data are needed. Results: We present here an approach to compression that reduces the difficulty of managing large-scale sequencing data. Our novel approach sits between pure reference-based compression and reference-free compression and combines much of the benefit of reference-based approaches with the flexibility of de novo encoding. Our method, called path encoding, draws a connection between storing paths in de Bruijn graphs and context-dependent arithmetic coding. Supporting this method is a system to compactly store sets of kmers that is of independent interest. We are able to encode RNA-seq reads using 3%-11% of the space of the sequence in raw FASTA files, which is on average more than 34% smaller than competing approaches. We also show that even if the reference is very poorly matched to the reads that are being encoded, good compression can still be achieved. Availability and implementation: Source code and binaries freely available for download at http://www.cs.cmu.edu/~ckingsf/software/pathenc/, implemented in Go and supported on Linux and Mac OS X.
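To make the connection between de Bruijn graph paths and context-dependent coding concrete, here is a minimal sketch, not the authors' path encoding implementation. It treats each k-mer as a context, counts which base follows it in a set of reads (the outgoing edges of the de Bruijn graph), and reports the ideal code length an arithmetic coder would approach for a read under that model. The add-one smoothing, the static rather than adaptive counts, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def kmer_contexts(reads, k):
    """Count, for every k-mer, how often each base follows it in the reads."""
    counts = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            counts[read[i:i + k]][read[i + k]] += 1
    return counts


def path_cost_bits(read, counts, k, alphabet="ACGT"):
    """Ideal code length (bits) for the bases after the first k-mer of a read.

    Each base is coded with probability (count + 1) / (total + |alphabet|),
    conditioned on the preceding k-mer, so well-supported paths are cheap.
    """
    bits = 0.0
    for i in range(len(read) - k):
        followers = counts.get(read[i:i + k], Counter())
        total = sum(followers.values()) + len(alphabet)
        probability = (followers[read[i + k]] + 1) / total
        bits -= math.log2(probability)
    return bits
```

A read that follows well-supported edges costs less than the 2 bits per base of a plain fixed-width encoding, while a read through unseen contexts falls back to exactly 2 bits per base under this smoothing.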
Accurate multiple sequence alignment is central to bioinformatics and molecular evolutionary analyses. Although sophisticated sequence alignment programs are available, manual adjustments are often required to improve alignment quality. Unfortunately, few programs offer a simple and intuitive way to edit sequence alignments.
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continues to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available.
The increasingly widespread use of mobile phone applications (apps) as research tools and cost-effective means of vast data collection raises new methodological challenges. In recent years, it has become common practice for scientists to design apps that run only on a single operating system, thereby excluding large numbers of users who use a different operating system. However, empirical evidence on any selection biases that might result from this practice is scarce. We therefore conducted two studies drawing on a large multi-national sample (Study 1; N = 1,081) and a German-speaking sample (Study 2; N = 2,438). Study 1 compared iOS and Android users across an array of key personality traits (i.e., well-being, self-esteem, willingness to take risks, optimism, pessimism, Dark Triad, and the Big Five). Focusing on Big Five personality traits in a broader scope, Study 2 examined, in addition to smartphone users, users of the main computer operating systems (i.e., Mac OS, Windows). In both studies, very few significant differences were found, all of which were of small or even tiny effect size and mostly disappeared after sociodemographics had been controlled for. Taken together, minor differences in personality seem to exist, but they are of small to negligible effect size (ranging from OR = 0.919 to 1.344 in Study 1 and from ηp² = .005 to .036 in Study 2) and may reflect differences in sociodemographic composition rather than in the operating system of smartphone users.
High-throughput single-cell technologies provide an unprecedented view into cellular heterogeneity, yet they pose new challenges in data analysis and interpretation. In this protocol, we describe the use of Spanning-tree Progression Analysis of Density-normalized Events (SPADE), a density-based algorithm for visualizing single-cell data and enabling cellular hierarchy inference among subpopulations of similar cells. It was initially developed for flow and mass cytometry single-cell data. We describe SPADE’s implementation and application using an open-source R package that runs on Mac OS X, Linux and Windows systems. A typical SPADE analysis on a 2.27-GHz processor laptop takes ∼5 min. We demonstrate the applicability of SPADE to single-cell RNA-seq data. We compare SPADE with recently developed single-cell visualization approaches based on the t-distributed stochastic neighbor embedding (t-SNE) algorithm. We contrast the implementation and outputs of these methods for normal and malignant hematopoietic cells analyzed by mass cytometry and provide recommendations for appropriate use. Finally, we provide an integrative strategy that combines the strengths of t-SNE and SPADE to infer cellular hierarchy from high-dimensional single-cell data.
OBJECTIVES: To determine the incidence of burnout among UK and Irish urological consultants and trainees. The second objective was to identify possible aetiological factors and to investigate the impact of various vocational stressors that urologists face in their day-to-day work and to establish whether these correlate with burnout. The third objective was to develop a new questionnaire to complement the Maslach Burnout Inventory (MBI), but which would be more specific to urologists, as distinct from other surgical/medical specialties, and to use this in addition to the MBI to determine if there is a requirement to develop effective preventative measures for stress in the workplace, and develop targeted remedial measures when individuals are affected by burnout. MATERIALS & METHODS: A joint collaboration was carried out between the Irish Society of Urology (ISU) and the British Association of Urological Surgeons (BAUS). Anonymous voluntary questionnaires were sent to all current registered members of both governing bodies. The questionnaire comprised two parts. The first part encompassed sociodemographic data collection and identification of potential risk factors for burnout, and the second utilized the Maslach Burnout Inventory (MBI) to objectively assess for workplace burnout. Statistical analysis was performed using GraphPad Prism Version 6.0b for Mac OS X. To evaluate differences in burnout, 2x2 contingency tables and Fisher’s exact probability tests were used to demonstrate statistical significance. P-values <0.05 were taken as statistically significant.
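For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 contingency table. It is a plain Python illustration of the standard hypergeometric calculation, not the GraphPad Prism procedure used in the study, and the example counts are hypothetical rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical table: rows = exposed / not exposed to a stressor,
    # columns = burnout / no burnout.
    p = fisher_exact_two_sided(3, 1, 1, 3)
    print(f"p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The exact test is commonly preferred over the chi-squared approximation when expected cell counts are small, which is often the case in questionnaire subgroups.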
The UltraScan SOlution MOdeller (US-SOMO) is a comprehensive, public domain, open-source suite of computer programs centred on hydrodynamic modelling and small-angle scattering (SAS) data analysis and simulation. We describe here the advances that have been implemented since its last official release (#3087, 2017), which are available from release #3141 for Windows, Linux and Mac operating systems. A major effort has been the transition from the legacy Qt3 cross-platform software development and user interface library to the modern Qt5 release. Apart from improved graphical support, this has allowed the direct implementation of the newest version of the ZENO hydrodynamic computation algorithm, which is almost two orders of magnitude faster, for all operating systems. Coupled with the SoMo-generated bead models with overlaps, ZENO provides the most accurate translational friction computations from atomic-level structures available (Rocco and Byron Eur Biophys J 44:417-431, 2015a), with computational times comparable with or faster than those of other methods. In addition, it has allowed us to introduce the direct representation of each atom in a structure as a (hydrated) bead, opening interesting new modelling possibilities. In the small-angle scattering (SAS) part of the suite, an indirect Fourier transform Bayesian algorithm has been implemented for the computation of the pairwise distance distribution function from SAS data. Finally, the SAS HPLC module, recently upgraded with improved baseline correction and Gaussian decomposition of non-baseline-resolved peaks and with advanced statistical evaluation tools (Brookes et al. J Appl Cryst 49:1827-1841, 2016), now allows automatic top-peak frame selection and averaging.