Displaying chemical structures in LaTeX documents currently requires either hand-coding of the structures using one of several LaTeX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LaTeX source code. We here present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TeX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user's computer. The code and accompanying documentation can be found at http://chimpsky.uwaterloo.ca/mol2chemfig.
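For readers unfamiliar with the chemfig notation that mol2chemfig emits, a minimal hand-written document gives the flavor (the ethanol structure here is our own illustration, not actual program output):

```latex
\documentclass{article}
\usepackage{chemfig}
\begin{document}
% A small molecule written directly in chemfig syntax; mol2chemfig
% generates code in this same notation from molfile or SMILES input.
\chemfig{H_3C-CH_2-OH} % ethanol, drawn with default bond angles
\end{document}
```

Bonds are written as `-`, `=`, and `~`, with optional bracketed angle modifiers, which is what makes the notation concise enough to generate and also to edit by hand afterwards.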
Adequate normalization minimizes the effects of systematic technical variation and is a prerequisite for detecting meaningful biological changes. However, reported miRNA normalization performances and the resulting recommendations are inconsistent. We therefore investigated the impact of seven different normalization methods (reference gene index, global geometric mean, quantile, invariant selection, loess, loessM, and generalized procrustes analysis) on the intra- and inter-platform performance of two distinct and commonly used miRNA profiling platforms.
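As one concrete example of the methods compared, quantile normalization forces every sample to share the same value distribution. A minimal NumPy sketch (our own illustration, not code from the study):

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a (features x samples) matrix so that every
    sample (column) ends up with an identical value distribution."""
    order = np.argsort(x, axis=0)       # sort order within each sample
    ranks = np.argsort(order, axis=0)   # rank of each entry in its column
    # Reference distribution: mean across samples at each rank.
    # Note: ties are broken arbitrarily by argsort in this simple version.
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    return mean_sorted[ranks]

m = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
qn = quantile_normalize(m)
```

After normalization, every column of `qn` contains the same set of values, only in a different order, which removes sample-wide distributional shifts.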
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole-sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in studies that combine data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or base-call quality scores (e.g., Phred). Here, DRISEE is applied to non-amplicon data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and use it to uncover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
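The core idea, estimating error from artifactual duplicate reads rather than from a reference genome, can be sketched in a few lines. This toy Python illustration is our own simplification, not the published DRISEE implementation; the prefix length and simple consensus rule are arbitrary choices here:

```python
from collections import defaultdict

def positional_error(reads, prefix_len=20):
    """Toy duplicate-read error sketch: reads sharing a prefix are treated
    as artifactual duplicates, and disagreement with the per-position
    consensus within each duplicate set is counted as error."""
    bins = defaultdict(list)
    for r in reads:
        if len(r) >= prefix_len:
            bins[r[:prefix_len]].append(r)
    max_len = max((len(r) for r in reads), default=0)
    errors = [0] * max_len
    totals = [0] * max_len
    for group in bins.values():
        if len(group) < 2:          # need duplicates to compare
            continue
        for pos in range(max_len):
            col = [r[pos] for r in group if len(r) > pos]
            if len(col) < 2:
                continue
            consensus = max(set(col), key=col.count)
            errors[pos] += sum(b != consensus for b in col)
            totals[pos] += len(col)
    # Per-position error rate; positions with no duplicate coverage get 0.
    return [e / t if t else 0.0 for e, t in zip(errors, totals)]
```

The real method additionally has to distinguish artifactual duplicates from genuinely repeated sequences, which this sketch ignores.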
We present Masai, a read mapper representing the state of the art in speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, and 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license, and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.
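To see why seed-based filtration works, consider the simpler exact-seed (pigeonhole) baseline that approximate seeds improve upon: any occurrence of a read with at most e mismatches must contain at least one of e+1 non-overlapping seeds exactly. A toy Python sketch (our own illustration; Masai itself searches approximate seeds over an index rather than using naive string search):

```python
def pigeonhole_seeds(read, errors):
    """Split a read into (errors + 1) non-overlapping seeds; by the
    pigeonhole principle, an occurrence with at most `errors` mismatches
    must match at least one seed exactly."""
    k = errors + 1
    n = len(read)
    return [read[i * n // k:(i + 1) * n // k] for i in range(k)]

def candidate_positions(text, read, errors):
    """Filtration then verification: exact seed hits imply candidate
    alignment positions, which are verified by counting mismatches."""
    hits = set()
    n = len(read)
    for idx, seed in enumerate(pigeonhole_seeds(read, errors)):
        start = idx * n // (errors + 1)   # seed offset within the read
        pos = text.find(seed)
        while pos != -1:
            hits.add(pos - start)         # implied read start in the text
            pos = text.find(seed, pos + 1)
    return sorted(p for p in hits
                  if 0 <= p <= len(text) - n
                  and sum(a != b for a, b in zip(text[p:p + n], read)) <= errors)
```

Approximate seeds tolerate errors inside the seed itself, so fewer, longer seeds can be used, which raises specificity (fewer spurious candidates to verify) without losing sensitivity.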
BACKGROUND: Recursive partitioning is a non-parametric modeling technique, widely used in regression and classification problems. Model-based recursive partitioning is used to identify groups of observations with similar values of the parameters of the model of interest. The mob() function in the party package in R implements the model-based recursive partitioning method. This method produces predictions based on single tree models, which are very sensitive to small changes in the learning sample. We extend the model-based recursive partitioning method to produce predictions based on multiple tree models constructed on random samples obtained either through bootstrapping (random sampling with replacement) or subsampling (random sampling without replacement) of the learning data. RESULTS: Here we present an R package called “mobForest” that implements bagging and random forests methodology for model-based recursive partitioning. The mobForest package constructs a large number of model-based trees, and the predictions are aggregated across these trees, resulting in more stable predictions. The package also includes functions for computing predictive accuracy estimates, residual plots, and variable importance plots. CONCLUSION: The mobForest package implements a random forest type approach for model-based recursive partitioning. The R package along with its source code is available at http://CRAN.R-project.org/package=mobForest.
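The bagging scheme underlying mobForest is language-agnostic: fit the base model on many resampled copies of the learning data and aggregate the predictions. A minimal Python sketch (our own illustration; the package itself is implemented in R with model-based trees as the base learner):

```python
import random
import statistics

def bagged_predict(learn_x, learn_y, fit, x_new,
                   n_models=100, replace=True, seed=1):
    """Generic bagging: fit a base model on resampled learning data and
    average the per-model predictions, analogous to how mobForest
    aggregates predictions across many model-based trees.

    `fit(xs, ys)` must return a callable model: model(x) -> prediction.
    """
    rng = random.Random(seed)
    n = len(learn_x)
    preds = []
    for _ in range(n_models):
        if replace:   # bootstrapping: sample n points with replacement
            idx = [rng.randrange(n) for _ in range(n)]
        else:         # subsampling: a fraction of the data, no replacement
            idx = rng.sample(range(n), max(1, int(0.63 * n)))
        model = fit([learn_x[i] for i in idx], [learn_y[i] for i in idx])
        preds.append(model(x_new))
    return statistics.mean(preds)
```

Averaging over resampled fits is what damps the single-tree instability the abstract mentions: a small perturbation of the learning sample changes only a fraction of the resampled models.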
We present a web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST Server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA whilst minimising client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor (VEP) tool permitting large-scale programmatic variant analysis independent of any specific programming language. Availability: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.
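Because the service speaks plain HTTP and JSON, it can be driven from any language's standard library. A minimal Python sketch using the lookup endpoint documented in the public API (the gene ID is just an example identifier; the network call itself is left commented out):

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"

def lookup_request(stable_id):
    """Build a GET request for the Ensembl REST lookup-by-ID endpoint,
    asking for a JSON response via the Content-Type header."""
    return urllib.request.Request(
        f"{SERVER}/lookup/id/{stable_id}",
        headers={"Content-Type": "application/json"},
    )

req = lookup_request("ENSG00000157764")

# To execute (requires network access):
# with urllib.request.urlopen(req) as resp:
#     gene = json.loads(resp.read())   # dict with species, biotype, etc.
```

The same pattern, swapping the path and parameters, covers the VEP bindings the abstract mentions, which is the point of exposing everything over REST.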
Elucidating the druggable interface of protein-protein interactions using fragment docking and coevolutionary analysis
- Proceedings of the National Academy of Sciences of the United States of America
Protein-protein interactions play a central role in cellular function. Improving our understanding of complex formation has many practical applications, including the rational design of new therapeutic agents and insight into the mechanisms governing signal transduction networks. The generally large, flat, and relatively featureless binding sites of protein complexes pose many challenges for drug design. Here, fragment docking and direct coupling analysis are combined in an integrated computational method to identify druggable protein-protein interfaces. (i) The method explores the binding of fragment-sized molecular probes on the protein surface using a molecular docking-based screen. (ii) The energetically favorable binding sites of the probes, called hot spots, are spatially clustered to map out candidate binding sites on the protein surface. (iii) A coevolution-based interface interaction score is used to discriminate between different candidate binding sites, yielding potential interfacial targets for therapeutic drug design. This approach is validated on important, well-studied disease-related proteins with known pharmaceutical targets, and it also identifies targets that have yet to be studied. Moreover, therapeutic agents are proposed by chemically connecting the fragments that bind strongly to the hot spots.
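Step (ii), grouping probe hot spots into candidate binding sites, amounts to spatial clustering of 3D coordinates. A toy Python sketch (our own illustration; the distance cutoff and single-linkage rule are arbitrary assumptions here, not the parameters used in the study):

```python
from itertools import combinations

def cluster_hot_spots(points, cutoff=4.0):
    """Toy single-linkage clustering of hot-spot coordinates (x, y, z):
    probes closer than `cutoff` (in the coordinates' length units) are
    merged into the same candidate binding site, via union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        dist_sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        if dist_sq <= cutoff ** 2:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())
```

Each resulting cluster is a candidate site whose members can then be ranked by the coevolution-based interface score of step (iii).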
A paper-based, multiplexed, microfluidic assay has been developed to visually measure alanine aminotransferase (ALT) in a fingerstick sample, generating rapid, semi-quantitative results. Prior studies indicated a need for improved accuracy; the device was subsequently optimized using an FDA-approved automated platform (Abaxis Piccolo Xpress) as a comparator. Here, we evaluated the performance of the optimized paper test for measurement of ALT in fingerstick blood and serum, as compared to Abaxis and Roche/Hitachi platforms. To evaluate feasibility of remote results interpretation, we also compared reading cell phone camera images of completed tests to reading the device in real time.
- Proceedings of the National Academy of Sciences of the United States of America
The advent of social media and microblogging platforms has radically changed the way we consume information and form opinions. In this paper, we explore the anatomy of the information space on Facebook by characterizing on a global scale the news consumption patterns of 376 million users over a time span of 6 y (January 2010 to December 2015). We find that users tend to focus on a limited set of pages, producing a sharp community structure among news outlets. We also find that the preferences of users and news providers differ. By tracking how Facebook pages “like” each other and examining their geolocation, we find that news providers are more geographically confined than users. We devise a simple model of selective exposure that reproduces the observed connectivity patterns.
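The abstract does not specify the model's details, so the following toy reinforcement (Polya-urn-style) process is our own generic illustration of selective exposure, not the authors' model: users mostly revisit pages from their own history, so attention concentrates on a few pages.

```python
import random
from collections import Counter

def selective_exposure(n_pages=50, n_steps=2000, reinforcement=0.9, seed=0):
    """Toy selective-exposure process: with probability `reinforcement`
    the next page visited is drawn from the user's own history
    (self-reinforcement); otherwise a page is sampled uniformly.
    Returns visit counts per page."""
    rng = random.Random(seed)
    history = [rng.randrange(n_pages)]
    for _ in range(n_steps):
        if rng.random() < reinforcement:
            history.append(rng.choice(history))   # revisit a familiar page
        else:
            history.append(rng.randrange(n_pages))  # discover a new page
    return Counter(history)

counts = selective_exposure()
top_share = sum(c for _, c in counts.most_common(5)) / sum(counts.values())
```

Even this crude mechanism yields the qualitative signature reported in the paper: a handful of pages absorbs far more than the uniform-attention share of visits.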