Discover the most talked about and latest scientific content & concepts.

Concept: PHP


The aim of this article is to describe a database of diphone positional frequencies in French. More specifically, we provide frequencies for word-initial, word-internal, and word-final diphones of all words extracted from a subtitle corpus of 50 million words that come from movie and TV series dialogue. We also provide intra- and intersyllable diphone frequencies, as well as interword diphone frequencies. To our knowledge, no other such tool is available to psycholinguists for the study of French sequential probabilities. This database and its new indicators should help researchers conducting new studies on speech segmentation.
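As a rough illustration of what positional diphone frequencies are, the sketch below counts word-initial, word-internal, and word-final diphones over a toy lexicon. The function names, the phoneme transcriptions, and the word counts are all hypothetical and are not taken from the database described above. A word with only two phonemes has a single diphone, which this sketch labels as initial.

```python
from collections import Counter


def diphone_positions(phonemes):
    """Yield (diphone, position) pairs for one word given as a list of phonemes."""
    n = len(phonemes)
    for i in range(n - 1):
        diphone = (phonemes[i], phonemes[i + 1])
        if i == 0:
            position = "initial"   # also covers two-phoneme words (a simplification)
        elif i == n - 2:
            position = "final"
        else:
            position = "internal"
        yield diphone, position


def positional_frequencies(lexicon):
    """Sum word counts per (diphone, position) over (phonemes, count) entries."""
    counts = Counter()
    for phonemes, word_count in lexicon:
        for diphone, position in diphone_positions(phonemes):
            counts[(diphone, position)] += word_count
    return counts


# Hypothetical transcriptions and corpus counts, for illustration only.
toy_lexicon = [
    (["b", "a", "t", "o"], 3),
    (["b", "a", "l"], 2),
]
counts = positional_frequencies(toy_lexicon)
```

Here both toy words begin with the same diphone, so its word-initial count is the sum of the two word counts.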

Concepts: Series, Acoustics, Frequency, Cultural studies, PHP, Film, Radio


BACKGROUND: Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. RESULTS: We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy genomics workbench, making it easy to integrate new visual applications into Galaxy. CONCLUSIONS: Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments.

Concepts: Scientific method, Bioinformatics, Genome, Genomics, Integral, World Wide Web, PHP, Analysis


The inverse equity hypothesis asserts that new health policies initially widen inequality, then attenuate inequalities over time. Since 2004, the UK’s pay-for-performance scheme for chronic disease management (CDM) in primary care general practices (the Quality and Outcomes Framework) has permitted practices to except (exclude) patients from attending annual CDM reviews, without financial penalty. Informed dissent (ID) is one component of exception rates, applied to patients who have not attended due to refusal or non-response to invitations. ‘Population achievement’ describes the proportion receiving care, in relation to those eligible to receive it, including excepted patients. Examination of exception reporting (including ID) and population achievement enables the equity impact of the UK pay-for-performance contract to be assessed. We conducted a longitudinal analysis of practice-level rates and of predictors of ID, overall exceptions and population achievement for CDM to examine whether the inverse equity hypothesis holds true.

Concepts: Scientific method, Medicine, Medical terms, Chronic, Disease management, PHP, Inequality, Common Lisp


Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at and can easily be embedded in other web servers. The code for the associated JavaScript library is available at under an MIT open source license.

Concepts: Evolution, Difference, Phylogenetic tree, Cladistics, World Wide Web, PHP, Web 2.0, Internet


BACKGROUND: The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others. RESULTS: SRAdb is an attempt to make query of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrated Genomics Viewer. CONCLUSIONS: SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R.
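To make the idea of querying sequencing metadata held in SQLite concrete, here is a minimal sketch using Python's standard `sqlite3` module. It does not use the SRAdb package or its real schema: the table `run_meta`, its columns, the accession strings, and the helper `search_runs` are all assumptions made for illustration. It also uses a simple `LIKE` keyword match rather than SQLite's full text search.

```python
import sqlite3

# In-memory database with a hypothetical metadata table (not the SRAdb schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_meta (run_accession TEXT, study_title TEXT, platform TEXT)"
)
conn.executemany(
    "INSERT INTO run_meta VALUES (?, ?, ?)",
    [
        ("RUN0001", "Breast cancer transcriptome", "ILLUMINA"),
        ("RUN0002", "Soil metagenome survey", "LS454"),
        ("RUN0003", "Breast tissue methylation", "ILLUMINA"),
    ],
)


def search_runs(conn, keyword, platform=None):
    """Return run accessions whose study title contains the keyword."""
    sql = "SELECT run_accession FROM run_meta WHERE study_title LIKE ?"
    params = [f"%{keyword}%"]
    if platform is not None:
        # Optional exact-match filter on the sequencing platform.
        sql += " AND platform = ?"
        params.append(platform)
    sql += " ORDER BY run_accession"
    return [row[0] for row in conn.execute(sql, params)]


hits = search_runs(conn, "breast", platform="ILLUMINA")
```

The query is parameterized, so the keyword is passed as data rather than spliced into the SQL string. SQLite's `LIKE` is case-insensitive for ASCII characters by default, which is why a lowercase keyword matches the capitalized titles.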

Concepts: Human Genome Project, DNA sequencing, Sequence, Controlled vocabulary, PHP, Data management, Query language, SQLite


Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing data sets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. In order to solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing data sets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig’s scalability over many computing nodes and illustrate its use with example scripts. Availability and Implementation: Available under the open source MIT license at CONTACT: SUPPLEMENTARY INFORMATION: Instructions and examples for SeqPig.

Concepts: PHP, JavaScript, Web server, Python, Scripting language, Lua, MIT License, JQuery


We describe an integrated conceptual framework for a blended approach to debriefing called PEARLS [Promoting Excellence And Reflective Learning in Simulation]. We provide a rationale for scripted debriefing and introduce a PEARLS debriefing tool designed to facilitate implementation of the new framework. The PEARLS framework integrates 3 common educational strategies used during debriefing, namely, (1) learner self-assessment, (2) facilitating focused discussion, and (3) providing information in the form of directive feedback and/or teaching. The PEARLS debriefing tool incorporates scripted language to guide the debriefing, depending on the strategy chosen. The PEARLS framework and debriefing script fill a need for many health care educators learning to facilitate debriefings in simulation-based education. PEARLS offers a structured framework adaptable for debriefing simulations with a variety of goals, including clinical decision making, improving technical skills, teamwork training, and interprofessional collaboration.

Concepts: Health care, Psychology, Education, Educational psychology, Skill, Learning, PHP, Scripting language


ChEMBL is an open large-scale bioactivity database, previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.

Concepts: Pharmacology, Database, United States, Nucleic acid, PHP, Semantic Web, Web application, Resource Description Framework


Overcoming Addictions (OA) is an abstinence-oriented, cognitive behavioral, Web application based on the program of SMART Recovery. SMART Recovery is an organization that has adapted empirically supported treatment strategies for use in a mutual help framework with in-person meetings, online meetings, a forum, and other resources.

Concepts: Randomized controlled trial, PHP, Web 2.0, Internet, Web application, Software architecture, Web application framework


The vast array of citizen science projects which have blossomed over the last decade span a spectrum of objectives from research to outreach. While some focus primarily on the collection of rigorous scientific data and others are positioned towards the public engagement end of the gradient, the majority of initiatives attempt to balance the two. Although meeting multiple aims can be seen as a ‘win-win’ situation, it can also yield significant challenges as allocating resources to one element means that they may be diverted away from the other. Here we analyse one such programme which set out to find an effective equilibrium between these arguably polarised goals. Through the lens of the Open Air Laboratories (OPAL) programme we explore the inherent trade-offs encountered under four indicators derived from an independent citizen science evaluation framework. Assimilating experience from the OPAL network we investigate practical approaches taken to tackle arising tensions.

Concepts: Scientific method, Mathematics, Science, Experiment, Lens, Social sciences, PHP, Span and div