SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: XML

168

BACKGROUND: U-Compare is a text mining platform that allows the construction, evaluation and comparison of text mining workflows. U-Compare contains a large library of components that are tuned to the biomedical domain. Users can rapidly develop biomedical text mining workflows by mixing and matching U-Compare’s components. Workflows developed using U-Compare can be exported and sent to other users who, in turn, can import and re-use them. However, the resulting workflows are standalone applications, i.e., software tools that run and are accessible only via a local machine, and that can only be run with the U-Compare platform. RESULTS: We address the above issues by extending U-Compare to convert standalone workflows into web services automatically, via a two-click process. The resulting web services can be registered on a central server and made publicly available. Alternatively, users can make web services available on their own servers, after installing the web application framework, which is part of the extension to U-Compare. We have performed a user-oriented evaluation of the proposed extension, by asking users who have tested the enhanced functionality of U-Compare to complete questionnaires that assess its functionality, reliability, usability, efficiency and maintainability. The results obtained reveal that the new functionality is well received by users. CONCLUSIONS: The web services produced by U-Compare are built on top of open standards, i.e., REST and SOAP protocols, and therefore, they are decoupled from the underlying platform. Exported workflows can be integrated with any application that supports these open standards. We demonstrate how the newly extended U-Compare enhances the cross-platform interoperability of workflows, by seamlessly importing a number of text mining workflow web services exported from U-Compare into Taverna, i.e., a generic scientific workflow construction platform.

Concepts: Data mining, Web 2.0, Internet, Web application, Import, XML, Web application framework, Software framework

26

Does PubMed Central, a government-run digital archive of biomedical articles, compete with scientific society journals? A longitudinal, retrospective cohort analysis of 13,223 articles (5999 treatment, 7224 control) published in 14 society-run biomedical research journals in nutrition, experimental biology, physiology, and radiology between February 2008 and January 2011 reveals a 21.4% reduction in full-text hypertext markup language (HTML) article downloads and a 13.8% reduction in portable document format (PDF) article downloads from the journals' websites when U.S. National Institutes of Health-sponsored articles (treatment) become freely available from the PubMed Central repository. In addition, the effect of PubMed Central on reducing PDF article downloads is increasing over time, growing at a rate of 1.6% per year. There was no longitudinal effect for full-text HTML downloads. While PubMed Central may be providing complementary access to readers traditionally underserved by scientific journals, the loss of article readership from the journal website may weaken the ability of the journal to build communities of interest around research papers, impede the communication of news and events to scientific society members and journal readers, and reduce the perceived value of the journal to institutional subscribers. Davis, P. M. Public accessibility of biomedical articles from PubMed Central reduces journal readership - retrospective cohort analysis.

Concepts: Cohort study, Open access, XML, Markup language, Wiki, HTML, XHTML, DocBook

10

The number of image analysis tools supporting the extraction of architectural features of root systems has increased in recent years. These tools offer a handy set of complementary facilities, yet it is widely accepted that none of these software tools can efficiently extract the growing array of static and dynamic features for different types of images and species. We describe the Root System Markup Language (RSML), which has been designed to overcome two major challenges: (i) to enable portability of root architecture data between different software tools in an easy and interoperable manner, allowing seamless collaborative work, and (ii) to provide a standard format upon which to base the central repositories that will soon arise following the expanding worldwide root phenotyping effort. RSML follows the XML standard to store 2D or 3D image metadata, plant and root properties and geometries, continuous functions along individual root paths and a suite of annotations at the image, plant or root scales, at one or several time points. Plant ontologies are used to describe botanical entities that are relevant at the scale of root system architecture. An XML schema describes the features and constraints of RSML, and open-source packages have been developed in several languages (R, Excel, Java, Python, C#) to enable researchers to integrate RSML files into popular research workflows.
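As a rough illustration of how an XML-based root description of this kind might be consumed, the sketch below reads root geometries from a small sample document and computes the length of each root path. The element and attribute names (rsml, scene, plant, root, geometry, polyline, point, x, y) and the sample data are assumptions made for this example; they have not been checked against the published RSML schema.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are assumed, not taken from the RSML schema.
SAMPLE = """<rsml>
  <scene>
    <plant id="p1">
      <root id="r1">
        <geometry>
          <polyline>
            <point x="0" y="0"/>
            <point x="3" y="4"/>
            <point x="3" y="10"/>
          </polyline>
        </geometry>
      </root>
    </plant>
  </scene>
</rsml>"""


def root_lengths(xml_text):
    """Return {root id: polyline length} for every <root> element in the document."""
    doc = ET.fromstring(xml_text)
    lengths = {}
    for root in doc.iter("root"):
        # Only the root's own geometry is read, not that of any nested child roots.
        points = [
            (float(p.get("x")), float(p.get("y")))
            for p in root.findall("./geometry/polyline/point")
        ]
        # Sum the Euclidean distances between consecutive points along the path.
        lengths[root.get("id")] = sum(
            math.dist(a, b) for a, b in zip(points, points[1:])
        )
    return lengths


print(root_lengths(SAMPLE))
```

Any tool that agrees on the same element names could read the same file, which is the portability argument the abstract makes.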

Concepts: Mathematics, Computer program, Annotation, Root, Software architecture, XML, Markup language, Weyl group

4

The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data, either before data submission or once publicly available in the PRIDE (PRoteomics IDEntifications) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and direct access to private (password protected) and public experiments in PRIDE. The ProteomeXchange (PX) Consortium has been set up to enable a better integration of existing public proteomics repositories, maximizing its benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new PSI (Proteomics Standards Initiative) data standards such as mzIdentML and mzTab, and the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that would build upon the success of the original PRIDE Inspector and would enable users to visualize and validate PX complete submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML and the most popular peak list formats), peptide and protein identification results (mzIdentML, PRIDE XML, mzTab), to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/.

Concepts: Protein, Validation, Implementation, Standard, Support, Open source, XML, Bioinformatics software

3

ProXL is a web application and accompanying database designed for sharing, visualizing, and analysing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structure-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app.

Concepts: Scientific method, Database, User interface, Relational database, Relational algebra, Relational model, Relation, XML

3

Personal Health Intervention Toolkit (PHIT) is an advanced cross-platform software framework targeted at personal self-help research on mobile devices. Following the subjective and objective measurement, assessment, and plan methodology for health assessment and intervention recommendations, the PHIT platform lets researchers quickly build mobile health research Android and iOS apps. They can (1) create complex data-collection instruments using a simple extensible markup language (XML) schema; (2) use Bluetooth wireless sensors; (3) create targeted self-help interventions based on collected data via XML-coded logic; (4) facilitate cross-study reuse from the library of existing instruments and interventions such as stress, anxiety, sleep quality, and substance abuse; and (5) monitor longitudinal intervention studies via daily upload to a Web-based dashboard portal. For physiological data, Bluetooth sensors collect real-time data with on-device processing. For example, using the BinarHeartSensor, the PHIT platform processes the heart rate data into heart rate variability measures, and plots these data as time-series waveforms. Subjective data instruments are user data-entry screens, comprising a series of forms with validation and processing logic. The PHIT instrument library consists of over 70 reusable instruments for various domains including cognitive, environmental, psychiatric, psychosocial, and substance abuse. Many are standardized instruments, such as the Alcohol Use Disorder Identification Test, Patient Health Questionnaire-8, and Post-Traumatic Stress Disorder Checklist. Autonomous instruments such as battery and global positioning system location support continuous background data collection. All data are acquired using a schedule appropriate to the app’s deployment. The PHIT intelligent virtual advisor (iVA) is an expert system logic layer, which analyzes the data in real time on the device. 
This data analysis results in a tailored app of interventions and other data-collection instruments. For example, if a user's anxiety score exceeds a threshold, the iVA might add a meditation intervention to the task list in order to teach the user how to relax, and schedule a reassessment using the anxiety instrument 2 weeks later to re-evaluate. If the anxiety score exceeds a higher threshold, then an advisory to seek professional help would be displayed. Using the easy-to-use PHIT scripting language, the researcher can program new instruments, the iVA, and interventions to their domain-specific needs. The iVA, instruments, and interventions are defined via XML files, which facilitates rapid app development and deployment. The PHIT Web-based dashboard portal provides the researcher access to all the uploaded data. After a secure login, the data can be filtered by criteria such as study, protocol, domain, and user. Data can also be exported into a comma-delimited file for further processing. The PHIT framework has proven to be an extensible, reconfigurable technology that facilitates mobile data collection and health intervention research. Additional plans include instrument development in other domains, additional health sensors, and a text messaging notification system.
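The two-threshold anxiety example above can be sketched as a small rule evaluator driven by an XML file. This is only an illustration of XML-coded threshold logic, not the PHIT scripting language: the rule format, attribute names, action names and threshold values (10 and 15) are all hypothetical. It also assumes that the higher threshold replaces the lower-threshold action rather than adding to it, which the abstract does not specify.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file; the format and the threshold values are invented for this sketch.
RULES_XML = """<rules instrument="anxiety">
  <rule threshold="15" action="advise_professional_help"/>
  <rule threshold="10" action="add_meditation" reassess-days="14"/>
</rules>"""


def plan_tasks(score, rules_xml=RULES_XML):
    """Return the task list triggered by a score, using the highest exceeded threshold."""
    rules = ET.fromstring(rules_xml).findall("rule")
    # Check rules from the highest threshold down so the most severe match wins.
    for rule in sorted(rules, key=lambda r: float(r.get("threshold")), reverse=True):
        if score > float(rule.get("threshold")):
            tasks = [rule.get("action")]
            days = rule.get("reassess-days")
            if days is not None:
                # Schedule a follow-up run of the same instrument.
                tasks.append(f"reassess_anxiety_in_{days}_days")
            return tasks
    return []  # No threshold exceeded: nothing is added to the task list.


print(plan_tasks(12))
print(plan_tasks(20))
```

Keeping the thresholds in the XML rather than in code is what would let a researcher change the intervention logic without rebuilding the app.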

Concepts: Posttraumatic stress disorder, Anxiety disorder, PHP, Mobile phone, Global Positioning System, XML, Markup language, HTML

3

BACKGROUND: Multidisciplinary integrated research requires the ability to couple the diverse sets of data obtained from a range of complex experiments and computer simulations. Integrating data requires semantically rich information. In this paper an end-to-end use of semantically rich data in computational chemistry is demonstrated utilizing the Chemical Markup Language (CML) framework. Semantically rich data is generated by the NWChem computational chemistry software with the FoX library and utilized by the Avogadro molecular editor for analysis and visualization. RESULTS: The NWChem computational chemistry software has been modified and coupled to the FoX library to write CML compliant XML data files. The FoX library was expanded to represent the lexical input files and molecular orbitals used by the computational chemistry software. Draft dictionary entries and a format for molecular orbitals within CML CompChem were developed. The Avogadro application was extended to read in CML data, and display molecular geometry and electronic structure in the GUI allowing for an end-to-end solution where Avogadro can create input structures, generate input files, NWChem can run the calculation and Avogadro can then read in and analyse the CML output produced. The developments outlined in this paper will be made available in future releases of NWChem, FoX, and Avogadro. CONCLUSIONS: The production of CML compliant XML files for computational chemistry software such as NWChem can be accomplished relatively easily using the FoX library. The CML data can be read in by a newly developed reader in Avogadro and analysed or visualized in various ways. A community-based effort is needed to further develop the CML CompChem convention and dictionary. 
This will enable the long-term goal of allowing a researcher to run simple “Google-style” searches of chemistry and physics and have the results of computational calculations returned in a comprehensible form alongside articles from the published literature.

Concepts: Molecule, Chemistry, Computational chemistry, Quantum chemistry, Semiotics, Cheminformatics, XML, Computational chemistry software

3

BACKGROUND: An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. RESULTS: Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/ CONCLUSION: The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
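To make the idea of a machine-readable gene fusion record concrete, the sketch below serializes one fusion event to XML. The element and attribute names (fusion, partner, evidence, supportingReads) are invented for illustration and are not taken from the GFML prototype; the gene names and coordinates in the usage line are placeholders, not real findings.

```python
import xml.etree.ElementTree as ET


def fusion_to_xml(gene5, gene3, chrom5, pos5, chrom3, pos3, reads):
    """Serialize one gene fusion event as an XML string (hypothetical layout)."""
    fusion = ET.Element("fusion", name=f"{gene5}-{gene3}")
    # One <partner> element per side of the fusion, with its breakpoint position.
    ET.SubElement(fusion, "partner", side="5prime", gene=gene5,
                  chrom=chrom5, breakpoint=str(pos5))
    ET.SubElement(fusion, "partner", side="3prime", gene=gene3,
                  chrom=chrom3, breakpoint=str(pos3))
    # Supporting evidence is kept separate so other tools can verify it independently.
    ET.SubElement(fusion, "evidence", supportingReads=str(reads))
    return ET.tostring(fusion, encoding="unicode")


# Placeholder values only.
print(fusion_to_xml("GENE_A", "GENE_B", "chr1", 100, "chr2", 200, 7))
```

A record in this form can be parsed back by any XML library, which is the kind of database-independent exchange the abstract describes.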

Concepts: Gene, Genetics, Evolution, Biology, Logic, Standardization, Analysis, XML

2

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

Concepts: Immune system, Asthma, Writing, Mucus, Publishing, Publication, XML

2

In the version of this article originally published, the links and files for the Supplementary Information, including Supplementary Tables 1-5, Supplementary Figures 1-25, Supplementary Note, Supplementary Datasets 1-4 and the Life Sciences Reporting Summary, were missing in the HTML. The error has been corrected in the HTML version of this article.

Concepts: Life, Peptide, Writing, Publishing, Nonribosomal peptide, XML, HTML