SciCombinator

Journal: Journal of Cheminformatics

175

Displaying chemical structures in LaTeX documents currently requires either hand-coding of the structures using one of several LaTeX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LaTeX source code. We here present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TeX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user's computer. The code and accompanying documentation can be found at http://chimpsky.uwaterloo.ca/mol2chemfig.

Concepts: Computer program, Java, Programming language, Source code, Free software, Computer software, Programmer, Latex

174

Since its public introduction in 2005, the IUPAC InChI chemical structure identifier standard has become the worldwide standard for defined chemical structures. This article describes the extensive use and dissemination of the InChI and InChIKey structure representations by and for the worldwide chemistry community, the chemical information community, and major publishers and disseminators of chemical and related scientific offerings in manuscripts and databases.

Concepts: Chemistry, Chemical substance, Chemical structure, Identifier, International Chemical Identifier, CAS registry number, Simplified molecular input line entry specification, International Union of Pure and Applied Chemistry

173

This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.

Concepts: Mathematics, Quantum mechanics, Molecule, Chemistry, Computational chemistry, Quantum chemistry, Theoretical chemistry, Molecular Hamiltonian

172

BACKGROUND: A molecule editor, i.e. a program facilitating graphical input and interactive editing of molecules, is an indispensable part of every cheminformatics or molecular processing system. Today, when a web browser has become the universal scientific user interface, a tool to edit molecules directly within the web browser is essential. One of the most popular tools for molecular structure input on the web is the JME applet. Since its release nearly 15 years ago, however, the web environment has changed, and Java applets are facing increasing implementation hurdles due to their maintenance and support requirements, as well as security issues. This prompted us to update the JME editor and port it to a modern Internet programming language, JavaScript. SUMMARY: The actual molecule editing Java code of the JME editor was translated into JavaScript with the help of the Google Web Toolkit compiler and a custom library that emulates a subset of the GUI features of the Java runtime environment. In this process, the editor was enhanced with additional functionality, including a substituent menu, copy/paste, drag and drop, undo/redo capabilities and integrated help. In addition to desktop computers, the editor supports molecule editing on touch devices, including iPhone, iPad and Android phones and tablets. In analogy to JME, the new editor is named JSME. This new molecule editor is compact, easy to use and easy to incorporate into web pages. CONCLUSIONS: A free molecule editor written in JavaScript was developed and is released under the terms of a permissive BSD license. The editor is compatible with JME and has practically the same user interface and web application programming interface. The JSME editor is available for download from the project web page http://peter-ertl.com/jsme/

Concepts: Java, World Wide Web, Programming language, Web browser, Web page, Google, Web server, HTML

172

BACKGROUND: Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effect-free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. RESULTS: We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare them to existing toolkits, both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with strong support for functional programming and a highly sophisticated type system. CONCLUSIONS: We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer-aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. Finally, the level of type-safety achieved by Scala highly increased the reliability of our code as well as the productivity of the programmers involved in this project.
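
The idea of an immutable molecular graph can be illustrated with a minimal sketch. It is written in Python rather than the Scala used in the paper, and it is not the authors' data structure: the `Molecule` class, its method names and the small `ATOMIC_MASS` table are assumptions made for illustration only. Every "modification" returns a new object, so existing values can be shared between threads without locking.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Illustrative subset of standard atomic masses (assumption: only these elements are needed).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@dataclass(frozen=True)
class Molecule:
    """Hypothetical immutable molecular graph: atoms are indexed by position."""
    atoms: Tuple[str, ...] = ()
    bonds: FrozenSet[FrozenSet[int]] = frozenset()

    def add_atom(self, element: str) -> "Molecule":
        # Returns a new Molecule; the receiver is left unchanged.
        return Molecule(self.atoms + (element,), self.bonds)

    def add_bond(self, i: int, j: int) -> "Molecule":
        n = len(self.atoms)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError(f"invalid bond between atoms {i} and {j}")
        # An undirected bond is stored as an unordered pair of atom indices.
        return Molecule(self.atoms, self.bonds | {frozenset((i, j))})

    def neighbors(self, i: int) -> FrozenSet[int]:
        return frozenset(k for bond in self.bonds if i in bond for k in bond if k != i)

    def mass(self) -> float:
        return sum(ATOMIC_MASS[element] for element in self.atoms)


# Build water step by step; each call yields a fresh value.
oxygen = Molecule().add_atom("O")
water = oxygen.add_atom("H").add_atom("H").add_bond(0, 1).add_bond(0, 2)
```

Here `oxygen` still holds a single atom after `water` has been built, and assigning to `water.atoms` raises `dataclasses.FrozenInstanceError`. A purely functional toolkit would typically also return validation errors as values instead of raising exceptions; the `ValueError` above is a simplification.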

Concepts: Programming language, Functional programming, Type system, C Sharp, Programming paradigm, Referential transparency, Purely functional, Haskell

168

BACKGROUND: Mycobacterium tuberculosis encodes 11 putative serine-threonine protein kinases (STPKs), which regulate transcription, cell development and interaction with host cells. Of the 11 STPKs, three kinases, namely PknA, PknB and PknG, have been related to mycobacterial growth. Previous studies have shown that PknB is essential for mycobacterial growth, is expressed during the log phase of growth, and phosphorylates substrates involved in peptidoglycan biosynthesis. In recent years many high-affinity inhibitors have been reported for PknB. Data fusion has previously been shown to enrich active compounds effectively in both structure- and ligand-based approaches. In this study we applied three data fusion ranking algorithms to the PknB dataset, namely sum rank, sum score and reciprocal rank. We found that the reciprocal rank algorithm is able to select compounds earlier in a virtual screening process. We also screened the Asinex database with the reciprocal rank algorithm to identify possible inhibitors of PknB. RESULTS: In our work we used both structure-based and ligand-based approaches for virtual screening and combined their results using a variety of data fusion methods. We found that data fusion increases the chance of actives being ranked highly. Specifically, we found that the rankings from pharmacophore search, ROCS and Glide XP, fused with the reciprocal rank algorithm, not only outperform the individual structure- and ligand-based approaches but also rank actives better than the other two data fusion methods according to the BEDROC, robust initial enhancement (RIE) and AUC metrics. These fused results were used to identify 45 candidate compounds for further experimental validation. CONCLUSION: We show that very different structure- and ligand-based methods for predicting drug-target interactions can be combined effectively using data fusion, outperforming any single method in the ranking of actives. Such fused results show promise for a coherent selection of candidates for biological screening.
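
The three fusion rules named in this abstract (sum rank, sum score and reciprocal rank) can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the authors' implementation: every method is assumed to score the same compounds, higher scores are assumed to be better (docking energies would need their sign flipped first), and the scores are assumed to be on comparable scales for sum score. The compound names and scores are invented toy data.

```python
from collections import defaultdict


def ranks_from_scores(scores):
    """Map each compound to its 1-based rank; the highest score gets rank 1."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {compound: position for position, compound in enumerate(ordered, start=1)}


def sum_score(score_lists):
    """Fuse by adding raw scores; a larger total is better."""
    fused = defaultdict(float)
    for scores in score_lists:
        for compound, score in scores.items():
            fused[compound] += score
    return sorted(fused, key=fused.get, reverse=True)


def sum_rank(score_lists):
    """Fuse by adding ranks; a smaller total is better."""
    fused = defaultdict(int)
    for scores in score_lists:
        for compound, rank in ranks_from_scores(scores).items():
            fused[compound] += rank
    return sorted(fused, key=fused.get)


def reciprocal_rank(score_lists):
    """Fuse by adding 1/rank; a larger total is better."""
    fused = defaultdict(float)
    for scores in score_lists:
        for compound, rank in ranks_from_scores(scores).items():
            fused[compound] += 1.0 / rank
    return sorted(fused, key=fused.get, reverse=True)


# Toy scores from three hypothetical screening methods.
methods = [
    {"X": 0.9, "A": 0.8, "Y": 0.7, "B": 0.6, "C": 0.5},
    {"A": 0.9, "B": 0.8, "Y": 0.7, "C": 0.6, "X": 0.5},
    {"B": 0.9, "A": 0.8, "Y": 0.7, "C": 0.6, "X": 0.5},
]

print(sum_score(methods))        # ['A', 'B', 'Y', 'X', 'C']
print(sum_rank(methods))         # ['A', 'B', 'Y', 'X', 'C']
print(reciprocal_rank(methods))  # ['A', 'B', 'X', 'Y', 'C']
```

In the toy data, compound X is ranked first by one method and last by the other two, while Y is ranked third by all of them. Sum rank and sum score place Y ahead of X, whereas reciprocal rank places X ahead of Y, because a single top rank contributes 1.0 and outweighs three mid-table ranks. This shows how reciprocal rank weights top positions, which is one plausible reason it could surface compounds earlier.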

Concepts: Adenosine triphosphate, Enzyme, Tuberculosis, Non-parametric statistics, Mycobacterium, Mycobacterium tuberculosis, Ranking, RANK

158

PubChem is an open repository for chemical structures, biological activities and biomedical annotations. Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. Exposing PubChem data to Semantic Web services may help enable automated data integration and management, as well as facilitate interoperable web applications.

Concepts: Mathematics, Chemical substance, Semantic Web, Web 2.0, Internet, Reference, Data management, Web services

15

Wikipedia, the world’s largest and most popular encyclopedia, is an indispensable source of chemistry information. Among other things, it contains entries for over 15,000 chemicals, including metabolites, drugs, agrochemicals and industrial chemicals. To provide easy access to this wealth of information, we decided to develop a substructure and similarity search tool for chemical structures referenced in Wikipedia.

Concepts: Chemical reaction, Molecule, Chemistry, Nitrogen, Chemical substance, Chemical compound, Chemical structure, Chemical industry

15

BACKGROUND: Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis. RESULTS: This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferenceable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying. CONCLUSIONS: We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support.
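
How RDF triples let one dataset link to another through URIs can be sketched with plain Python tuples. This is a toy model, not the ChEMBL-RDF schema: all `http://example.org/...` URIs and the `hasTarget` predicate are hypothetical placeholders, and literals are not distinguished from URIs. Only the `owl:sameAs` and `rdfs:label` predicate URIs are real W3C terms.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
HAS_TARGET = "http://example.org/vocab/hasTarget"  # hypothetical predicate

# Each triple is (subject, predicate, object). The URIs are invented examples.
triples = {
    ("http://example.org/dbA/compound/1", RDFS_LABEL, "aspirin"),
    ("http://example.org/dbA/compound/1", OWL_SAME_AS, "http://example.org/dbB/molecule/42"),
    ("http://example.org/dbB/molecule/42", HAS_TARGET, "http://example.org/dbB/target/7"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def targets_via_links(store, compound):
    """Follow sameAs links from one dataset into another, then collect targets."""
    equivalents = {compound} | {o for _, _, o in match(store, s=compound, p=OWL_SAME_AS)}
    return {o for e in equivalents for _, _, o in match(store, s=e, p=HAS_TARGET)}


found = targets_via_links(triples, "http://example.org/dbA/compound/1")
```

In this sketch `found` contains the target URI from the second dataset even though the query started from a compound URI in the first. A real application would typically use an RDF library and SPARQL rather than hand-written pattern matching.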

Concepts: World Wide Web, Semantic Web, Resource Description Framework, Uniform Resource Locator, Uniform Resource Identifier, Linked Data

13

We report on the development of a cheminformatics enumeration technology and the analysis of a resulting large dataset of virtual macrolide scaffolds. Although macrolides have been shown to have valuable biological properties, there is no ready-to-screen virtual library of diverse macrolides in the public domain. Conducting molecular modeling (especially virtual screening) of these complex molecules is highly relevant, as the organic synthesis of these compounds, when feasible, typically requires many synthetic steps and thus dramatically slows the discovery of new bioactive macrolides. Herein, we introduce a cheminformatics approach and associated software that allows for designing and generating libraries of virtual macrocycle/macrolide scaffolds with user-defined constitutional and structural constraints (e.g., types and numbers of structural motifs to be included in the macrocycle, ring size, maximum number of compounds generated). To study the chemical diversity of such generated molecules, we enumerated the V1M (Virtual 1 million Macrolide scaffolds) library, with each scaffold containing twelve common structural motifs. For each macrolide scaffold, we calculated several key properties, such as molecular weight, hydrogen bond donors/acceptors, and topological polar surface area. In this study, we discuss (1) the initial concept and current features of our PKS (polyketides) Enumerator software, (2) the chemical diversity and distribution of structural motifs in the V1M library, and (3) the unique opportunities for future virtual screening of such enumerated ensembles of macrolides. Importantly, V1M is provided in the Supplementary Material of this paper, allowing other researchers to conduct any type of molecular modeling and virtual screening studies. Therefore, this technology for enumerating extremely large libraries of macrolide scaffolds could hold unique potential in the field of computational chemistry and drug discovery for the rational design of new antibiotics and anti-cancer agents.