SciCombinator

Concept: Compiler

163

We present a web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST Server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA whilst minimising client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor (VEP) tool permitting large-scale programmatic variant analysis independent of any specific programming language. Availability: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.
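
As a rough illustration of the language-independent access described above, the following Python sketch builds (but does not send) a request for a sequence in FASTA or JSON. The `/sequence/id/` path, the `Content-Type` values, the `https` scheme and the helper name `sequence_request` are assumptions to check against the server's own documentation; the abstract itself only gives the base address.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL; the abstract cites http://rest.ensembl.org.
ENSEMBL_REST = "https://rest.ensembl.org"

# Assumed mapping from a short format name to the media type requested.
CONTENT_TYPES = {"fasta": "text/x-fasta", "json": "application/json"}


def sequence_request(stable_id, fmt="fasta"):
    """Build a request for the sequence of an Ensembl stable ID (hypothetical helper)."""
    url = f"{ENSEMBL_REST}/sequence/id/{quote(stable_id)}"
    return Request(url, headers={"Content-Type": CONTENT_TYPES[fmt]})


req = sequence_request("ENSG00000157764", fmt="fasta")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual call; that step needs network access and is left out here.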

Concepts: Language, Computer program, Java, C, Programming language, Source code, Programmer, Compiler

26

Quantum computing sits at an important inflection point. For years, high-level algorithms for quantum computers have shown considerable promise, and recent advances in quantum device fabrication offer hope of utility. A gap still exists, however, between the hardware size and reliability requirements of quantum computing algorithms and the physical machines foreseen within the next ten years. To bridge this gap, quantum computers require appropriate software to translate and optimize applications (toolflows) and abstraction layers. Given the stringent resource constraints in quantum computing, the information passed between layers of software and implementations will differ markedly from that in classical computing. Quantum toolflows must expose more physical details between layers, so the challenge is to find abstractions that expose key details while hiding enough complexity.

Concepts: Computer, Computer program, Computation, Computer science, Programming language, Computer programming, Compiler, Theoretical computer science

11

Track data hubs provide an efficient mechanism for visualizing remotely hosted, Internet-accessible collections of genome annotations. Hub data sets can be organized, configured and fully integrated into the UCSC Genome Browser and accessed through the familiar Browser interface. For the first time, individuals can use the complete Browser feature set to view custom data sets without the overhead of setting up and maintaining a mirror. Availability and Implementation: Source code for the BigWig, BigBed and Genome Browser software is freely available for noncommercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/. BAM and VCF/tabix utilities are available from http://samtools.sourceforge.net/ and http://vcftools.sourceforge.net/. The UCSC Genome Browser is publicly accessible at http://genome.ucsc.edu.

Concepts: Computer program, Java, Implementation, Source code, University of California, Santa Cruz, Linux, Executable, Compiler

5

Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates.
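
To make step (iii) concrete, here is a minimal Python sketch of one possible normalization and two common diversity measures. Total-sum scaling, the Shannon index (alpha diversity) and Bray-Curtis dissimilarity (beta diversity) are illustrative choices assumed here; the abstract does not say which methods or metrics the authors evaluated, and all function names are hypothetical.

```python
import math


def relative_abundance(counts):
    """Total-sum scaling: turn per-contig read counts into proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return [c / total for c in counts]


def shannon(proportions):
    """Shannon index H = -sum(p * ln p), skipping absent taxa."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


sample_1 = relative_abundance([30, 10, 0])
sample_2 = relative_abundance([5, 5, 10])
print(shannon(sample_1), shannon(sample_2), bray_curtis(sample_1, sample_2))
```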

Concepts: Scientific method, Bioinformatics, Genomics, Research, Relative, The Matrix, Community, Compiler

5

In just 3 years CRISPR genome editing has transformed biology, and its popularity and potency continue to grow. New CRISPR effectors and rules for locating optimum targets continue to be reported, highlighting the need for computational CRISPR targeting tools to compile these rules and facilitate target selection and design. CHOPCHOP is one of the most widely used web tools for CRISPR- and TALEN-based genome editing. Its overarching principle is to provide an intuitive and powerful tool that can serve both novice and experienced users. In this major update we introduce tools for the next generation of CRISPR advances, including Cpf1 and Cas9 nickases. We support a number of new features that improve the targeting power, usability and efficiency of CHOPCHOP. To increase targeting range and specificity we provide support for custom length sgRNAs, and we evaluate the sequence composition of the whole sgRNA and its surrounding region using models compiled from multiple large-scale studies. These and other new features, coupled with an updated interface for increased usability and support for a continually growing list of organisms, maintain CHOPCHOP as one of the leading tools for CRISPR genome editing. CHOPCHOP v2 can be found at http://chopchop.cbu.uib.no.
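
The simplest form of target location can be sketched as follows, assuming the standard SpCas9 rule that a protospacer must be followed directly by an NGG PAM. This is only a forward-strand toy scan with an adjustable guide length, not CHOPCHOP's actual scoring or search; the function name and return format are hypothetical.

```python
def find_cas9_targets(seq, guide_len=20):
    """Return (start, protospacer, PAM) for every forward-strand NGG site."""
    seq = seq.upper()
    hits = []
    # The protospacer occupies seq[i:i+guide_len]; the 3-nt PAM follows it.
    for i in range(len(seq) - guide_len - 2):
        pam = seq[i + guide_len:i + guide_len + 3]
        if pam[1:] == "GG":
            hits.append((i, seq[i:i + guide_len], pam))
    return hits


demo = "A" * 20 + "TGG"
print(find_cas9_targets(demo))                # default 20-nt guide
print(find_cas9_targets(demo, guide_len=17))  # custom-length guide
```

A fuller tool would also scan the reverse complement, support other effectors such as Cpf1 (whose PAM lies on the other side of the target), and rank candidates by predicted efficiency and off-target risk.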

Concepts: Bacteria, Organism, Computer program, Star Trek: The Next Generation, Programming language, Target Corporation, Tool, Compiler

3

For decades, formal methods have offered the promise of verified software that does not have exploitable bugs. Until recently, however, it has not been possible to verify software of sufficient complexity to be useful. Recently, that situation has changed. seL4 is an open-source operating system microkernel efficient enough to be used in a wide range of practical applications. Its designers proved it to be fully functionally correct, ensuring the absence of buffer overflows, null pointer exceptions, use-after-free errors, etc., and guaranteeing integrity and confidentiality. The CompCert Verifying C Compiler maps source C programs to provably equivalent assembly language, ensuring the absence of exploitable bugs in the compiler. A number of factors have enabled this revolution, including faster processors, increased automation, more extensive infrastructure, specialized logics and the decision to co-develop code and correctness proofs rather than verify existing artefacts. In this paper, we explore the promise and limitations of current formal-methods techniques. We discuss these issues in the context of DARPA’s HACMS program, which had as its goal the creation of high-assurance software for vehicles, including quadcopters, helicopters and automobiles. This article is part of the themed issue ‘Verified trustworthy software systems’.

Concepts: Computer program, C, Programming language, Source code, Formal verification, Verification, Formal methods, Compiler

3

The National Aeronautics and Space Administration (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun’s activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA’s solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate wider adoption and interest across the computer vision and solar physics communities.

Concepts: Sun, Computer graphics, Computer program, Assembly language, Source code, NASA, Compiler, Goddard Space Flight Center

2

BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using “Genetic Improvement”.

Concepts: DNA, Gene, Genetics, Computer program, Java, Programming language, Source code, Compiler

2

MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset of 252 Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., no pre-processing such as partitioning or normalization was needed. When compared with previous methods (Chikhi and Rizk, 2012; Howe, et al., 2014) on assembling the soil data, MEGAHIT generated an assembly three times larger, with longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, giving a 4-fold improvement. Availability: The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under the GPLv3 license.
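
For readers unfamiliar with the contig N50 statistic mentioned above, the following Python sketch computes it under the usual definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The function name is hypothetical and the contig lengths are made-up illustrative values, not MEGAHIT results.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Illustrative lengths only: total is 54, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```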

Concepts: Computer, Assembly language, Source code, Compiler, Macro, De Bruijn graph, De Bruijn sequence, Nicolaas Govert de Bruijn

2

MOTIVATION: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error-prone and tedious. Bellman’s GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. RESULTS: In Bellman’s GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras, and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular, and easy to modify. The declarative modules are compiled into C++ code that is competitive with carefully hand-crafted implementations. This article introduces the Bellman’s GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman’s GAP as an implementation platform of “real-world” bioinformatics tools. AVAILABILITY: Bellman’s GAP is available under the GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This website includes a repository of re-usable modules for RNA folding based on thermodynamics. CONTACT: robert@techfak.uni-bielefeld.de.
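
The idea of separating the search space (grammar) from the scoring (evaluation algebra) can be imitated in a few lines of Python. The sketch below is not GAP-L and does not use the Bellman's GAP compiler; it hand-codes a toy grammar for RNA secondary structures (first base unpaired, or paired with a later base) and evaluates it with two interchangeable algebras. All names, the pairing table and the absence of a minimum loop size are simplifying assumptions.

```python
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(seq, algebra):
    """Evaluate the toy grammar over seq with the given algebra."""
    nil, unpaired, pair, choose = algebra

    @lru_cache(maxsize=None)
    def struct(i, j):
        # Parses the subsequence seq[i:j].
        if i == j:
            return nil()
        # Rule 1: base i stays unpaired.
        candidates = [unpaired(struct(i + 1, j))]
        # Rule 2: base i pairs with base k, enclosing seq[i+1:k].
        for k in range(i + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                candidates.append(pair(struct(i + 1, k), struct(k + 1, j)))
        return choose(candidates)

    return struct(0, len(seq))


# Algebra 1: count all candidate structures.
COUNT = (lambda: 1, lambda x: x, lambda inner, rest: inner * rest, sum)
# Algebra 2: maximize the number of base pairs.
MAX_PAIRS = (lambda: 0, lambda x: x, lambda inner, rest: inner + rest + 1, max)

print(fold("GCGC", COUNT), fold("GCGC", MAX_PAIRS))  # prints 7 2
```

Swapping the algebra changes what is computed without touching the recurrence, which is the kind of re-use the abstract attributes to the declarative style.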

Concepts: Bioinformatics, Error, Computer program, Implementation, Programming language, Recursion, Dynamic programming, Compiler