### Concept: Parallel computing

#### 174

The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer that converts waste products into electricity using microbial communities. A cellular automaton (CA) is a uniform array of finite-state machines that update their states in discrete time, each depending on the states of its closest neighbours according to the same rule. Arrays of MFCs could, in principle, act as massively parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing a CA in MFCs. We have chosen Conway’s Game of Life as the ‘benchmark’ CA because it is the most popular CA and also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices compensates for their somewhat slow transitions between states during computation, compared with silicon circuitry.
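The local update rule at the heart of the benchmark can be sketched in a few lines. This is a plain-Python illustration of Conway's Game of Life on a toroidal grid, not the paper's MFC realization; the grid size and the blinker pattern are choices made for the example.

```python
def life_step(grid):
    """One synchronous update of Conway's Game of Life on a toroidal grid.

    Each cell is a finite-state machine (0 = dead, 1 = alive) whose next
    state depends only on its eight nearest neighbours -- the same local
    rule an MFC array would have to realize electrochemically.
    """
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbours with wrap-around (toroidal) boundaries.
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            # Birth on exactly 3 live neighbours; survival on 2 or 3.
            nxt[r][c] = 1 if (n == 3 or (n == 2 and grid[r][c])) else 0
    return nxt

# A "blinker" oscillates with period 2: horizontal bar -> vertical bar -> back.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Because the rule is identical in every cell and purely local, every cell update within a generation is independent, which is exactly the property that makes a CA a natural fit for an array of locally connected processors.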

#### 171

##### ScalaBLAST 2.0: Rapid and robust BLAST calculations on multiprocessor systems

- OPEN
- Bioinformatics (Oxford, England)
- Published almost 7 years ago

MOTIVATION: BLAST remains one of the most widely used tools in computational biology. The rate at which new sequence data become available continues to grow exponentially, driving the emergence of new fields of biological research. At the same time, multicore systems and conventional clusters are more accessible. ScalaBLAST has been designed to run on conventional multiprocessor systems with an eye to extreme parallelism, enabling parallel BLAST calculations using over 16,000 processing cores with a portable, robust, fault-resilient design that introduces little to no overhead with respect to serial BLAST. ScalaBLAST 2.0 source code can be freely downloaded from http://omics.pnl.gov/software/ScalaBLAST.php.

#### 169

##### Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.

- OPEN
- Genetics, selection, evolution : GSE
- Published about 7 years ago

BACKGROUND: Most Bayesian models for the analysis of complex traits are not analytically tractable, and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which use whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. RESULTS: Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains; some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. CONCLUSIONS: Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models; they not only deliver a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on the computational tools available for genomic selection programs.
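The "multiple chains" approach mentioned above is straightforward to sketch: run independent chains concurrently and pool their draws after burn-in. The Python illustration below uses a standard-normal target, a simple Gaussian random-walk Metropolis sampler, and four chains; all of these are illustrative choices, not the Bayesian genetic models from the paper.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def metropolis_chain(seed, n_samples=20000, proposal_sd=1.0):
    """One random-walk Metropolis chain targeting a standard normal density.

    Stands in for a single chain of a (much richer) Bayesian model; each
    chain uses its own seeded RNG so chains are independent.
    """
    rng = random.Random(seed)
    log_target = lambda v: -0.5 * v * v   # log N(0, 1), up to a constant
    x, samples = 0.0, []
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, proposal_sd)
        # Accept with probability min(1, target(cand)/target(x)).
        if log_target(cand) - log_target(x) >= math.log(1.0 - rng.random()):
            x = cand
        samples.append(x)
    return samples

# "Multiple chains" parallelization: run independent chains concurrently,
# then pool their draws after discarding burn-in.
with ThreadPoolExecutor(max_workers=4) as pool:
    chains = list(pool.map(metropolis_chain, [1, 2, 3, 4]))
pooled = [x for chain in chains for x in chain[5000:]]  # drop burn-in
mean = sum(pooled) / len(pooled)
```

Note that in CPython, threads do not speed up pure-Python CPU-bound chains; a production implementation would use processes or a compiled sampler. The point here is only the structure of the multiple-chain strategy: chains never communicate, so parallelization is trivial.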

#### 168

##### Exploiting Parallel R in the Cloud with SPRINT

- OPEN
- Methods of information in medicine
- Published about 7 years ago

Background: Advances in DNA microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Objectives: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. Methods: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. Results: It is possible to obtain good, scalable performance, but the level of improvement depends upon the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user’s location affects costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general offer an interesting alternative and new possibilities for smaller organisations with limited funds.

#### 166

##### A novel CPU/GPU simulation environment for large-scale biologically realistic neural modeling

- OPEN
- Frontiers in neuroinformatics
- Published about 6 years ago

Computational neuroscience is an emerging field that provides unique opportunities to study complex brain structures through realistic neural simulations. However, as biological details are added to models, the execution time of a simulation becomes longer. Graphics Processing Units (GPUs) are now being used to accelerate simulations because of their ability to perform computations in parallel, and they have shown significant improvements in execution time over Central Processing Units (CPUs). Most neural simulators utilize either multiple CPUs or a single GPU for better performance, but still show limitations in execution time when biological details are not sacrificed. Therefore, we present a novel CPU/GPU simulation environment for large-scale biological networks, the NeoCortical Simulator version 6 (NCS6). NCS6 is a free, open-source, parallelizable and scalable simulator designed to run on clusters of multiple machines, each potentially equipped with high-performance computing devices. It has built-in leaky integrate-and-fire (LIF) and Izhikevich (IZH) neuron models, but users can also design their own plug-in interfaces for other neuron types as desired. NCS6 is currently able to simulate one million cells and 100 million synapses in quasi real time by distributing data across eight machines, each with two video cards.
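As a point of reference for the built-in LIF model mentioned above, here is a minimal Python sketch of leaky integrate-and-fire dynamics under forward-Euler integration. The parameter values are generic textbook numbers, not NCS6 defaults, and a real simulator would integrate millions of such neurons in parallel rather than one.

```python
def simulate_lif(input_current, dt=0.1, tau_m=10.0, v_rest=-65.0,
                 v_reset=-65.0, v_thresh=-50.0, r_m=10.0):
    """Forward-Euler integration of a leaky integrate-and-fire neuron.

    Membrane equation: tau_m * dV/dt = -(V - v_rest) + r_m * I(t).
    A spike is recorded and V reset to v_reset whenever V crosses
    v_thresh. Units are informal (ms, mV, nA, MOhm); values are
    generic textbook choices, not NCS6 defaults.
    """
    v = v_rest
    spike_times = []
    for step, i_ext in enumerate(input_current):
        dv = (-(v - v_rest) + r_m * i_ext) / tau_m
        v += dv * dt
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

# 200 ms of constant 2 nA drive produces regular spiking: the steady-state
# voltage (-45 mV) sits above threshold, so the neuron fires periodically.
spikes = simulate_lif([2.0] * 2000)
```

Each neuron's update depends only on its own state and its input, so within a timestep all neuron updates are independent; synaptic communication between timesteps is what forces the cross-machine data distribution that NCS6 manages.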

#### 41

##### The Cytoarchitecture of Domain-specific Regions in Human High-level Visual Cortex

- OPEN
- Cerebral cortex (New York, N.Y. : 1991)
- Published about 3 years ago

A fundamental hypothesis in neuroscience proposes that underlying cellular architecture (cytoarchitecture) contributes to the functionality of a brain area. However, this hypothesis has not been tested in human ventral temporal cortex (VTC) that contains domain-specific regions causally involved in perception. To fill this gap in knowledge, we used cortex-based alignment to register functional regions from living participants to cytoarchitectonic areas in ex vivo brains. This novel approach reveals 3 findings. First, there is a consistent relationship between domain-specific regions and cytoarchitectonic areas: each functional region is largely restricted to 1 cytoarchitectonic area. Second, extracting cytoarchitectonic profiles from face- and place-selective regions after back-projecting each region to 20-μm thick histological sections indicates that cytoarchitectonic properties distinguish these regions from each other. Third, some cytoarchitectonic areas contain more than 1 domain-specific region. For example, face-, body-, and character-selective regions are located within the same cytoarchitectonic area. We summarize these findings with a parsimonious hypothesis incorporating how cellular properties may contribute to functional specialization in human VTC. Specifically, we link computational principles to correlated axes of functional and cytoarchitectonic segregation in human VTC, in which parallel processing across domains occurs along a lateral-medial axis while transformations of information within domain occur along an anterior-posterior axis.

#### 28

##### Fast force field-based optimization of protein-ligand complexes with graphics processor

- Journal of computational chemistry
- Published over 7 years ago

Usually based on molecular mechanics force fields, the post-optimization of ligand poses is typically the most time-consuming step in protein-ligand docking procedures. At the same time, it has the potential to overcome the limitations of discretized conformation models. Because of the parallel nature of the problem, recent graphics processing units (GPUs) can be applied to address this dilemma. We present a novel algorithmic approach for parallelizing, and thus massively speeding up, protein-ligand complex optimizations with GPUs. The method, customized to pose optimization, performs at least 100 times faster than widely used CPU-based optimization tools. An improvement in Root-Mean-Square Distance (RMSD) of up to 42% compared to the original docking pose can be achieved. © 2012 Wiley Periodicals, Inc.

#### 28

##### A hierarchical exact accelerated stochastic simulation algorithm

- The Journal of chemical physics
- Published about 7 years ago

A new algorithm, “HiER-leap” (hierarchical exact reaction-leaping), is derived which improves on the computational properties of the ER-leap algorithm for exact accelerated simulation of stochastic chemical kinetics. Unlike ER-leap, HiER-leap utilizes a hierarchical or divide-and-conquer organization of reaction channels into tightly coupled “blocks” and is thereby able to speed up systems with many reaction channels. Like ER-leap, HiER-leap is based on the use of upper and lower bounds on the reaction propensities to define a rejection sampling algorithm with inexpensive early rejection and acceptance steps. But in HiER-leap, large portions of intra-block sampling may be done in parallel. An accept/reject step is used to synchronize across blocks. This method scales well when many reaction channels are present and has desirable asymptotic properties. The algorithm is exact, parallelizable and achieves a significant speedup over the stochastic simulation algorithm and ER-leap on certain problems. This algorithm offers a potentially important step towards efficient in silico modeling of entire organisms.
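For context, the exact stochastic simulation algorithm (SSA) that ER-leap and HiER-leap accelerate can be sketched for a single reaction channel. The decay system and rate below are illustrative choices, and HiER-leap's block decomposition and bound-based rejection steps are not shown; this is just Gillespie's direct method, the baseline the paper speeds up.

```python
import random

def gillespie_ssa(x0, rate_decay, t_end, seed=0):
    """Gillespie's direct method for a single decay channel A -> 0.

    The exact SSA loop: compute the total propensity, draw an
    exponential waiting time to the next reaction, fire it, repeat.
    One species / one channel keeps the sketch short; HiER-leap's
    contribution is handling *many* channels by grouping them into
    tightly coupled blocks that can be sampled in parallel.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    while x > 0:
        a = rate_decay * x               # propensity of A -> 0
        t += rng.expovariate(a)          # exponential waiting time
        if t > t_end:
            break
        x -= 1                           # fire the reaction
    return x

# Sanity check against theory: for pure decay, E[X(t)] = x0 * exp(-k*t),
# so with x0=1000, k=0.5, t=2 the mean remaining count is ~368.
trials = [gillespie_ssa(1000, 0.5, 2.0, seed=s) for s in range(200)]
mean_remaining = sum(trials) / len(trials)
```

The serial bottleneck is visible here: each iteration fires exactly one reaction, so systems with many fast channels take enormous numbers of steps, which is the cost that leaping methods amortize.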

#### 28

##### Computing 2D Constrained Delaunay Triangulation Using the GPU.

- IEEE transactions on visualization and computer graphics
- Published about 7 years ago

We propose the first GPU solution to compute the 2D constrained Delaunay triangulation (CDT) of a planar straight line graph (PSLG) consisting of points and edges. There are many existing CPU algorithms to solve the CDT problem in computational geometry, yet there has been no prior approach to solve this problem efficiently using the parallel computing power of the GPU. For the special case of the CDT problem where the PSLG consists of just points, which is simply the normal Delaunay triangulation problem, a hybrid approach using the GPU together with the CPU to partially speed up the computation has already been presented in the literature. Our work, on the other hand, accelerates the entire computation on the GPU. Our implementation using the CUDA programming model on NVIDIA GPUs is numerically robust, and runs up to an order of magnitude faster than the best sequential implementations on the CPU. This result is reflected in our experiment with both randomly generated PSLGs and real-world GIS data having millions of points and edges.
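A flavor of the numerical core shared by Delaunay codes on CPU or GPU is the incircle predicate, which decides whether a point lies inside a triangle's circumcircle. The floating-point sketch below is for illustration only; a numerically robust implementation, such as the one the paper describes, would use exact or adaptive-precision arithmetic rather than plain floats.

```python
def in_circle(a, b, c, d):
    """Incircle predicate for 2D Delaunay triangulation.

    Returns a positive value iff point d lies strictly inside the
    circumcircle of triangle (a, b, c), assuming a, b, c are given in
    counter-clockwise order; negative if outside, ~0 if cocircular.
    Computed as the standard 3x3 determinant after translating d to
    the origin. Plain floating point -- NOT robust near degeneracy.
    """
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    return ((ax * ax + ay * ay) * (bx * cy - cx * by)
            - (bx * bx + by * by) * (ax * cy - cx * ay)
            + (cx * cx + cy * cy) * (ax * by - bx * ay))

# Circumcircle of (0,0), (1,0), (0,1) has center (0.5, 0.5):
# (0.25, 0.25) lies inside it, (2, 2) lies outside.
inside = in_circle((0, 0), (1, 0), (0, 1), (0.25, 0.25))
outside = in_circle((0, 0), (1, 0), (0, 1), (2, 2))
```

A Delaunay triangulation is exactly one in which this predicate is non-positive for every triangle and every other point, which is why edge flips driven by incircle tests (evaluated massively in parallel on the GPU) converge to the result.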

#### 28

##### Comparison of GPU- and CPU-implementations of mean-firing rate neural networks on parallel hardware.

- Network (Bristol, England)
- Published about 7 years ago

Modern parallel hardware such as multi-core processors (CPUs) and graphics processing units (GPUs) has a high computational power which can greatly benefit the simulation of large-scale neural networks. Over the past years, a number of efforts have focused on developing parallel algorithms and simulators best suited to the simulation of spiking neural models. In this article, we investigate the advantages and drawbacks of CPU and GPU parallelization of mean-firing rate neurons, which are widely used in systems-level computational neuroscience. By comparing OpenMP, CUDA and OpenCL implementations against a serial CPU implementation, we show that GPUs are better suited than CPUs for the simulation of very large networks, but that smaller networks benefit more from an OpenMP implementation. As this performance strongly depends on data organization, we analyze the impact of factors such as data structure, memory alignment and floating-point precision. We then discuss the suitability of the different hardware depending on network size and connectivity, as random or sparse connectivity in mean-firing rate networks tends to break parallel performance on GPUs due to violation of memory coalescing.
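A mean-firing rate update of the kind being benchmarked can be sketched as follows. The logistic transfer function, the weights and the Euler step are generic illustrative choices rather than the article's exact equations, and the per-neuron loop is the part an OpenMP or GPU implementation would parallelize.

```python
import math

def rate_update(rates, weights, inputs, dt=1.0, tau=10.0):
    """One Euler step of a mean-firing rate network:

        tau * dr_i/dt = -r_i + f( sum_j w[i][j] * r_j + I_i )

    with a logistic transfer function f. The loop over neurons is
    independent given the previous rates, so in a parallel version
    each i would map to one OpenMP thread or GPU work-item; dense
    weight rows give the coalesced memory access GPUs need, while
    sparse/random rows break it (the effect the article measures).
    """
    f = lambda u: 1.0 / (1.0 + math.exp(-u))      # logistic transfer
    new_rates = []
    for i, r in enumerate(rates):                 # parallelizable loop
        drive = sum(w * rj for w, rj in zip(weights[i], rates)) + inputs[i]
        new_rates.append(r + dt / tau * (-r + f(drive)))
    return new_rates

# Two mutually inhibiting units driven by unequal inputs: the strongly
# driven unit settles at a high rate and suppresses the other.
r = [0.0, 0.0]
w = [[0.0, -2.0], [-2.0, 0.0]]
for _ in range(500):
    r = rate_update(r, w, inputs=[2.0, 0.5])
```

Because all new rates are computed from the previous step's rates, the update is a synchronous map over neurons; the dominant cost is the weighted sum, i.e. a matrix-vector product, which is exactly the kernel whose memory layout determines CPU versus GPU performance.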