SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Grid computing

168

Background: Advances in DNA microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the resulting opportunities require adequate computing resources. High Performance Computing (HPC) in the cloud offers an affordable way of meeting this need.
Objectives: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that gives genomics researchers easy access to HPC. This paper investigates setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance.
Methods: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences.
Results: It is possible to obtain good, scalable performance, but the level of improvement depends on the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user’s location affects costs due to factors such as local taxation.
Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and open new possibilities for smaller organisations with limited funds.

Concepts: Bioinformatics, Parallel computing, Computer, Amazon Web Services, Amazon.com, Grid computing, Amazon Elastic Compute Cloud, Rackspace Cloud
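The paper's focus is running SPRINT-enabled analyses on EC2. Purely as a rough illustration of the provisioning step, the sketch below uses Python's boto3 to start a small group of instances; the AMI ID, key pair and instance type are placeholders, and the SPRINT/R benchmark setup itself is not reproduced here.

```python
# Hypothetical sketch: provision a few EC2 instances for an MPI-backed
# SPRINT/R run. AMI ID, key name and instance type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # placeholder: an image with R and SPRINT installed
    InstanceType="c5.xlarge",    # placeholder instance type
    MinCount=4, MaxCount=4,      # four nodes for the parallel job
    KeyName="my-keypair",        # placeholder key pair
)
ids = [i["InstanceId"] for i in resp["Instances"]]

# Wait until the nodes are running before configuring MPI across them.
ec2.get_waiter("instance_running").wait(InstanceIds=ids)

# From here one would typically distribute a hostfile and launch the R
# script under MPI, e.g.:
#   mpiexec -n 4 -f hostfile R --no-save -f sprint_benchmark.R
print("Started instances:", ids)
```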

28

Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce framework-based applications that can be employed in next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as future work on parallel computing in bioinformatics.

Concepts: Future, Hadoop, Cloud computing, Grid computing, Linux, Google, File system, MapReduce
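As a toy illustration of the MapReduce pattern applied to sequencing data (not one of the specific applications surveyed in the article), the Hadoop Streaming-style mapper and reducer below count aligned reads per reference sequence from SAM-formatted input; the field positions follow the SAM specification.

```python
#!/usr/bin/env python
# Toy Hadoop Streaming job: count aligned reads per reference sequence in
# SAM-formatted input. The two roles would normally run as separate mapper
# and reducer scripts under hadoop-streaming (illustrative only).
import sys

def mapper(lines):
    for line in lines:
        if line.startswith("@"):            # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        rname = fields[2]                    # reference name (3rd SAM column)
        if rname != "*":                     # '*' means the read is unmapped
            print(f"{rname}\t1")

def reducer(lines):
    counts = {}
    for line in lines:
        rname, n = line.rstrip("\n").split("\t")
        counts[rname] = counts.get(rname, 0) + int(n)
    for rname, n in sorted(counts.items()):
        print(f"{rname}\t{n}")

if __name__ == "__main__":
    # Choose the role via a command-line argument, e.g. `script.py map`.
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```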

28

Securing robust cell adhesion between cells and biomaterials is one of the key considerations for tissue engineering. However, the investigation of cell adhesion in terms of biophysical effects such as the topography or rigidity of substrates has only recently been reported. In this study, we examined the spatial properties of focal adhesions by changing the height of micropatterns in two kinds of microtopography (grid and post) and the stiffness of the substrates. We found that focal adhesion localization is highly regulated by the topographical variation (height) of grid micropatterns but not by the rigidity of the substrates or the function of the actin cytoskeleton, although the latter strongly influence focal adhesion size and area. In detail, changing the height of the grid micropatterns switches the focal adhesion sites; as the height increases, the localization of focal adhesions is switched from the top to the bottom areas. This study demonstrates that the localization of focal adhesions on well-defined micropatterned substrates is critically determined by the topographical variation in the micropatterns.

Concepts: Protein, Eukaryote, Cell biology, Actin, Cytoskeleton, Focal adhesion, Vinculin, Grid computing

16

The Zika virus outbreak in the Americas has caused global concern. To help accelerate the fight against Zika, we launched the OpenZika project. OpenZika is an IBM World Community Grid project that uses distributed computing on millions of computers and Android devices to run docking experiments, in order to dock tens of millions of drug-like compounds against crystal structures and homology models of Zika proteins (and other related flavivirus targets). This will enable the identification of new candidates that can then be tested in vitro, to advance the discovery and development of new antiviral drugs against the Zika virus. The docking data is being made openly accessible so that all members of the global research community can use it to further advance drug discovery studies against Zika and other related flaviviruses.

Concepts: Protein structure, Bioinformatics, Virus, Antiviral drug, Influenza, Computer, Grid computing, Distributed computing
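The project itself distributes work through World Community Grid's volunteer computing infrastructure; purely as an illustration of the underlying idea (many independent docking runs over a compound library), the sketch below fans AutoDock Vina-style command-line jobs out across local CPU cores. The file names, grid-box values and the use of Vina here are assumptions for illustration, not details taken from the abstract.

```python
# Illustrative only: run many independent ligand dockings in parallel on
# one machine. Receptor/ligand files and grid-box values are placeholders.
import glob
import subprocess
from multiprocessing import Pool

RECEPTOR = "zika_target_model.pdbqt"   # placeholder receptor structure

def dock(ligand):
    out = ligand.replace(".pdbqt", "_docked.pdbqt")
    cmd = [
        "vina", "--receptor", RECEPTOR, "--ligand", ligand, "--out", out,
        "--center_x", "10", "--center_y", "4", "--center_z", "-2",   # placeholder box
        "--size_x", "24", "--size_y", "24", "--size_z", "24",
        "--exhaustiveness", "8",
    ]
    subprocess.run(cmd, check=True)
    return out

if __name__ == "__main__":
    ligands = glob.glob("library/*.pdbqt")   # drug-like compounds, one file each
    with Pool() as pool:
        results = pool.map(dock, ligands)
    print(f"Finished {len(results)} dockings")
```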

8

Understanding brain function requires monitoring and interpreting the activity of large networks of neurons during behavior. Advances in recording technology are greatly increasing the size and complexity of neural data. Analyzing such data will pose a fundamental bottleneck for neuroscience. We present a library of analytical tools called Thunder built on the open-source Apache Spark platform for large-scale distributed computing. The library implements a variety of univariate and multivariate analyses with a modular, extendable structure well-suited to interactive exploration and analysis development. We demonstrate how these analyses find structure in large-scale neural data, including whole-brain light-sheet imaging data from fictively behaving larval zebrafish, and two-photon imaging data from behaving mice. The analyses relate neuronal responses to sensory input and behavior, run in minutes or less, and can be used on a private cluster or in the cloud. Our open-source framework thus holds promise for turning brain activity mapping efforts into biological insights.

Concepts: Nervous system, Psychology, Neuron, Brain, Multivariate statistics, Retina, Analysis, Grid computing
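Thunder's own API is not reproduced here; as a minimal sketch of the kind of per-voxel computation such a Spark-based library distributes, the PySpark snippet below correlates each (simulated) pixel time series with a stimulus regressor in parallel. The data and the choice of a correlation analysis are illustrative assumptions.

```python
# Minimal PySpark sketch (not Thunder's API): correlate each pixel's
# time series with a stimulus regressor, distributed across workers.
import numpy as np
from pyspark import SparkContext

sc = SparkContext("local[*]", "pixelwise-correlation")

n_pixels, n_timepoints = 10000, 500
stimulus = np.sin(np.linspace(0, 20, n_timepoints))     # toy stimulus regressor

# In practice the records would be loaded from imaging data on shared storage;
# here they are simulated as (pixel_id, time_series) pairs.
records = [(i, np.random.randn(n_timepoints)) for i in range(n_pixels)]
rdd = sc.parallelize(records)

def corr_with_stimulus(record):
    pixel_id, trace = record
    r = np.corrcoef(trace, stimulus)[0, 1]
    return pixel_id, float(r)

top = rdd.map(corr_with_stimulus).takeOrdered(10, key=lambda kv: -kv[1])
print("Pixels most correlated with the stimulus:", top)
sc.stop()
```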

6

Electrical interfacing with neural tissue is key to advancing diagnosis and therapies for neurological disorders, as well as providing detailed information about neural signals. A challenge for creating long-term stable interfaces between electronics and neural tissue is the huge mechanical mismatch between the systems. So far, materials and fabrication processes have restricted the development of soft electrode grids able to combine high performance, long-term stability, and high electrode density, aspects all essential for neural interfacing. Here, this challenge is addressed by developing a soft, high-density, stretchable electrode grid based on an inert, high-performance composite material comprising gold-coated titanium dioxide nanowires embedded in a silicone matrix. The developed grid can resolve high spatiotemporal neural signals from the surface of the cortex in freely moving rats with stable neural recording quality and preserved electrode signal coherence during 3 months of implantation. Due to its flexible and stretchable nature, it is possible to minimize the size of the craniotomy required for placement, further reducing the level of invasiveness. The material and device technology presented herein have potential for a wide range of emerging biomedical applications.

Concepts: Neuron, Carbon dioxide, Signal, Oxide, Solar cell, Composite material, Grid computing, High-performance computing

6

Energy grids are facing a relatively new paradigm consisting in the formation of local distributed energy sources and loads that can operate in parallel, independently from the main power grid (usually called microgrids). One of the main challenges in the management of microgrid-like networks is that of self-adapting to production and demand in a decentralized, coordinated way. Here, we propose a stylized model that allows us to analytically predict the coordination of the elements in the network, depending on the network topology. Surprisingly, almost global coordination is attained when users interact locally, within a small neighborhood, instead of the obvious but more costly all-to-all coordination. We compute analytically the optimal number of coordinated users in random homogeneous networks. The proposed methodology opens a new way of confronting the analysis of energy demand-side management in networked systems.

Concepts: The Network, Motor coordination, Computer network, Proposal, Grid computing, Network topology, Series and parallel circuits, Distributed generation
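The paper develops an analytical treatment; as a loose numerical illustration of the claim that local interactions can produce near-global coordination, the sketch below runs a simple neighbourhood-averaging dynamic on a random graph and measures how far individual states end up from the global mean. The dynamic and parameters are assumptions for illustration, not the authors' model.

```python
# Illustration only (not the authors' model): neighbourhood averaging on a
# random graph as a proxy for decentralized coordination of demand.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_users = 200
G = nx.erdos_renyi_graph(n_users, p=0.05, seed=0)   # sparse local interactions

state = rng.normal(size=n_users)          # initial demand deviations per user
for _ in range(100):
    new_state = state.copy()
    for node in G.nodes:
        neigh = list(G.neighbors(node))
        if neigh:                          # move toward the local neighbourhood mean
            new_state[node] = 0.5 * state[node] + 0.5 * state[neigh].mean()
    state = new_state

# A small residual spread indicates near-global coordination reached
# through purely local interactions.
print("residual spread:", float(np.std(state - state.mean())))
```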

6

Next-generation sequencing can determine DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and executes quickly. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. Because of the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC cluster environments is required.

Concepts: DNA, Parallel computing, Computer program, Fibonacci number, Grid computing, File format, File system, Computer file
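As a minimal sketch of the kind of simple parallelization the authors call for (not their actual program), the snippet below splits work by reference sequence and runs independent samtools commands across local cores; it assumes a coordinate-sorted and indexed BAM file, here named input.bam as a placeholder.

```python
# Sketch only: per-chromosome read counting with samtools, run in parallel.
# Assumes `input.bam` is coordinate-sorted and indexed (input.bam.bai).
import subprocess
from multiprocessing import Pool

BAM = "input.bam"   # placeholder file name

def list_references(bam):
    # `samtools idxstats` prints: ref_name, length, mapped, unmapped
    out = subprocess.run(["samtools", "idxstats", bam],
                         check=True, capture_output=True, text=True).stdout
    return [line.split("\t")[0] for line in out.splitlines()
            if line and not line.startswith("*")]

def count_reads(ref):
    # Count alignments on one reference sequence; regions need the BAM index.
    out = subprocess.run(["samtools", "view", "-c", BAM, ref],
                         check=True, capture_output=True, text=True).stdout
    return ref, int(out.strip())

if __name__ == "__main__":
    refs = list_references(BAM)
    with Pool() as pool:
        for ref, n in pool.map(count_reads, refs):
            print(ref, n)
```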

6

Selective Plane Illumination Microscopy (SPIM) allows imaging of developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive processing, which is usually carried out interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here we introduce an automated workflow for processing large multiview, multi-channel, multi-illumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment, processing SPIM data in a fraction of the time required to collect it.

Concepts: Parallel computing, Computer, Unix, User interface, Personal computer, Graphical user interface, Grid computing, High-performance computing
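The published pipeline is a Snakemake workflow; purely as a sketch of the "independent time points" structure that makes it trivially parallel, the snippet below fans a placeholder per-timepoint processing command out over local cores. The command name and file naming scheme are illustrative assumptions, not part of the actual pipeline.

```python
# Sketch only (the published workflow uses Snakemake): process each SPIM
# time point independently and in parallel. The per-timepoint command and
# file layout are placeholders.
import subprocess
from multiprocessing import Pool

TIMEPOINTS = range(100)   # placeholder: 100 independent time points

def process_timepoint(t):
    raw = f"raw/timepoint_{t:04d}.tif"
    fused = f"fused/timepoint_{t:04d}.tif"
    # Placeholder command; in a real workflow each step (registration,
    # fusion, deconvolution) would be a rule with declared inputs and
    # outputs so a workflow engine can resolve the dependencies.
    subprocess.run(["process_spim_timepoint", raw, fused], check=True)
    return fused

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        done = pool.map(process_timepoint, TIMEPOINTS)
    print(f"Processed {len(done)} time points")
```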

5

Islanding is known as a management procedure of the power system that is implemented at the distribution level to protect sensitive loads from outages and to guarantee continuity of the electricity supply when a high amount of distributed generation is present. In this paper we study islanding at the level of the transmission grid and show that it is a suitable measure to enhance energy security and grid resilience. We consider the German and Italian transmission grids. We remove links either randomly, to mimic random failure events, or according to a topological characteristic, their so-called betweenness centrality, to mimic an intentional attack, and test whether the resulting fragments are self-sustainable. We test this option via the tool of optimized DC power flow equations. When transmission lines are removed according to their betweenness centrality, the resulting islands have a higher chance of being dynamically self-sustainable than for random removal. Fewer connections may even increase the grid’s stability. These facts should be taken into account in the design of future power grids.

Concepts: Torque, Electrical engineering, Grid computing, Electric power transmission, Electricity distribution, Electricity market, Power outage, Distributed generation
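The paper tests island viability with optimized DC power flow; as a much cruder stand-in, the sketch below removes the highest-betweenness edges from a toy network and checks only whether each resulting island has enough generation to cover its load. The topology, injections and the simple balance check are illustrative assumptions, not the authors' method.

```python
# Crude illustration (the paper uses optimized DC power flow rather than a
# simple balance check): remove high-betweenness lines and inspect islands.
import random
import networkx as nx

random.seed(0)
G = nx.random_regular_graph(d=3, n=60, seed=0)        # toy transmission topology

# Assign placeholder injections: positive = generation, negative = load.
power = {n: random.choice([2.0, -1.0, -1.0]) for n in G.nodes}

# Remove the 10 lines with the highest betweenness centrality ("attack").
ebc = nx.edge_betweenness_centrality(G)
targets = sorted(ebc, key=ebc.get, reverse=True)[:10]
G.remove_edges_from(targets)

# Check each island for a (very rough) generation/load balance.
for island in nx.connected_components(G):
    balance = sum(power[n] for n in island)
    status = "self-sustainable" if balance >= 0 else "deficit"
    print(f"island of {len(island)} nodes: net power {balance:+.1f} ({status})")
```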