Concept: Distributed computing
BACKGROUND: In shotgun mass spectrometry-based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Solutions that improve our ability to perform these searches are therefore needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
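The map, shuffle, and reduce pattern behind such a search can be illustrated with a minimal in-process sketch. This is not the authors' engine and does not implement K-score: the record layout, the `BIN_WIDTH` value, and the shared-peak stand-in score are all assumptions made for illustration, and a real job would run on Hadoop rather than in a single Python process.

```python
from collections import defaultdict

BIN_WIDTH = 1.0  # hypothetical precursor-mass bin width in Da (assumption)


def map_record(record):
    """Map step: key each spectrum or peptide by its precursor-mass bin."""
    kind, name, mass, peaks = record
    # Note: masses close to a bin edge can land in different bins; a fuller
    # sketch would also emit the record to the neighbouring bin.
    yield int(mass // BIN_WIDTH), (kind, name, peaks)


def shared_peak_score(observed, theoretical, tol=0.5):
    """Stand-in score (not K-score): theoretical peaks matched within tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def reduce_bin(values):
    """Reduce step: score every spectrum against every peptide in one bin."""
    spectra = [(n, p) for k, n, p in values if k == "spectrum"]
    peptides = [(n, p) for k, n, p in values if k == "peptide"]
    for s_name, s_peaks in spectra:
        for p_name, p_peaks in peptides:
            yield s_name, p_name, shared_peak_score(s_peaks, p_peaks)


def run_job(records):
    """Simulate the shuffle: group mapped values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reduce_bin(groups[key]))
    return results


records = [
    ("spectrum", "s1", 1000.4, [100.0, 200.0, 300.0]),
    ("peptide", "pepA", 1000.2, [100.1, 200.2, 450.0]),
    ("peptide", "pepB", 1500.7, [100.0, 200.0, 300.0]),
]
matches = run_job(records)
```

Because each mass bin is reduced independently, the bins can be spread over as many workers as the cluster provides, which is the property the abstract's scalability claim rests on.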
The Zika virus outbreak in the Americas has caused global concern. To help accelerate the fight against Zika, we launched the OpenZika project. OpenZika is an IBM World Community Grid Project that uses distributed computing on millions of computers and Android devices to run docking experiments, in order to dock tens of millions of drug-like compounds against crystal structures and homology models of Zika proteins (and other related flavivirus targets). This will enable the identification of new candidates that can then be tested in vitro, to advance the discovery and development of new antiviral drugs against the Zika virus. The docking data is being made openly accessible so that all members of the global research community can use it to further advance drug discovery studies against Zika and other related flaviviruses.
We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum likelihood phylogenetic analysis powered by grid computing. The gateway features a garli 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The garli web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the garli web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results.
In this paper, a new non-orthogonal multiple-access scheme, trellis tone modulation multiple-access (TTMMA), is proposed for peer discovery in distributed device-to-device (D2D) communication. The range and capacity of discovery are important performance metrics in peer discovery. The proposed trellis tone modulation uses single-tone transmission and achieves a long discovery range due to its low Peak-to-Average Power Ratio (PAPR). TTMMA also exploits non-orthogonal resource assignment to increase the discovery capacity. For the multi-user detection of superposed multiple-access signals, a message-passing algorithm with supplementary schemes is proposed. With TTMMA and its message-passing demodulation, approximately 1.5 times as many devices are discovered as with conventional frequency division multiple-access (FDMA)-based discovery.
Healthcare provides many services, such as the diagnosis, treatment, and prevention of diseases, illnesses, injuries, and other physical and mental disorders. Large-scale distributed data processing applications in healthcare operate on large amounts of data. Big data application functions are therefore a main part of healthcare operations, yet there has been no comprehensive and systematic survey that studies and evaluates the important techniques in this field. This paper therefore aims to provide a comprehensive, detailed, and systematic study of state-of-the-art big data mechanisms for healthcare applications in five categories: machine learning, cloud-based, heuristic-based, agent-based, and hybrid mechanisms. It also presents a systematic literature review (SLR) of big data applications in the healthcare literature up to the end of 2016. Initially, 205 papers were identified, but a paper selection process reduced the number to 29 important studies.
Unmanned underwater vehicles (UUVs) have recently developed rapidly as mobile sensor networks for the investigation, survey, and exploration of the underwater environment. The goal of this paper is to develop a practical and efficient formation control method to improve the work efficiency of multi-UUV sensor networks. Distributed leader-follower formation controllers are designed based on a state feedback and consensus algorithm. Considering that each vehicle is subject to model uncertainties and current disturbances, a second-order integral UUV model with a nonlinear function is established using the state feedback linearized method under current disturbances. For unstable communication among UUVs, communication failure and acoustic link noise interference are considered. Two-layer random switching communication topologies are proposed to solve the problem of communication failure. For acoustic link noise interference, accurate representation of valid communication information and noise stripping are necessary when designing controllers. Effective communication topology weights are designed to represent the validity of communication information subject to noise interference. Utilizing state feedback and noise stripping, sufficient conditions for designing formation controllers are proposed to ensure that the UUV formation achieves consensus under model uncertainties, current disturbances, and unstable communication. The stability of the formation controllers is proven by the Lyapunov-Razumikhin theorem, and their validity is verified by simulation results.
Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations that improve performance and are suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance with better recall than “vanilla” LSH, even when using the same amount of space.
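As a minimal illustration of the LSH idea, the sketch below uses random-hyperplane hashing with several hash tables. The abstract does not say which LSH families or variants the authors evaluate, so the class name `HyperplaneLSH`, its parameters, and the hashing scheme are assumptions chosen for illustration; the code runs in a single process rather than on Hadoop.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH index (illustrative, single-process)."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per table; each plane gives one bit.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    @staticmethod
    def _signature(planes, vec):
        # Bit i records which side of hyperplane i the vector lies on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
        )

    def add(self, key, vec):
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, vec)].append(key)

    def candidates(self, vec):
        """Return keys sharing a bucket with vec in at least one table."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._signature(planes, vec), []))
        return found


index = HyperplaneLSH(dim=3)
index.add("a", [1.0, 0.0, 0.0])
index.add("b", [-1.0, 0.0, 0.0])
near = index.candidates([2.0, 0.0, 0.0])  # same direction as "a"
```

Only the returned candidates need an exact distance computation, which is what keeps the cost from growing with the full database size. Using more tables raises recall at the price of space, and more bits per table makes buckets more selective.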
- Proceedings of the National Academy of Sciences of the United States of America
- Published about 1 year ago
Understanding how biochemical networks lead to large-scale nonequilibrium self-organization and pattern formation in life is a major challenge, with important implications for the design of programmable synthetic systems. Here, we assembled cell-free genetic oscillators in a spatially distributed system of on-chip DNA compartments as artificial cells, and measured reaction-diffusion dynamics at the single-cell level up to the multicell scale. Using a cell-free gene network we programmed molecular interactions that control the frequency of oscillations, population variability, and dynamical stability. We observed frequency entrainment, synchronized oscillatory reactions and pattern formation in space, as manifestations of collective behavior. The transition to synchrony occurs as the local coupling between compartments strengthens. Spatiotemporal oscillations are induced either by a concentration gradient of a diffusible signal, or by spontaneous symmetry breaking close to a transition from oscillatory to nonoscillatory dynamics. This work offers design principles for programmable biochemical reactions with potential applications to autonomous sensing, distributed computing, and biomedical diagnostics.
This paper introduces a novel extension of the edge-based compartmental model to epidemics where the transmission and recovery processes are driven by general independent probability distributions. Edge-based compartmental modelling is just one of many different approaches used to model the spread of an infectious disease on a network; the major result of this paper is the rigorous proof that the edge-based compartmental model and the message passing models are equivalent for general independent transmission and recovery processes. This implies that the new model is exact on the ensemble of configuration model networks of infinite size. For the case of Markovian transmission the message passing model is re-parametrised into a pairwise-like model which is then used to derive many well-known pairwise models for regular networks, or when the infectious period is exponentially distributed or is of a fixed length.
This paper considers a consensus problem for a class of second-order multi-agent systems with a moving mode and multiple delays on directed graphs. Using local information, a distributed algorithm is adopted to make all agents reach a consensus while moving together with a constant velocity in the presence of delays. To study the effects of the coexistence of the moving mode and delays on the consensus convergence, a frequency domain approach is employed by analyzing the relationship between the components of the eigenvector associated with the eigenvalue on the imaginary axis. Then, based on the continuity of the system function, an upper bound for the delays is given to ensure the consensus convergence of the system. A numerical example is included to illustrate the obtained theoretical results.
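The behaviour described, agents agreeing on position while moving together at a constant velocity, can be sketched with a simplified simulation. The sketch assumes a standard second-order consensus protocol with no delays, a three-agent directed cycle, and forward-Euler integration; the protocol form, the gain `gamma`, the graph, and the step size are illustrative assumptions and not the paper's delayed system or its numerical example.

```python
def simulate_consensus(x, v, neighbors, gamma=1.0, dt=0.01, steps=5000):
    """Forward-Euler simulation of delay-free second-order consensus.

    Agent i uses only local information from the agents in neighbors[i]:
        x_i' = v_i
        v_i' = sum_j [(x_j - x_i) + gamma * (v_j - v_i)]
    """
    x, v = list(x), list(v)
    n = len(x)
    for _ in range(steps):
        acc = [
            sum((x[j] - x[i]) + gamma * (v[j] - v[i]) for j in neighbors[i])
            for i in range(n)
        ]
        x = [x[i] + dt * v[i] for i in range(n)]
        v = [v[i] + dt * acc[i] for i in range(n)]
    return x, v


# Directed cycle: agent 0 listens to 1, agent 1 to 2, agent 2 to 0.
neighbors = {0: [1], 1: [2], 2: [0]}
x_final, v_final = simulate_consensus(
    x=[0.0, 2.0, -1.0], v=[1.0, 0.0, -0.4], neighbors=neighbors
)
position_spread = max(x_final) - min(x_final)
velocity_spread = max(v_final) - min(v_final)
```

In this delay-free setting both spreads shrink toward zero while the common velocity stays nonzero, so the agents keep moving together. Adding communication delays, as the paper does, is what makes an upper bound on the delays necessary for convergence.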