Concept: Computational complexity theory
- Journal of the Royal Society, Interface / the Royal Society
- Published over 2 years ago
Several recent studies hint at shared patterns in decision-making between taxonomically distant organisms, yet few studies demonstrate and dissect mechanisms of decision-making in simpler organisms. We examine decision-making in the unicellular slime mould Physarum polycephalum using a classical decision problem adapted from human and animal decision-making studies: the two-armed bandit problem. This problem has previously only been used to study organisms with brains, yet here we demonstrate that a brainless unicellular organism compares the relative qualities of multiple options, integrates over repeated samplings to perform well in random environments, and combines information on reward frequency and magnitude in order to make correct and adaptive decisions. We extend our inquiry by using Bayesian model selection to determine the most likely algorithm used by the cell when making decisions. We deduce that this algorithm centres around a tendency to exploit environments in proportion to their reward experienced through past sampling. The algorithm is intermediate in computational complexity between simple, reactionary heuristics and calculation-intensive optimal performance algorithms, yet it has very good relative performance. Our study provides insight into ancestral mechanisms of decision-making and suggests that fundamental principles of decision-making, information processing and even cognition are shared among diverse biological systems.
Redox-based resistive switching random access memory (ReRAM) offers excellent properties to implement future non-volatile memory arrays. Recently, the capability of two-state ReRAMs to implement Boolean logic functionality gained wide interest. Here, we report on seven-states Tantalum Oxide Devices, which enable the realization of an intrinsic modular arithmetic using a ternary number system. Modular arithmetic, a fundamental system for operating on numbers within the limit of a modulus, is known to mathematicians since the days of Euclid and finds applications in diverse areas ranging from e-commerce to musical notations. We demonstrate that multistate devices not only reduce the storage area consumption drastically, but also enable novel in-memory operations, such as computing using high-radix number systems, which could not be implemented using two-state devices. The use of high radix number system reduces the computational complexity by reducing the number of needed digits. Thus the number of calculation operations in an addition and the number of logic devices can be reduced.
The maximum clique enumeration (MCE) problem asks that we identify all maximum cliques in a finite, simple graph. MCE is closely related to two other well-known and widely-studied problems: the maximum clique optimization problem, which asks us to determine the size of a largest clique, and the maximal clique enumeration problem, which asks that we compile a listing of all maximal cliques. Naturally, these three problems are NP-hard, given that they subsume the classic version of the NP-complete clique decision problem. MCE can be solved in principle with standard enumeration methods due to Bron, Kerbosch, Kose and others. Unfortunately, these techniques are ill-suited to graphs encountered in our applications. We must solve MCE on instances deeply seeded in data mining and computational biology, where high-throughput data capture often creates graphs of extreme size and density. MCE can also be solved in principle using more modern algorithms based in part on vertex cover and the theory of fixed-parameter tractability (FPT). While FPT is an improvement, these algorithms too can fail to scale sufficiently well as the sizes and densities of our datasets grow.
Background Genome-wide association studies have become very popular in identifyinggenetic contributions to phenotypes. Millions of SNPs are being tested fortheir association with diseases and traits using linear or logistic regression models.This conceptually simple strategy encounters the following computational issues: a largenumber of tests and very large genotype files (many Gigabytes) which cannot bedirectly loaded into the software memory. One of the solutions applied on agrand scale is cluster computing involving large-scale resources.We show how to speed up the computations using matrix operations in pure R code.Results We improve speed: computation time from 6 hours is reduced to 10-15 minutes.Our approach can handle essentially an unlimited amount of covariates efficiently, using projections. Data files in GWAS are vast and reading them intocomputer memory becomes an important issue. However, much improvement can bemade if the data is structured beforehand in a way allowing for easy access to blocks ofSNPs. We propose several solutions based on the R packages ff and ncdf.We adapted the semi-parallel computations for logistic regression.We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are few hundreds times faster than standard procedures.Conclusions We provide very fast algorithms for GWAS written in pure R code. We also showhow to rearrange SNP data for fast access.
A number of centrality measures are available to determine the relative importance of a node in a complex network, and betweenness is prominent among them. However, the existing centrality measures are not adequate in network percolation scenarios (such as during infection transmission in a social network of individuals, spreading of computer viruses on computer networks, or transmission of disease over a network of towns) because they do not account for the changing percolation states of individual nodes. We propose a new measure, percolation centrality, that quantifies relative impact of nodes based on their topological connectivity, as well as their percolation states. The measure can be extended to include random walk based definitions, and its computational complexity is shown to be of the same order as that of betweenness centrality. We demonstrate the usage of percolation centrality by applying it to a canonical network as well as simulated and real world scale-free and random networks.
Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have large datasets with large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space R(d) and an integer k. The problem is to determine a set of k points in R(d), called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI(HA)). We found that when k is close to d, the quality is good (ARI(HA)>0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI(HA)>0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data.
BACKGROUND: Global network alignment has been proposed as an effective tool for computing functional orthology. Commonly used global alignment techniques such as IsoRank rely on a two-step process: the first step is an iterative diffusion-based approach for assigning similarity scores to all possible node pairs (matchings); the second step applies a maximum-weight bipartite matching algorithm to this similarity score matrix to identify orthologous node pairs. While demonstrably successful in identifying orthologies beyond those based on sequences, this two-step process is computationally expensive. Recent work on computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced. The use of these accelerated methods renders the bipartite matching step as the dominant computational cost. This motivates a critical assessment of the tradeoffs of computational cost and solution quality (matching quality, topological matches, and biological significance) associated with the bipartite matching step. In this paper we utilize the state-of-the-art core diffusion-based step in IsoRank for similarity matrix computation, and couple it with two heuristic bipartite matching algorithms - a matrix-based greedy approach, and a tunable, adaptive, auction-based matching algorithm developed by us. We then compare our implementations against the performance and quality characteristics of the solution produced by the reference IsoRank binary, which also implements an optimal matching algorithm. RESULTS: Using heuristic matching algorithms in the IsoRank pipeline exhibits dramatic speedup improvements; typically x30 times faster for the total alignment process in most cases of interest. More surprisingly, these improvements in compute times are typically accompanied by better or comparable topological and biological quality for the network alignments generated. These measures are quantified by the number of conserved edges in the alignment graph, the percentage of enriched components, and the total number of covered Gene Ontology (GO) terms. CONCLUSIONS: We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure. Our heuristic matching techniques maintain comparable - if not better - quality in resulting alignments. A consequence of our work is that network-alignment based orthologies can be computed within minutes (as compared to hours) on typical protein interaction networks, enabling a more comprehensive tuning of alignment parameters for refined orthologies.
The performance of conventional minutiae-based fingerprint authentication algorithms degrades significantly when dealing with low quality fingerprints with lots of cuts or scratches. A similar degradation of the minutiae-based algorithms is observed when small overlapping areas appear because of the quite narrow width of the sensors. Based on the detection of minutiae, Scale Invariant Feature Transformation (SIFT) descriptors are employed to fulfill verification tasks in the above difficult scenarios. However, the original SIFT algorithm is not suitable for fingerprint because of: (1) the similar patterns of parallel ridges; and (2) high computational resource consumption. To enhance the efficiency and effectiveness of the algorithm for fingerprint verification, we propose a SIFT-based Minutia Descriptor (SMD) to improve the SIFT algorithm through image processing, descriptor extraction and matcher. A two-step fast matcher, named improved All Descriptor-Pair Matching (iADM), is also proposed to implement the 1:N verifications in real-time. Fingerprint Identification using SMD and iADM (FISiA) achieved a significant improvement with respect to accuracy in representative databases compared with the conventional minutiae-based method. The speed of FISiA also can meet real-time requirements.
Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins, therefore clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outage of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect.
The integrated navigation system with strapdown inertial navigation system (SINS), Beidou (BD) receiver and Doppler velocity log (DVL) can be used in marine applications owing to the fact that the redundant and complementary information from different sensors can markedly improve the system accuracy. However, the existence of multisensor asynchrony will introduce errors into the system. In order to deal with the problem, conventionally the sampling interval is subdivided, which increases the computational complexity. In this paper, an innovative integrated navigation algorithm based on a Cubature Kalman filter (CKF) is proposed correspondingly. A nonlinear system model and observation model for the SINS/BD/DVL integrated system are established to more accurately describe the system. By taking multi-sensor asynchronization into account, a new sampling principle is proposed to make the best use of each sensor’s information. Further, CKF is introduced in this new algorithm to enable the improvement of the filtering accuracy. The performance of this new algorithm has been examined through numerical simulations. The results have shown that the positional error can be effectively reduced with the new integrated navigation algorithm. Compared with the traditional algorithm based on EKF, the accuracy of the SINS/BD/DVL integrated navigation system is improved, making the proposed nonlinear integrated navigation algorithm feasible and efficient.