Concept: Computer science
Women comprise a minority of the Science, Technology, Engineering, Mathematics, and Medicine (STEMM) workforce. Quantifying the gender gap may identify fields that will not reach parity without intervention, reveal underappreciated biases, and inform benchmarks for gender balance among conference speakers, editors, and hiring committees. Using the PubMed and arXiv databases, we estimated the gender of 36 million authors from >100 countries publishing in >6000 journals, covering most STEMM disciplines over the last 15 years, and made a web app allowing easy access to the data (https://lukeholman.github.io/genderGap/). Despite recent progress, the gender gap appears likely to persist for generations, particularly in surgery, computer science, physics, and maths. The gap is especially large in authorship positions associated with seniority, and prestigious journals have fewer women authors. Additionally, we estimate that men are invited by journals to submit papers at approximately double the rate of women. Wealthy countries, notably Japan, Germany, and Switzerland, had fewer women authors than poorer ones. We conclude that the STEMM gender gap will not close without further reforms in education, mentoring, and academic publishing.
Recently, we proposed that Brainets, i.e. networks formed by multiple animal brains, cooperating and exchanging information in real time through direct brain-to-brain interfaces, could provide the core of a new type of computing device: an organic computer. Here, we describe the first experimental demonstration of such a Brainet, built by interconnecting four adult rat brains. Brainets worked by concurrently recording the extracellular electrical activity generated by populations of cortical neurons distributed across multiple rats chronically implanted with multi-electrode arrays. Cortical neuronal activity was recorded and analyzed in real time, and then delivered to the somatosensory cortices of other animals that participated in the Brainet using intracortical microstimulation (ICMS). Using this approach, different Brainet architectures solved a number of useful computational problems, such as discrete classification, image processing, storage and retrieval of tactile information, and even weather forecasting. Brainets consistently performed at the same or higher levels than single rats in these tasks. Based on these findings, we propose that Brainets could be used to investigate animal social behaviors as well as a test bed for exploring the properties and potential applications of organic computers.
We describe a set of best practices for scientific software development, based on research and experience, that will improve scientists' productivity and the reliability of their software.
Driven by advances in materials and computer science, researchers are attempting to design systems where the computer and material are one and the same entity. Using theoretical and computational modeling, we design a hybrid material system that can autonomously transduce chemical, mechanical, and electrical energy to perform a computational task in a self-organized manner, without the need for external electrical power sources. Each unit in this system integrates a self-oscillating gel, which undergoes the Belousov-Zhabotinsky (BZ) reaction, with an overlaying piezoelectric (PZ) cantilever. The chemomechanical oscillations of the BZ gels deflect the PZ layer, which consequently generates a voltage across the material. When these BZ-PZ units are connected in series by electrical wires, the oscillations of these units become synchronized across the network, where the mode of synchronization depends on the polarity of the PZ. We show that the network of coupled, synchronizing BZ-PZ oscillators can perform pattern recognition. The “stored” patterns are set of polarities of the individual BZ-PZ units, and the “input” patterns are coded through the initial phase of the oscillations imposed on these units. The results of the modeling show that the input pattern closest to the stored pattern exhibits the fastest convergence time to stable synchronization behavior. In this way, networks of coupled BZ-PZ oscillators achieve pattern recognition. Further, we show that the convergence time to stable synchronization provides a robust measure of the degree of match between the input and stored patterns. Through these studies, we establish experimentally realizable design rules for creating “materials that compute.”
In the 1940s, the first generation of modern computers used vacuum tube oscillators as their principle components, however, with the development of the transistor, such oscillator based computers quickly became obsolete. As the demand for faster and lower power computers continues, transistors are themselves approaching their theoretical limit and emerging technologies must eventually supersede them. With the development of optical oscillators and Josephson junction technology, we are again presented with the possibility of using oscillators as the basic components of computers, and it is possible that the next generation of computers will be composed almost entirely of oscillatory devices. Here, we demonstrate how coupled threshold oscillators may be used to perform binary logic in a manner entirely consistent with modern computer architectures. We describe a variety of computational circuitry and demonstrate working oscillator models of both computation and memory.
BACKGROUND: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a standalone application.
Background Genome-wide association studies have become very popular in identifyinggenetic contributions to phenotypes. Millions of SNPs are being tested fortheir association with diseases and traits using linear or logistic regression models.This conceptually simple strategy encounters the following computational issues: a largenumber of tests and very large genotype files (many Gigabytes) which cannot bedirectly loaded into the software memory. One of the solutions applied on agrand scale is cluster computing involving large-scale resources.We show how to speed up the computations using matrix operations in pure R code.Results We improve speed: computation time from 6 hours is reduced to 10-15 minutes.Our approach can handle essentially an unlimited amount of covariates efficiently, using projections. Data files in GWAS are vast and reading them intocomputer memory becomes an important issue. However, much improvement can bemade if the data is structured beforehand in a way allowing for easy access to blocks ofSNPs. We propose several solutions based on the R packages ff and ncdf.We adapted the semi-parallel computations for logistic regression.We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are few hundreds times faster than standard procedures.Conclusions We provide very fast algorithms for GWAS written in pure R code. We also showhow to rearrange SNP data for fast access.
Spike pattern classification is a key topic in machine learning, computational neuroscience, and electronic device design. Here, we offer a new supervised learning rule based on Support Vector Machines (SVM) to determine the synaptic weights of a leaky integrate-and-fire (LIF) neuron model for spike pattern classification. We compare classification performance between this algorithm and other methods sharing the same conceptual framework. We consider the effect of postsynaptic potential (PSP) kernel dynamics on patterns separability, and we propose an extension of the method to decrease computational load. The algorithm performs well in generalization tasks. We show that the peak value of spike patterns separability depends on a relation between PSP dynamics and spike pattern duration, and we propose a particular kernel that is well-suited for fast computations and electronic implementations.
BACKGROUND: Global network alignment has been proposed as an effective tool for computing functional orthology. Commonly used global alignment techniques such as IsoRank rely on a two-step process: the first step is an iterative diffusion-based approach for assigning similarity scores to all possible node pairs (matchings); the second step applies a maximum-weight bipartite matching algorithm to this similarity score matrix to identify orthologous node pairs. While demonstrably successful in identifying orthologies beyond those based on sequences, this two-step process is computationally expensive. Recent work on computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced. The use of these accelerated methods renders the bipartite matching step as the dominant computational cost. This motivates a critical assessment of the tradeoffs of computational cost and solution quality (matching quality, topological matches, and biological significance) associated with the bipartite matching step. In this paper we utilize the state-of-the-art core diffusion-based step in IsoRank for similarity matrix computation, and couple it with two heuristic bipartite matching algorithms - a matrix-based greedy approach, and a tunable, adaptive, auction-based matching algorithm developed by us. We then compare our implementations against the performance and quality characteristics of the solution produced by the reference IsoRank binary, which also implements an optimal matching algorithm. RESULTS: Using heuristic matching algorithms in the IsoRank pipeline exhibits dramatic speedup improvements; typically x30 times faster for the total alignment process in most cases of interest. More surprisingly, these improvements in compute times are typically accompanied by better or comparable topological and biological quality for the network alignments generated. These measures are quantified by the number of conserved edges in the alignment graph, the percentage of enriched components, and the total number of covered Gene Ontology (GO) terms. CONCLUSIONS: We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure. Our heuristic matching techniques maintain comparable - if not better - quality in resulting alignments. A consequence of our work is that network-alignment based orthologies can be computed within minutes (as compared to hours) on typical protein interaction networks, enabling a more comprehensive tuning of alignment parameters for refined orthologies.