Concept: Gradient descent
Objective. Sensorimotor rhythms (SMRs) are 8-30 Hz oscillations in the electroencephalogram (EEG) recorded from the scalp over sensorimotor cortex that change with movement and/or movement imagery. Many brain-computer interface (BCI) studies have shown that people can learn to control SMR amplitudes and can use that control to move cursors and other objects in one, two or three dimensions. At the same time, if SMR-based BCIs are to be useful for people with neuromuscular disabilities, their accuracy and reliability must be improved substantially. These BCIs often use spatial filtering methods such as common average reference (CAR), Laplacian (LAP) filter or common spatial pattern (CSP) filter to enhance the signal-to-noise ratio of EEG. Here, we test the hypothesis that a new filter design, called an ‘adaptive Laplacian (ALAP) filter’, can provide better performance for SMR-based BCIs. Approach. An ALAP filter employs a Gaussian kernel to construct a smooth spatial gradient of channel weights and then simultaneously seeks the optimal kernel radius of this spatial filter and the regularization parameter of linear ridge regression. This optimization is based on minimizing the leave-one-out cross-validation error through a gradient descent method and is computationally feasible. Main results. Using a variety of kinds of BCI data from a total of 22 individuals, we compare the performances of ALAP filter to CAR, small LAP, large LAP and CSP filters. With a large number of channels and limited data, ALAP performs significantly better than CSP, CAR, small LAP and large LAP both in classification accuracy and in mean-squared error. Using fewer channels restricted to motor areas, ALAP is still superior to CAR, small LAP and large LAP, but equally matched to CSP. Significance. Thus, ALAP may help to improve the accuracy and robustness of SMR-based BCIs.
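The optimization in the Approach can be illustrated with a heavily simplified sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: one center channel with four neighbours, a single filtered feature, ridge regression without an intercept, synthetic data, and hypothetical helper names (`alap_weights`, `loo_mse`, `descend`). The gradient is approximated by central finite differences, and a step is accepted only if it lowers the error. The leave-one-out residual uses the standard closed form for ridge regression, e_i / (1 - h_ii).

```python
import math
import random

def alap_weights(dists, radius):
    """Center channel gets +1; neighbours get negative Gaussian weights summing to -1."""
    nearest = min(dists)
    # Subtracting the nearest distance keeps the largest kernel value at exactly 1.
    g = [math.exp(-(d * d - nearest * nearest) / (2.0 * radius * radius)) for d in dists]
    total = sum(g)
    return [1.0] + [-v / total for v in g]

def loo_mse(trials, targets, dists, log_radius, log_lam):
    """Closed-form leave-one-out MSE of one-feature ridge regression (no intercept)."""
    w = alap_weights(dists, math.exp(log_radius))
    x = [sum(wc * v for wc, v in zip(w, trial)) for trial in trials]
    denom = sum(v * v for v in x) + math.exp(log_lam)
    beta = sum(v * t for v, t in zip(x, targets)) / denom
    err = 0.0
    for v, t in zip(x, targets):
        leverage = v * v / denom  # diagonal entry of the ridge hat matrix
        err += ((t - beta * v) / (1.0 - leverage)) ** 2
    return err / len(x)

def descend(trials, targets, dists, steps=50, lr=0.5, h=1e-4):
    """Gradient descent on [log radius, log lambda] with step halving on failure."""
    p = [0.0, 0.0]
    f = lambda q: loo_mse(trials, targets, dists, q[0], q[1])
    best = f(p)
    for _ in range(steps):
        grad = []
        for k in range(2):
            hi = list(p); hi[k] += h
            lo = list(p); lo[k] -= h
            grad.append((f(hi) - f(lo)) / (2 * h))  # central finite difference
        cand = [max(-5.0, min(5.0, pk - lr * gk)) for pk, gk in zip(p, grad)]
        val = f(cand)
        if val < best:
            p, best = cand, val
        else:
            lr *= 0.5  # step was too long: shrink it and retry
    return p, best

# Synthetic trials: [center, neighbour 1..4]; noise shared by all channels.
random.seed(0)
dists = [1.0, 1.0, 2.0, 2.0]
trials, targets = [], []
for _ in range(40):
    t = random.choice([-1.0, 1.0])
    shared = random.gauss(0.0, 1.0)
    trials.append([t + shared + random.gauss(0.0, 0.3)] +
                  [shared + random.gauss(0.0, 0.3 * d) for d in dists])
    targets.append(t)

initial = loo_mse(trials, targets, dists, 0.0, 0.0)
params, final = descend(trials, targets, dists)
print(f"LOO MSE: {initial:.4f} -> {final:.4f}")
```

Working in log-space keeps both the kernel radius and the regularization parameter positive without explicit constraints.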
- IEEE transactions on pattern analysis and machine intelligence
- Published almost 3 years ago
Hashing has attracted a great deal of research in recent years due to its effectiveness for the retrieval and indexing of large-scale high-dimensional multimedia data. In this paper, we propose a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space where the cross-modal similarity can be measured using Hamming distance. Unlike existing cross-modal hashing algorithms where the learned hash functions are binary space partitioning functions, such as the sign and threshold function, the proposed hashing scheme takes advantage of a new class of hash functions closely related to rank correlation measures which are known to be scale-invariant, numerically stable, and highly nonlinear. Specifically, we jointly learn two groups of linear subspaces, one for each modality, so that features' ranking orders in different linear subspaces maximally preserve the cross-modal similarities. We show that the ranking-based hash function has a natural probabilistic approximation which transforms the original highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms. The proposed hashing framework is also flexible in the sense that the optimization procedures are not tied to any specific form of loss function, which is typical for existing cross-modal hashing methods, but rather we can flexibly accommodate different loss functions with minimal changes to the learning steps. We demonstrate through extensive experiments on four widely-used real-world multimodal datasets that the proposed cross-modal hashing method can achieve competitive performance against several state-of-the-art methods with only moderate training and testing time.
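One way to picture the probabilistic approximation is sketched below. The abstract does not spell out the hash function, so this assumes (as an illustration only) that the code is the index of the largest projection onto a small learned subspace, and that the discontinuous argmax is relaxed with a softmax. The names `rank_hash` and `soft_rank_hash` and the `sharpness` parameter are hypothetical.

```python
import math

def rank_hash(features, subspace):
    """Hard code: index of the largest projection onto the rows of `subspace`."""
    proj = [sum(w * x for w, x in zip(row, features)) for row in subspace]
    return max(range(len(proj)), key=proj.__getitem__)

def soft_rank_hash(features, subspace, sharpness=1.0):
    """Softmax relaxation: a probability for each index being the largest."""
    proj = [sharpness * sum(w * x for w, x in zip(row, features)) for row in subspace]
    top = max(proj)  # subtract the maximum for numerical stability
    exps = [math.exp(p - top) for p in proj]
    total = sum(exps)
    return [e / total for e in exps]

features = [1.0, 2.0]
subspace = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # three projection directions
hard = rank_hash(features, subspace)
soft = soft_rank_hash(features, subspace, sharpness=50.0)
print(hard, soft)
```

Because the softmax output is differentiable in the subspace weights, a loss defined on it can be minimized by gradient descent; as `sharpness` grows, the soft code concentrates on the hard one.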
Trial-and-error learning requires evaluating variable actions and reinforcing successful variants. In songbirds, vocal exploration is induced by LMAN, the output of a basal ganglia-related circuit that also contributes a corrective bias to the vocal output. This bias is gradually consolidated in RA, a motor cortex analogue downstream of LMAN. We develop a new model of such two-stage learning. Using stochastic gradient descent, we derive how the activity in ‘tutor’ circuits (e.g., LMAN) should match plasticity mechanisms in ‘student’ circuits (e.g., RA) to achieve efficient learning. We further describe a reinforcement learning framework through which the tutor can build its teaching signal. We show that mismatches between the tutor signal and the plasticity mechanism can impair learning. Applied to birdsong, our results predict the temporal structure of the corrective bias from LMAN given a plasticity rule in RA. Our framework can be applied predictively to other paired brain areas showing two-stage learning.
Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 2 years ago
In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare (but extremely dense and accessible) regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
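The replicated-cost scheme can be sketched on a toy problem. This is an illustration under stated assumptions, not the paper's algorithm: the weights here are continuous scalars rather than discrete, the rough one-dimensional `cost` is invented, the centering constraint is implemented as a quadratic coupling of strength `gamma`, and all function names are hypothetical.

```python
import math

def cost(x):
    """Toy rough landscape: a shallow bowl with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)

def cost_grad(x):
    return 0.2 * x + 5.0 * math.cos(5.0 * x)

def replicated_cost(replicas, center, gamma):
    """Sum of the original cost over replicas plus a coupling to the center."""
    return sum(cost(x) + 0.5 * gamma * (x - center) ** 2 for x in replicas)

def replicated_descent(replicas, center, gamma=1.0, lr=0.01, steps=2000):
    """Plain gradient descent on all replicas and the center simultaneously."""
    replicas = list(replicas)
    for _ in range(steps):
        grads = [cost_grad(x) + gamma * (x - center) for x in replicas]
        center_grad = gamma * sum(center - x for x in replicas)
        replicas = [x - lr * g for x, g in zip(replicas, grads)]
        center -= lr * center_grad
    return replicas, center

start = [-2.0, 0.5, 3.0]
before = replicated_cost(start, 0.0, 1.0)
reps, center = replicated_descent(start, 0.0)
after = replicated_cost(reps, center, 1.0)
print(f"replicated cost: {before:.4f} -> {after:.4f}, center = {center:.4f}")
```

The step size is chosen below the inverse of the largest curvature of this toy objective, so each iteration cannot increase the replicated cost. The coupling pulls replicas toward a shared center, which favors regions where many nearby configurations have low cost over isolated narrow minima.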
Single-particle electron cryomicroscopy (cryo-EM) is a powerful method for determining the structures of biological macromolecules. With automated microscopes, cryo-EM data can often be obtained in a few days. However, processing cryo-EM image data to reveal heterogeneity in the protein structure and to refine 3D maps to high resolution frequently becomes a severe bottleneck, requiring expert intervention, prior structural knowledge, and weeks of calculations on expensive computer clusters. Here we show that stochastic gradient descent (SGD) and branch-and-bound maximum likelihood optimization algorithms permit the major steps in cryo-EM structure determination to be performed in hours or minutes on an inexpensive desktop computer. Furthermore, SGD with Bayesian marginalization allows ab initio 3D classification, enabling automated analysis and discovery of unexpected structures without bias from a reference map. These algorithms are combined in a user-friendly computer program named cryoSPARC (http://www.cryosparc.com).
Controlling the flow and routing of data is a fundamental problem in many distributed networks, including transportation systems, integrated circuits, and the Internet. In the brain, synaptic plasticity rules have been discovered that regulate network activity in response to environmental inputs, which enable circuits to be stable yet flexible. Here, we develop a new neuro-inspired model for network flow control that depends only on modifying edge weights in an activity-dependent manner. We show how two fundamental plasticity rules, long-term potentiation and long-term depression, can be cast as a distributed gradient descent algorithm for regulating traffic flow in engineered networks. We then characterize, both by simulation and analytically, how different forms of edge-weight-update rules affect network routing efficiency and robustness. We find a close correspondence between certain classes of synaptic weight-update rules derived experimentally in the brain and rules commonly used in engineering, suggesting common principles to both.
Chromosomes are not positioned randomly within a nucleus, but instead, they adopt preferred spatial conformations to facilitate necessary long-range gene-gene interactions and regulations. Thus, obtaining the 3D shape of chromosomes of a genome is critical for understanding how the genome folds and functions, and how its genes interact and are regulated. Here, we describe a method to reconstruct preferred 3D structures of individual chromosomes of the human genome from chromosomal contact data generated by the Hi-C chromosome conformation capturing technique. A novel parameterized objective function was designed for modeling chromosome structures, which was optimized by a gradient descent method to generate chromosomal structural models that could satisfy as many intra-chromosomal contacts as possible. We applied the objective function and the corresponding optimization method to two Hi-C chromosomal data sets of both a healthy and a cancerous human B-cell to construct 3D models of individual chromosomes at resolutions of 1 Mb and 200 kb, respectively. The parameters used with the method were calibrated according to independent fluorescence in situ hybridization experimental data. The structural models generated by our method could satisfy a high percentage of contacts (pairs of loci in interaction) and non-contacts (pairs of loci not in interaction) and were compatible with the known two-compartment organization of human chromatin structures. Furthermore, structural models generated at different resolutions and from randomly permuted data sets were consistent.
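The idea of fitting 3D coordinates to contact data by gradient descent can be sketched as follows. The paper's parameterized objective is not given in the abstract, so this uses a stand-in: squared hinge penalties that pull contacting loci within `d_contact` and push non-contacting loci beyond `d_far`. Those thresholds, the tiny six-locus example, and all function names are assumptions made for illustration.

```python
import math
import random

def objective(points, contacts, d_contact=1.0, d_far=2.0):
    """Penalize contacts that are too far apart and non-contacts that are too close."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if (i, j) in contacts:
                total += max(0.0, d - d_contact) ** 2
            else:
                total += max(0.0, d_far - d) ** 2
    return total

def gradient(points, contacts, d_contact=1.0, d_far=2.0):
    """Analytic gradient of `objective` with respect to every coordinate."""
    n = len(points)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            if (i, j) in contacts:
                coeff = 2.0 * max(0.0, d - d_contact) / d
            else:
                coeff = -2.0 * max(0.0, d_far - d) / d
            for k in range(3):
                delta = coeff * (points[i][k] - points[j][k])
                grad[i][k] += delta
                grad[j][k] -= delta
    return grad

def reconstruct(n, contacts, steps=300, lr=0.1, seed=0):
    """Gradient descent from a random start; halve the step when it fails to improve."""
    rng = random.Random(seed)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    start = best = objective(pts, contacts)
    for _ in range(steps):
        g = gradient(pts, contacts)
        cand = [[p - lr * gk for p, gk in zip(pt, gp)] for pt, gp in zip(pts, g)]
        val = objective(cand, contacts)
        if val < best:
            pts, best = cand, val
        else:
            lr *= 0.5
    return pts, start, best

# Six loci: neighbours along the chain are in contact, plus one loop (0, 5).
contacts = {(i, i + 1) for i in range(5)} | {(0, 5)}
points, start, best = reconstruct(6, contacts)
print(f"objective: {start:.4f} -> {best:.4f}")
```

A real Hi-C data set would supply `contacts` from thresholded interaction frequencies, and the distance thresholds would need calibration against independent measurements, as the abstract describes.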
Accuracy Maximization Analysis (AMA) is a recently developed Bayesian ideal observer method for task-specific dimensionality reduction. Given a training set of proximal stimuli (e.g. retinal images), a response noise model, and a cost function, AMA returns the filters (i.e. receptive fields) that extract the most useful stimulus features for estimating a user-specified latent variable from those stimuli. Here, we first contribute two technical advances that significantly reduce AMA’s compute time: we derive gradients of cost functions for which two popular estimators are appropriate, and we implement a stochastic gradient descent (AMA-SGD) routine for filter learning. Next, we show how the method can be used to simultaneously probe the impact on neural encoding of natural stimulus variability, the prior over the latent variable, noise power, and the choice of cost function. Then, we examine the geometry of AMA’s unique combination of properties that distinguish it from better-known statistical methods. Using binocular disparity estimation as a concrete test case, we develop insights that have general implications for understanding neural encoding and decoding in a broad class of fundamental sensory-perceptual tasks connected to the energy model. Specifically, we find that non-orthogonal (partially redundant) filters with scaled additive noise tend to outperform orthogonal filters with constant additive noise; non-orthogonal filters and scaled additive noise can interact to sculpt noise-induced stimulus encoding uncertainty to match task-irrelevant stimulus variability. Thus, we show that some properties of neural response thought to be biophysical nuisances can confer coding advantages to neural systems. Finally, we speculate that, if repurposed for the problem of neural systems identification, AMA may be able to overcome a fundamental limitation of standard subunit model estimation. 
As natural stimuli become more widely used in the study of psychophysical and neurophysiological performance, we expect that task-specific methods for feature learning like AMA will become increasingly important.
Focusing on the inverse synthetic aperture radar (ISAR) imaging of maneuvering targets, this paper presents a new imaging method which works well when the target’s maneuvering is not too severe. After translational motion compensation, we describe the equivalent rotation of maneuvering targets by two variables: the relative chirp rate of the linear frequency modulated (LFM) signal and the Doppler focus shift. The first variable indicates the target’s motion status, and the second one represents the possible residual error of the translational motion compensation. With them, a modified Fourier transform matrix is constructed and then used for cross-range compression. Consequently, the imaging of maneuvering targets is converted into a two-dimensional parameter optimization problem in which a stable and clear ISAR image is guaranteed. A gradient descent optimization scheme is employed to obtain the accurate relative chirp rate and Doppler focus shift. Moreover, we designed an efficient and robust initialization process for the gradient descent method, so that well-focused ISAR images of maneuvering targets can be obtained adaptively. Human intervention is not needed, which makes the method convenient for practical ISAR imaging systems. Compared with previous imaging methods, the new method achieves better imaging quality at reasonable computational cost. Simulation results are provided to validate the effectiveness and advantages of the proposed method.
- Journal of magnetic resonance (San Diego, Calif. : 1997)
- Published over 1 year ago
A method is proposed for optimizing the performance of the APSOC (Adiabatic-Passage Spin Order Conversion) technique, which can be exploited in NMR experiments with singlet spin states. In this technique magnetization-to-singlet conversion (and singlet-to-magnetization conversion) is performed by using adiabatically ramped RF-fields. Optimization utilizes the GRAPE (Gradient Ascent Pulse Engineering) approach, in which for a fixed search area we assume monotonicity of the envelope of the RF-field. Such an approach allows one to achieve much better performance for APSOC; consequently, the efficiency of magnetization-to-singlet conversion is greatly improved as compared to simple model RF-ramps, e.g., linear ramps. We also demonstrate that the optimization method is reasonably robust to possible inaccuracies in determining NMR parameters of the spin system under study and also in setting the RF-field parameters. The present approach can be exploited in other NMR and EPR applications using adiabatic switching of spin Hamiltonians.