Concept: Support vector machine
Despite partial success, communication has remained impossible for persons suffering from complete motor paralysis but intact cognitive and emotional processing, a state called complete locked-in state (CLIS). Based on a motor learning theoretical context and on the failure of neuroelectric brain-computer interface (BCI) communication attempts in CLIS, we here report BCI communication using functional near-infrared spectroscopy (fNIRS) and an implicit attentional processing procedure. Four patients suffering from advanced amyotrophic lateral sclerosis (ALS), two of them in permanent CLIS and two entering the CLIS without reliable means of communication, learned to answer personal questions with known answers and open questions, all requiring a “yes” or “no” thought, using frontocentral oxygenation changes measured with fNIRS. Three patients completed more than 46 sessions spread over several weeks, and one patient (patient W) completed 20 sessions. Online fNIRS classification of personal questions with known answers and open questions using a linear support vector machine (SVM) resulted in an above-chance-level correct response rate over 70%. Electroencephalographic oscillations and electrooculographic signals did not exceed the chance-level threshold for correct communication despite occasional differences between the physiological signals representing a “yes” or “no” response. However, electroencephalogram (EEG) changes in the theta-frequency band correlated with inferior communication performance, probably because of decreased vigilance and attention. If replicated with ALS patients in CLIS, these positive results could indicate the first step towards abolition of complete locked-in states, at least for ALS.
BACKGROUND: Although many computational methods have been developed to predict protein subcellular localization, most of the methods are limited to the prediction of single-location proteins. Multi-location proteins are either not considered or assumed not to exist. However, proteins with multiple locations are particularly interesting because they may have special biological functions, which are essential to both basic research and drug discovery. RESULTS: This paper proposes an efficient multi-label predictor, namely mGOASVM, for predicting the subcellular localization of multi-location proteins. Given a protein, the accession numbers of its homologs are obtained via BLAST search. Then, the original accession number and the homologous accession numbers of the protein are used as keys to search against the Gene Ontology (GO) annotation database to obtain a set of GO terms. Given a set of training proteins, a set of T relevant GO terms is obtained by finding all of the GO terms in the GO annotation database that are relevant to the training proteins. These relevant GO terms then form the basis of a T-dimensional Euclidean space on which the GO vectors lie. A support vector machine (SVM) classifier with a new decision scheme is proposed to classify the multi-label GO vectors. The mGOASVM predictor has the following advantages: (1) it uses the frequency of occurrences of GO terms for feature representation; (2) it selects the relevant GO subspace, which can substantially speed up the prediction without compromising performance; and (3) it adopts an efficient multi-label SVM classifier which significantly outperforms other predictors. Briefly, on two recently published virus and plant datasets, mGOASVM achieves an actual accuracy of 88.9% and 87.4%, respectively, which are significantly higher than those achieved by state-of-the-art predictors such as iLoc-Virus (74.8%) and iLoc-Plant (68.1%).
CONCLUSIONS: mGOASVM can efficiently predict the subcellular locations of multi-label proteins. The mGOASVM predictor is available online at http://bioinfo.eie.polyu.edu.hk/mGoaSvmServer/mGOASVM.html.
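The feature construction described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the mGOASVM implementation: the function names, the protein IDs, and the GO hits are hypothetical, and the multi-label decision rule (keep every location with a positive SVM score, otherwise fall back to the highest-scoring one) is an assumption, since the abstract only says that a new decision scheme is used.

```python
from collections import Counter

def relevant_terms(training_annotations):
    """Collect the sorted set of GO terms seen across all training proteins."""
    terms = set()
    for go_terms in training_annotations.values():
        terms.update(go_terms)
    return sorted(terms)

def go_vector(go_terms, basis):
    """Map a protein's GO term hits to a frequency vector over the basis."""
    counts = Counter(go_terms)
    return [counts.get(term, 0) for term in basis]

def multi_label_decision(scores):
    """Assumed rule: keep every location with a positive score, else the top one."""
    positive = [loc for loc, s in scores.items() if s > 0]
    return positive or [max(scores, key=scores.get)]

# Hypothetical GO hits per protein (repeats can arise from homolog lookups).
training = {
    "P1": ["GO:0005634", "GO:0005634", "GO:0005737"],
    "P2": ["GO:0005739", "GO:0005737"],
}
basis = relevant_terms(training)  # the T relevant GO terms, here T = 3

# A query protein: terms outside the relevant subspace are simply dropped.
vec = go_vector(["GO:0005634", "GO:0005634", "GO:0016020"], basis)

# Hypothetical per-location SVM scores for that protein.
labels = multi_label_decision({"nucleus": 0.8, "cytoplasm": 0.1, "membrane": -0.4})
```

Restricting the vector to terms seen in training keeps the dimension at T rather than the size of the whole GO database, which is the speed-up the abstract attributes to the relevant GO subspace.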
Spike pattern classification is a key topic in machine learning, computational neuroscience, and electronic device design. Here, we offer a new supervised learning rule based on Support Vector Machines (SVM) to determine the synaptic weights of a leaky integrate-and-fire (LIF) neuron model for spike pattern classification. We compare classification performance between this algorithm and other methods sharing the same conceptual framework. We consider the effect of postsynaptic potential (PSP) kernel dynamics on pattern separability, and we propose an extension of the method to decrease computational load. The algorithm performs well in generalization tasks. We show that the peak value of spike pattern separability depends on a relation between PSP dynamics and spike pattern duration, and we propose a particular kernel that is well-suited for fast computations and electronic implementations.
Detection of foreign matter in cleaned cotton is instrumental to accurately grading cotton quality, which in turn impacts the marketability of the cotton. Current grading systems return estimates of the amount of foreign matter present, but provide no information about the identity of the contaminants. This paper explores the use of pulsed thermographic analysis to detect and identify cotton foreign matter. The design and implementation of a pulsed thermographic analysis system are described. A sample set of 240 foreign matter and cotton lint samples was collected. Hand-crafted waveform features and frequency-domain features were extracted and analyzed for statistical significance. Classification was performed on these features using linear discriminant analysis and support vector machines. Using waveform features and support vector machine classifiers, detection of cotton foreign matter was performed with 99.17% accuracy. Using frequency-domain features and linear discriminant analysis, identification was performed with 90.00% accuracy. These results demonstrate that pulsed thermographic imaging analysis produces data of significant utility for the detection and identification of cotton foreign matter.
Lung cancer (LC) is the leading cause of cancer-related deaths worldwide. Early LC diagnosis is crucial to reduce the high case fatality rate of this disease. In this case-control study, we developed an accurate LC diagnosis test using retrospectively collected formalin-fixed paraffin-embedded (FFPE) human lung tissues and prospectively collected exhaled breath condensates (EBCs). Following international guidelines for diagnostic methods with clinical application, reproducible standard operating procedures (SOP) were established for every step comprising our LC diagnosis method. We analyzed the expression of distinct mRNAs expressed from GATA6 and NKX2-1, key regulators of lung development. The Em/Ad expression ratios of GATA6 and NKX2-1 detected in EBCs were combined using linear kernel support vector machines (SVM) into the LC score, which can be used for LC detection. LC score-based diagnosis achieved a high performance in an independent validation cohort. We propose our method as a non-invasive, accurate, and low-price option to complement the success of computed tomography imaging (CT) and chest X-ray (CXR) for LC diagnosis.
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 2 years ago
Human pluripotent stem cell-based in vitro models that reflect human physiology have the potential to reduce the number of drug failures in clinical trials and offer a cost-effective approach for assessing chemical safety. Here, human embryonic stem (ES) cell-derived neural progenitor cells, endothelial cells, mesenchymal stem cells, and microglia/macrophage precursors were combined on chemically defined polyethylene glycol hydrogels and cultured in serum-free medium to model cellular interactions within the developing brain. The precursors self-assembled into 3D neural constructs with diverse neuronal and glial populations, interconnected vascular networks, and ramified microglia. Replicate constructs were reproducible by RNA sequencing (RNA-Seq) and expressed neurogenesis, vasculature development, and microglia genes. Linear support vector machines were used to construct a predictive model from RNA-Seq data for 240 neural constructs treated with 34 toxic and 26 nontoxic chemicals. The predictive model was evaluated using two standard hold-out testing methods: a nearly unbiased leave-one-out cross-validation for the 60 training compounds and an unbiased blinded trial using a single hold-out set of 10 additional chemicals. The linear support vector machine produced an estimate for future data of 0.91 in the cross-validation experiment and correctly classified 9 of 10 chemicals in the blinded trial.
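Leave-one-out cross-validation with a linear SVM, as named in the abstract above, can be sketched as follows. This is a toy illustration under stated assumptions, not the study's pipeline: the trainer is a plain subgradient descent on the regularised hinge loss, it omits the bias term (which assumes centred features), and the two-dimensional data, function names, and hyperparameters are all made up for the example.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Fit weights by subgradient descent on the L2-regularised hinge loss.

    Simplification (assumption): no bias term, so the separating
    hyperplane passes through the origin.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - lr * lam) for wj in w]  # shrink from the L2 penalty
            if margin < 1.0:  # sample is inside the margin or misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

def leave_one_out_accuracy(X, y):
    """Train on all samples but one, test on the held-out one, and repeat."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w = train_linear_svm(X_train, y_train)
        correct += predict(w, X[i]) == y[i]
    return correct / len(X)

# Toy, well-separated data: +1 stands for one class, -1 for the other.
X = [[2.0, 2.0], [3.0, 1.0], [1.0, 3.0], [2.5, 2.5],
     [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0], [-2.5, -2.5]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
loo_accuracy = leave_one_out_accuracy(X, y)
```

Each sample is predicted by a model that never saw it, which is why the abstract calls the resulting estimate nearly unbiased; the cost is one full training run per held-out item.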
Digital staining for the automated annotation of Mass Spectrometry Imaging (MSI) data has previously been achieved using state-of-the-art classifiers such as random forests or support vector machines (SVMs). However, the training of such classifiers requires an expert to label exemplary data in advance. This process is time-consuming and hence costly, especially if the tissue is heterogeneous. In theory, it may be sufficient to label only a few highly representative pixels of an MS image, but it is not known a priori which pixels to select. This motivates active learning strategies in which the algorithm itself queries the expert by automatically suggesting promising candidate pixels of an MS image for labeling. Given a suitable querying strategy, the number of required training labels can be significantly reduced while maintaining classification accuracy. In this work, we propose active learning for convenient annotation of MSI data. We generalize a recently proposed active learning method to the multi-class case and combine it with the random forest classifier. Its superior performance over random sampling is demonstrated on Secondary Ion Mass Spectrometry data, making it an interesting approach for the classification of mass spectrometry images.
Purpose: To investigate the diagnostic accuracy of an image-based classifier to distinguish between Alzheimer disease (AD) and behavioral variant frontotemporal dementia (bvFTD) in individual patients by using gray matter (GM) density maps computed from standard T1-weighted structural images obtained with multiple imagers and with independent training and prediction data. Materials and Methods: The local institutional review board approved the study. Eighty-four patients with AD, 51 patients with bvFTD, and 94 control subjects were divided into independent training (n = 115) and prediction (n = 114) sets with identical diagnosis and imager type distributions. Training of a support vector machine (SVM) classifier used diagnostic status and GM density maps and produced voxelwise discrimination maps. Discriminant function analysis was used to estimate suitability of the extracted weights for single-subject classification in the prediction set. Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) were calculated for image-based classifiers and neuropsychological z scores. Results: Training accuracy of the SVM was 85% for patients with AD versus control subjects, 72% for patients with bvFTD versus control subjects, and 79% for patients with AD versus patients with bvFTD (P ≤ .029). Single-subject diagnosis in the prediction set when using the discrimination maps yielded accuracies of 88% for patients with AD versus control subjects, 85% for patients with bvFTD versus control subjects, and 82% for patients with AD versus patients with bvFTD, with a good to excellent AUC (range, 0.81-0.95; P ≤ .001). Machine learning-based categorization of AD versus bvFTD based on GM density maps outperforms classification based on neuropsychological test results. Conclusion: The SVM can be used in single-subject discrimination and can help the clinician arrive at a diagnosis.
The SVM can be used to distinguish disease-specific GM patterns in patients with AD and those with bvFTD as compared with normal aging by using common T1-weighted structural MR imaging. © RSNA, 2015.
Sequence-Based Prediction of microRNA-Binding Residues in Proteins Using Cost-Sensitive Laplacian Support Vector Machines
- IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM
- Published over 4 years ago
The recognition of microRNA (miRNA)-binding residues in proteins is helpful to understand how miRNAs silence their target genes. It is difficult to use existing computational methods to predict miRNA-binding residues in proteins due to the lack of training examples. To address this issue, unlabeled data may be exploited to help construct a computational model. Semi-supervised learning deals with methods for automatically exploiting unlabeled data in addition to labeled data to improve learning performance, where no human intervention is assumed. In addition, miRNA-binding proteins almost always contain far fewer binding than non-binding residues, and cost-sensitive learning is regarded as a good solution to the class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences using a cost-sensitive extension of Laplacian Support Vector Machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information of the amino acid sequence (PSSMs), the conservation information about three biochemical properties (HKM) and mutual interaction propensities in protein-miRNA complex structures. The CS-LapSVM achieves good performance, with an F1 score of 26.23±2.55% and an AUC value of 0.805±0.020, superior to existing approaches for the recognition of RNA-binding residues. A web server called SARS has been built and is freely available for academic use.
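The cost-sensitive idea in the abstract above can be illustrated with a class-weighted hinge loss and an F1 computation. This sketch covers only the cost-sensitive part; the Laplacian (unlabeled-data) regulariser of CS-LapSVM is not shown. The function names, the toy one-dimensional data, and the choice of setting the positive-class cost to the negative-to-positive ratio are assumptions for illustration, not the paper's settings.

```python
def weighted_hinge_loss(w, b, X, y, cost_pos, cost_neg):
    """Mean hinge loss where errors on each class are scaled by a class cost."""
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        cost = cost_pos if yi == 1 else cost_neg
        total += cost * max(0.0, 1.0 - yi * score)
    return total / len(X)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (+1) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy imbalanced data: one binding residue (+1), four non-binding (-1).
X = [[2.0], [0.5], [-1.0], [-2.0], [-3.0]]
y = [1, -1, -1, -1, -1]
w, b = [1.0], -2.0  # hypothetical model that violates the margin on the positive

plain = weighted_hinge_loss(w, b, X, y, cost_pos=1.0, cost_neg=1.0)
weighted = weighted_hinge_loss(w, b, X, y, cost_pos=4.0, cost_neg=1.0)

# A classifier that always answers "non-binding" is 80% accurate here,
# yet its F1 score for the binding class is zero.
always_negative_f1 = f1_score(y, [-1] * len(y))
```

Raising the cost of the rare class makes the optimiser pay more for missing it, and reporting F1 rather than accuracy keeps a trivial majority-class predictor from looking good.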
The European REACH regulation requires information on ready biodegradation, which is a screening test to assess the biodegradability of chemicals. At the same time REACH encourages the use of alternatives to animal testing which includes predictions from QSAR models. The aim of this study was to build QSAR models to predict ready biodegradation of chemicals by using different modelling methods and types of molecular descriptors. Particular attention was given to data screening and validation procedures in order to build predictive models. Experimental values of 1055 chemicals were collected from the webpage of the National Institute of Technology and Evaluation of Japan (NITE): 837 and 218 molecules were used for calibration and testing purposes, respectively. In addition, models were further evaluated using an external validation set consisting of 670 molecules. Classification models were produced in order to discriminate biodegradable and non-biodegradable chemicals by means of different mathematical methods: k Nearest Neighbours, Partial Least Squares Discriminant Analysis and Support Vector Machines, as well as their consensus models. The proposed models and the derived consensus analysis demonstrated good classification performances with respect to already published QSAR models on biodegradation. Relationships between the molecular descriptors selected in each QSAR model and biodegradability were evaluated.
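A consensus of several classifiers, as mentioned in the abstract above, is often formed by voting. The sketch below assumes a simple vote with an adjustable agreement threshold; the abstract does not say how its consensus models were built, so the rule, the function name, and the labels ("RB" for readily biodegradable, "NRB" for not) are hypothetical.

```python
from collections import Counter

def consensus(predictions, min_agreement=2):
    """Return the most common label, or None when too few models agree."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_agreement else None

# Hypothetical per-model predictions for three molecules.
model_outputs = {
    "kNN":    ["RB", "NRB", "RB"],
    "PLS-DA": ["RB", "NRB", "NRB"],
    "SVM":    ["RB", "RB", "NRB"],
}

# Regroup so that each tuple holds the three predictions for one molecule.
per_molecule = list(zip(*model_outputs.values()))

majority = [consensus(p) for p in per_molecule]
unanimous = [consensus(p, min_agreement=3) for p in per_molecule]
```

Requiring unanimity leaves some molecules unassigned (`None`) in exchange for predictions that all models support, whereas the majority rule always yields a label for three binary classifiers.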