SciCombinator


Concept: Supervised learning

168

Spike pattern classification is a key topic in machine learning, computational neuroscience, and electronic device design. Here, we offer a new supervised learning rule based on Support Vector Machines (SVM) to determine the synaptic weights of a leaky integrate-and-fire (LIF) neuron model for spike pattern classification. We compare classification performance between this algorithm and other methods sharing the same conceptual framework. We consider the effect of postsynaptic potential (PSP) kernel dynamics on pattern separability, and we propose an extension of the method to decrease computational load. The algorithm performs well in generalization tasks. We show that the peak value of spike pattern separability depends on a relation between PSP dynamics and spike pattern duration, and we propose a particular kernel that is well suited for fast computations and electronic implementations.
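As a rough illustration of the setting this abstract describes (not the SVM-based rule itself), the sketch below models a neuron's membrane potential as a weighted sum of PSP kernels triggered by input spikes, and classifies a pattern by whether the potential crosses a threshold. The double-exponential kernel shape, the time constants, the threshold, and all function names are assumptions chosen for illustration; the abstract specifies none of them, and leak and reset dynamics after a spike are ignored here.

```python
import math

def psp_kernel(t, tau_m=10.0, tau_s=2.5):
    """Assumed double-exponential PSP kernel; causal, so zero before the spike."""
    if t < 0:
        return 0.0
    return math.exp(-t / tau_m) - math.exp(-t / tau_s)

def membrane_potential(t, spike_trains, weights):
    """Weighted sum of PSP kernels over all input spikes that occurred up to time t."""
    return sum(
        w * sum(psp_kernel(t - s) for s in spikes)
        for w, spikes in zip(weights, spike_trains)
    )

def fires(spike_trains, weights, threshold=1.0, duration=50.0, dt=0.5):
    """Return True if the potential reaches the threshold anywhere on a time grid."""
    steps = int(duration / dt) + 1
    return any(
        membrane_potential(i * dt, spike_trains, weights) >= threshold
        for i in range(steps)
    )

# One input neuron per inner list; spike times in milliseconds (hypothetical data).
pattern = [[5.0, 12.0], [8.0]]
print(fires(pattern, weights=[1.5, 1.0]))
```

In this picture the kernel turns spike times into continuous features, and a supervised rule only has to choose the weights, which is why the kernel's time constants relative to the pattern duration can affect how separable the patterns are.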

Concepts: Action potential, Machine learning, Computer, Computer science, Support vector machine, Pattern recognition, Computational neuroscience, Supervised learning

144

Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, IBD, or ALA. Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p < 0.0001). The areas under the receiver-operating characteristic curve for classifying cases into the 3 categories were 83% by naive Bayes, 79% by decision tree, and 82% by artificial neural networks. Prediction models using machine learning provided a method for distinguishing between ALA and IBD, ALA and normal, and IBD and normal. The naive Bayes and artificial neural network classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats.

Concepts: Gastroenterology, Inflammatory bowel disease, Artificial intelligence, Machine learning, Neural network, Decision tree, Decision tree learning, Supervised learning

84

Label-free cell analysis is essential to personalized genomics, cancer diagnostics, and drug development as it avoids adverse effects of staining reagents on cellular viability and cell signaling. However, currently available label-free cell assays mostly rely only on a single feature and lack sufficient differentiation. Also, the sample size analyzed by these assays is limited due to their low throughput. Here, we integrate feature extraction and deep learning with high-throughput quantitative imaging enabled by photonic time stretch, achieving record high accuracy in label-free cell classification. Our system captures quantitative optical phase and intensity images and extracts multiple biophysical features of individual cells. These biophysical measurements form a hyperdimensional feature space in which supervised learning is performed for cell classification. We compare various learning algorithms including artificial neural network, support vector machine, logistic regression, and a novel deep learning pipeline, which adopts global optimization of receiver operating characteristics. As a validation of the enhanced sensitivity and specificity of our system, we show classification of white blood T-cells against colon cancer cells, as well as lipid accumulating algal strains for biofuel production. This system opens up a new path to data-driven phenotypic diagnosis and better understanding of the heterogeneous gene expressions in cells.

Concepts: Gene, Gene expression, Sensitivity and specificity, Machine learning, Neural network, Artificial neural network, Binary classification, Supervised learning

60

We developed computational models to predict the emergence of depression and Post-Traumatic Stress Disorder in Twitter users. Twitter data and details of depression history were collected from 204 individuals (105 depressed, 99 healthy). We extracted predictive features measuring affect, linguistic style, and context from participant tweets (N = 279,951) and built models using these features with supervised learning algorithms. Resulting models successfully discriminated between depressed and healthy content, and compared favorably to general practitioners' average success rates in diagnosing depression, albeit in a separate population. Results held even when the analysis was restricted to content posted before first depression diagnosis. State-space temporal analysis suggests that onset of depression may be detectable from Twitter data several months prior to diagnosis. Predictive results were replicated with a separate sample of individuals diagnosed with PTSD (N_users = 174, N_tweets = 243,775). A state-space time series model revealed indicators of PTSD almost immediately post-trauma, often many months prior to clinical diagnosis. These methods suggest a data-driven, predictive approach for early screening and detection of mental illness.

Concepts: Diagnosis, Psychological trauma, Machine learning, Selective serotonin reuptake inhibitor, Cognitive behavioral therapy, Posttraumatic stress disorder, Sertraline, Supervised learning

37

Here, we present the use of ethoscopes, which are machines for high-throughput analysis of behavior in Drosophila and other animals. Ethoscopes provide a software and hardware solution that is reproducible and easily scalable. They perform, in real-time, tracking and profiling of behavior by using a supervised machine learning algorithm, are able to deliver behaviorally triggered stimuli to flies in a feedback-loop mode, and are highly customizable and open source. Ethoscopes can be built easily by using 3D printing technology and rely on Raspberry Pi microcomputers and Arduino boards to provide affordable and flexible hardware. All software and construction specifications are available at http://lab.gilest.ro/ethoscope.

Concepts: Scientific method, Algorithm, Machine learning, Learning, Open source, Supervised learning, Theoretical computer science

31

Purpose: To investigate whether multivariate pattern recognition analysis of arterial spin labeling (ASL) perfusion maps can be used for classification and single-subject prediction of patients with Alzheimer disease (AD) and mild cognitive impairment (MCI) and subjects with subjective cognitive decline (SCD) after using the W score method to remove confounding effects of sex and age.
Materials and Methods: Pseudocontinuous 3.0-T ASL images were acquired in 100 patients with probable AD; 60 patients with MCI, of whom 12 remained stable, 12 were converted to a diagnosis of AD, and 36 had no follow-up; 100 subjects with SCD; and 26 healthy control subjects. The AD, MCI, and SCD groups were divided into a sex- and age-matched training set (n = 130) and an independent prediction set (n = 130). Standardized perfusion scores adjusted for age and sex (W scores) were computed per voxel for each participant. Training of a support vector machine classifier was performed with diagnostic status and perfusion maps. Discrimination maps were extracted and used for single-subject classification in the prediction set. Prediction performance was assessed with receiver operating characteristic (ROC) analysis to generate an area under the ROC curve (AUC) and sensitivity and specificity distribution.
Results: Single-subject diagnosis in the prediction set by using the discrimination maps yielded excellent performance for AD versus SCD (AUC, 0.96; P < .01), good performance for AD versus MCI (AUC, 0.89; P < .01), and poor performance for MCI versus SCD (AUC, 0.63; P = .06). Application of the AD versus SCD discrimination map for prediction of MCI subgroups resulted in good performance for patients with MCI diagnosis converted to AD versus subjects with SCD (AUC, 0.84; P < .01) and fair performance for patients with MCI diagnosis converted to AD versus those with stable MCI (AUC, 0.71; P > .05).
Conclusion: With automated methods, age- and sex-adjusted ASL perfusion maps can be used to classify and predict diagnosis of AD, conversion of MCI to AD, stable MCI, and SCD with good to excellent accuracy and AUC values. © RSNA, 2016.
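The W scores mentioned in this abstract are commonly computed by regressing a measurement on covariates in a control group and then standardizing each subject's residual by the spread of the control residuals. The sketch below shows that idea for a single measurement and a single covariate (age only). It is an assumption-laden simplification: the study adjusts for both age and sex, works per voxel, and may use a different residual standard deviation. The function names and numbers are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def w_scores(control_age, control_value, subject_age, subject_value):
    """W = (observed - expected given age) / SD of control residuals."""
    slope, intercept = fit_line(control_age, control_value)
    residuals = [v - (intercept + slope * a)
                 for a, v in zip(control_age, control_value)]
    # Sample SD (n - 1 denominator); this choice is an assumption.
    sd = statistics.stdev(residuals)
    return [(v - (intercept + slope * a)) / sd
            for a, v in zip(subject_age, subject_value)]

# Hypothetical perfusion values for five controls and two subjects.
controls_age = [60, 65, 70, 75, 80]
controls_val = [50.0, 49.0, 46.0, 45.0, 42.0]
print(w_scores(controls_age, controls_val, [70, 70], [46.4, 40.0]))
```

A subject whose value matches the control-based expectation for their age gets a W score near zero, while a value well below that expectation gets a strongly negative score, so age-related differences no longer drive the classifier's input.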

Concepts: Alzheimer's disease, Type I and type II errors, Sensitivity and specificity, Machine learning, Receiver operating characteristic, Binary classification, Statistical classification, Supervised learning

30

Sepsis is a leading cause of death and is the most expensive condition to treat in U.S. hospitals. Despite targeted efforts to automate earlier detection of sepsis, current techniques rely exclusively on using either standard clinical data or novel biomarker measurements. In this study, we apply machine learning techniques to assess the predictive power of combining multiple biomarker measurements from a single blood sample with electronic medical record (EMR) data for the identification of patients in the early to peak phase of sepsis in a large community hospital setting. Combining biomarkers and EMR data achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.81, while EMR data alone achieved an AUC of 0.75. Furthermore, a single measurement of six biomarkers (IL-6, nCD64, IL-1ra, PCT, MCP1, and G-CSF) yielded the same predictive power as collecting an additional 16 hours of EMR data (AUC of 0.80), suggesting that the biomarkers may be useful for identifying these patients earlier. Ultimately, supervised learning using a subset of biomarker and EMR data as features may be capable of identifying patients in the early to peak phase of sepsis in a diverse population and may provide a tool for more timely identification and intervention.
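For readers unfamiliar with the AUC figures quoted above, the following minimal sketch computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: 1 = condition present, 0 = absent; scores are model outputs.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for comparing values such as 0.75 and 0.81.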

Concepts: Hospital, Machine learning, Receiver operating characteristic, Electronic medical record, Identification, Supervised learning

27

OBJECTIVE: To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC (Logical Observation Identifiers Names and Codes) could be leveraged to help map local terms from other institutions. METHODS: We developed two models to test our hypothesis. The first, based on supervised machine learning, was created using Apache’s OpenNLP Maxent, and the second, based on information retrieval, was created using Apache’s Lucene. The models were validated by a random subsampling method that was repeated 20 times and that used 80/20 splits for training and testing, respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions. RESULTS: For the 20 iterations used for validation of our 80/20 splits, Maxent and Lucene ranked the correct LOINC code first for between 70.5% and 71.4% and between 63.7% and 65.0% of local terms, respectively. For all laboratory terms from the three test institutions, Maxent ranked the correct LOINC code first for between 73.5% and 84.6% (mean 78.9%) of local terms, whereas Lucene’s performance was between 66.5% and 76.6% (mean 71.9%). Using a cut-off score of 0.46, Maxent always ranked the correct LOINC code first for over 57% of local terms. CONCLUSIONS: This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus. Accurate and efficient automated mapping methods can help to accelerate adoption of vocabulary standards and promote widespread health information exchange.
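The validation scheme described in this abstract (random subsampling repeated 20 times with 80/20 splits) can be sketched generically as follows. This is not the authors' code: the function names, the fixed seed, and the toy majority-label scorer are assumptions made only to show the shape of the procedure.

```python
import random
from collections import Counter

def repeated_subsampling(items, evaluate, n_repeats=20, train_fraction=0.8, seed=0):
    """Shuffle, split into train/test, score; repeat and collect the scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(n_repeats):
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return scores

def majority_baseline(train, test):
    """Toy scorer: accuracy of always predicting the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == majority for _, label in test) / len(test)

# Hypothetical (term, code) pairs standing in for local terms and their codes.
data = [(f"term{i}", "A" if i % 3 else "B") for i in range(100)]
scores = repeated_subsampling(data, majority_baseline)
print(min(scores), max(scores))
```

Reporting the minimum and maximum over the repeats mirrors the "between X% and Y%" ranges quoted in the results.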

Concepts: Scientific method, Validation, Map, Standard, Standards, Information retrieval, Supervised learning, LOINC

27

Medical image analysis remains a challenging application area for artificial intelligence. When applying machine learning, obtaining ground-truth labels for supervised learning is more difficult than in many more common applications of machine learning. This is especially so for datasets with abnormalities, as tissue types and the shapes of the organs in these datasets differ widely. However, organ detection in such an abnormal dataset may have many promising real-world applications, such as automatic diagnosis, automated radiotherapy planning, and medical image retrieval, where new multi-modal medical images provide more information about the imaged tissues for diagnosis. Here we test the application of deep learning methods to organ identification in magnetic resonance medical images, with visual and temporal hierarchical features learnt to categorise object classes from an unlabelled multi-modal DCE-MRI dataset, so that only weakly supervised training is required for a classifier. A probabilistic patch-based method was employed for multiple organ detection, with the features learnt from the deep learning model. This shows the potential of the deep learning model for application to medical images, despite the difficulty of obtaining libraries of correctly labelled training datasets, and despite the intrinsic abnormalities present in patient datasets.

Concepts: X-ray, Medical imaging, Artificial intelligence, Machine learning, Learning, Artificial neural network, IMAGE, Supervised learning

25

An overview is provided of the challenges involved in building computer-aided diagnosis systems capable of precise medical diagnostics based on integration and interpretation of data from different sources and formats. The availability of massive amounts of data and computational methods associated with the Big Data paradigm has brought hope that such systems may soon be available in routine clinical practice, which is not the case today. We focus on visual and machine learning analysis of medical data acquired with varied nanotech-based techniques and on methods for Big Data infrastructure. Because diagnosis is essentially a classification task, we address the machine learning techniques with supervised and unsupervised classification, making a critical assessment of the progress already made in the medical field and the prospects for the near future. We also advocate that successful computer-aided diagnosis requires merging methods and concepts from nanotechnology and Big Data analysis.

Concepts: Scientific method, Medicine, Diagnosis, Greek loanwords, Physician, Machine learning, Unsupervised learning, Supervised learning