Concept: Linear discriminant analysis


BACKGROUND: Non-invasive phenotyping of chronic respiratory diseases would be highly beneficial in the personalised medicine of the future. Volatile organic compounds can be measured in the exhaled breath and may be produced or altered by disease processes. We investigated whether distinct patterns of these compounds were present in chronic obstructive pulmonary disease (COPD) and clinically relevant disease phenotypes. METHODS: Breath samples from 39 COPD subjects and 32 healthy controls were collected and analysed using gas chromatography time-of-flight mass spectrometry. Subjects with COPD also underwent sputum induction. Discriminatory compounds were identified by univariate logistic regression followed by multivariate analysis: 1. principal component analysis; 2. multivariate logistic regression; 3. receiver operating characteristic (ROC) analysis. RESULTS: Comparing COPD versus healthy controls, principal component analysis clustered the 20 best-discriminating compounds into four components explaining 71% of the variance. Multivariate logistic regression constructed an optimised model using two components with an accuracy of 69%. The model had 85% sensitivity, 50% specificity and ROC area under the curve of 0.74. Analysis of COPD subgroups showed the method could classify COPD subjects with far greater accuracy. Models were constructed which classified subjects with ≥2% sputum eosinophilia with ROC area under the curve of 0.94 and those having frequent exacerbations 0.95. Potential biomarkers correlated to clinical variables were identified in each subgroup. CONCLUSION: The exhaled breath volatile organic compound profile discriminated between COPD and healthy controls and identified clinically relevant COPD subgroups. If these findings are validated in prospective cohorts, they may have diagnostic and management value in this disease.

Concepts: Medicine, Asthma, Pneumonia, Multivariate statistics, Chronic obstructive pulmonary disease, Volatile organic compound, Organic compounds, Linear discriminant analysis
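
As a rough illustration of how the sensitivity, specificity and accuracy figures in the COPD abstract relate to one another, the sketch below computes all three from confusion-matrix counts. The group sizes (39 COPD subjects, 32 controls) come from the abstract; the split into true and false calls is an assumption chosen to reproduce the reported rates and is not stated in the source.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of diseased subjects flagged
    specificity = tn / (tn + fp)   # fraction of healthy subjects cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts: 33 of 39 COPD subjects and 16 of 32 controls classified
# correctly. Only the group totals (39 and 32) are taken from the abstract.
sens, spec, acc = classification_metrics(tp=33, fn=6, tn=16, fp=16)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# prints: sensitivity=0.85 specificity=0.50 accuracy=0.69
```

Under these assumed counts the three reported figures are mutually consistent: high sensitivity combined with chance-level specificity gives a moderate overall accuracy.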


BACKGROUND: Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. METHODS: For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (rho = 0.1), medium (rho = 0.5) or high (rho = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined.
RESULTS: C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p < 0.001). All three methods performed poorly in discriminating exposure patterns differing with respect to the variability in cycle time duration. CONCLUSION: While C-EVA had a higher accuracy than conventional EVA, both failed to detect differences in temporal similarity. The data-driven optimality of data reduction and the capability of handling multiple exposure time lines in a single analysis are the advantages of the C-EVA.

Concepts: Multivariate statistics, Factor analysis, Principal component analysis, Exposure, Singular value decomposition, Photography, Linear discriminant analysis, The Unscrambler
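
The component-selection rule described in the EVA abstract (the least number of principal components describing more than 90% of variability) can be sketched as follows. This is a minimal illustration that assumes the PCA eigenvalues are already available; the function name and the example eigenvalues are hypothetical and do not come from the study.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of components whose cumulative explained
    variance ratio strictly exceeds `threshold`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Components are ranked by the variance they explain, largest first.
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total > threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues: cumulative ratios are 0.50, 0.80, 0.95, ...
example = [5.0, 3.0, 1.5, 0.3, 0.2]
print(components_needed(example))  # prints 3
```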


We estimate models of consumer food waste awareness and attitudes using responses from a national survey of U.S. residents. Our models are interpreted through the lens of several theories that describe how pro-social behaviors relate to awareness, attitudes and opinions. Our analysis of patterns among respondents' food waste attitudes yields a model with three principal components: one that represents perceived practical benefits households may lose if food waste were reduced, one that represents the guilt associated with food waste, and one that represents whether households feel they could be doing more to reduce food waste. We find our respondents express significant agreement that some perceived practical benefits are ascribed to throwing away uneaten food, e.g., nearly 70% of respondents agree that throwing away food after the package date has passed reduces the odds of foodborne illness, while nearly 60% agree that some food waste is necessary to ensure meals taste fresh. We identify that these attitudinal responses significantly load onto a single principal component that may represent a key attitudinal construct useful for policy guidance. Further, multivariate regression analysis reveals a significant positive association between the strength of this component and household income, suggesting that higher income households most strongly agree with statements that link throwing away uneaten food to perceived private benefits.

Concepts: Regression analysis, Statistics, Multivariate statistics, Household, Linear discriminant analysis, Household income in the United States, Food safety, Income quintiles


Rapid evaporative ionization mass spectrometry (REIMS) is an emerging technique that allows near-real-time characterization of human tissue in vivo by analysis of the aerosol (“smoke”) released during electrosurgical dissection. The coupling of REIMS technology with electrosurgery for tissue diagnostics is known as the intelligent knife (iKnife). This study aimed to validate the technique by applying it to the analysis of fresh human tissue samples ex vivo and to demonstrate the translation to real-time use in vivo in a surgical environment. A variety of tissue samples from 302 patients were analyzed in the laboratory, resulting in 1624 cancerous and 1309 noncancerous database entries. The technology was then transferred to the operating theater, where the device was coupled to existing electrosurgical equipment to collect data during a total of 81 resections. Mass spectrometric data were analyzed using multivariate statistical methods, including principal components analysis (PCA) and linear discriminant analysis (LDA), and a spectral identification algorithm using a similar approach was implemented. The REIMS approach differentiated accurately between distinct histological and histopathological tissue types, with malignant tissues yielding chemical characteristics specific to their histopathological subtypes. Tissue identification via intraoperative REIMS matched the postoperative histological diagnosis in 100% (all 81) of the cases studied. The mass spectra reflected lipidomic profiles that varied between distinct histological tumor types and also between primary and metastatic tumors. Thus, in addition to real-time diagnostic information, the spectra provided additional information on divergent tumor biochemistry that may have mechanistic importance in cancer.

Concepts: Cancer, Oncology, Mass spectrometry, Histology, Multivariate statistics, Principal component analysis, Tissue, Linear discriminant analysis


Saffron is one of the oldest and most expensive spices and is often the target of fraudulent activities. In this research, a new strategy of saffron authentication based on metabolic fingerprinting was developed. In the first phase, a solid-liquid extraction procedure was optimized, with the main aim of isolating as broad a representation as possible of the small molecules contained in saffron. In the second step, a detection method based on liquid chromatography coupled with high-resolution mass spectrometry was developed. Initially, principal component analysis (PCA) revealed clear differences between saffron cultivated and packaged in Spain, protected designation of origin (PDO), and saffron packaged in Spain of unknown origin, labeled Spanish saffron. Afterwards, orthogonal partial least squares discriminant analysis (OPLS-DA) was successfully used to discriminate between Spanish saffron. The tentative identification of markers showed glycerophospholipids and their oxidized lipids were significant markers according to their origin.

Concepts: Protein, Mass spectrometry, Multivariate statistics, Principal component analysis, Biochemistry, Tandem mass spectrometry, Linear discriminant analysis, The Unscrambler


Understanding the transition of brain activities towards an absence seizure, called pre-epileptic seizure, is a challenge. In this study, multiscale permutation entropy (MPE) is proposed to describe dynamical characteristics of electroencephalograph (EEG) recordings on different absence seizure states. The classification ability of the MPE measures using linear discriminant analysis is evaluated by a series of experiments. Compared to a traditional multiscale entropy method with 86.1% as its classification accuracy, the classification rate of MPE is 90.6%. Experimental results demonstrate there is a reduction of permutation entropy of EEG from the seizure-free state to the seizure state. Moreover, it is indicated that the dynamical characteristics of EEG data with MPE can identify the differences among seizure-free, pre-seizure and seizure states. This also supports the view that EEG has a detectable change prior to an absence seizure.

Concepts: Brain, Electroencephalography, Experiment, Seizure, Absence seizure, Dynamics, State, Linear discriminant analysis
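
To make the entropy measure in the EEG abstract more concrete, here is a minimal sketch of permutation entropy applied at several time scales. It assumes the common construction (coarse-graining by non-overlapping window averages, then counting ordinal patterns); the abstract does not give the authors' exact parameters, so the embedding order, the scales and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def coarse_grain(series, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def permutation_entropy(series, order=3, delay=1):
    """Permutation entropy normalized to [0, 1] by log(order!)."""
    if order < 2:
        raise ValueError("order must be at least 2")
    counts = Counter()
    span = (order - 1) * delay
    for start in range(len(series) - span):
        window = [series[start + j * delay] for j in range(order)]
        # Ordinal pattern: indices that sort the window (ties keep index order).
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        counts[pattern] += 1
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(math.factorial(order))


def multiscale_permutation_entropy(series, scales=(1, 2, 3), order=3):
    """Permutation entropy of the coarse-grained series at each scale."""
    return [permutation_entropy(coarse_grain(series, s), order) for s in scales]
```

A strictly increasing series contains a single ordinal pattern and therefore has zero permutation entropy, while an irregular series approaches 1. The resulting vector of per-scale values is the kind of feature that could then be passed to a classifier such as linear discriminant analysis.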


The use of multivariate analysis (MVA) methods in the processing of time-of-flight secondary ion mass spectrometry (ToF-SIMS) data has become increasingly common. MVA presents a powerful set of tools to aid the user in processing data from complex, multicomponent surfaces such as biological materials and biosensors. When properly used, MVA can help the user identify the major sources of differences within a sample or between samples, determine where certain compounds exist on a sample, or verify the presence of compounds that have been engineered into the surface. Of all the MVA methods, principal component analysis (PCA) is the most commonly used and forms an excellent starting point for the application of many of the other methods employed to process ToF-SIMS data. Herein we discuss the application of PCA and other MVA methods to multicomponent ToF-SIMS data and provide guidelines on their application and use.

Concepts: Spectroscopy, Mass spectrometry, Multivariate statistics, Principal component analysis, Computer program, Linear discriminant analysis, Secondary ion mass spectrometry, The Unscrambler


Oils of various species of Copaifera are commonly found in pharmacies and on popular markets and are widely sold for their medicinal properties. However, the chemical variability between and within species and the lack of standardization of these oils have presented barriers to their wider commercialization. With the aim of recognizing patterns in the chemical composition of copaiba oils, 22 oil samples of C. multijuga Hayne species were collected, esterified with CH2N2, and characterized by GC-FID and GC/MS analyses. The chromatographic data were processed using hierarchical cluster analysis (HCA) and principal component analysis (PCA). In total, 35 components were identified in the oils, and the multivariate analyses (MVA) allowed the samples to be divided into three groups, with the sesquiterpenes β-caryophyllene and caryophyllene oxide as the main components. These sesquiterpenes, which were detected in all the samples analyzed in different concentrations, were the most important constituents in the differentiation of the groups. There was a prevalence of sesquiterpenes in all the oils studied. In conclusion, GC-FID and GC/MS analyses combined with MVA can be used to determine the chemical composition and to recognize chemical patterns of copaiba oils.

Concepts: Cluster analysis, Multivariate statistics, Mathematical analysis, Principal component analysis, Data mining, Linear discriminant analysis, Multivariate analysis, Kernel principal component analysis


BACKGROUND: The free amino acid profile of 192 samples of seven different floral types of Serbian honey (acacia, linden, sunflower, rape, basil, giant goldenrod, and buckwheat) from six different regions was analyzed in order to distinguish honeys by their botanical origin. RESULTS: The most abundant amino acids were proline, alanine, phenylalanine, threonine, and arginine. Based on the established amino acid profiles, some important differences were identified among the studied honey samples using basic descriptive statistics and confirmed by multivariate chemometric methods. Principal component analysis revealed that basil honey samples form a well-defined cluster associated with phenylalanine content. The model obtained by linear discriminant analysis might be used to distinguish basil honey from the rest of the samples, and has moderate predictive power to separate genuine acacia, linden, sunflower, and rape honeys. New data on the amino acid profiles of giant goldenrod and buckwheat honey samples are presented. CONCLUSIONS: The floral origin of honey could be successfully evaluated by its amino acid profile coupled with chemometric analysis.

Concepts: Amino acid, Acid, Amine, Multivariate statistics, Essential amino acid, Principal component analysis, Linear discriminant analysis, Monofloral honey


BACKGROUND: Management of intrauterine growth restriction (IUGR) remains a major issue in perinatology. AIMS: The objective of this paper was the assessment of gender-specific fetal heart rate (FHR) dynamics as a diagnostic tool in severe IUGR. SUBJECTS: FHR was analyzed in the antepartum period in 15 severe IUGR fetuses and 18 controls, matched for gestational age, in relation to fetal gender. OUTCOME MEASURES: Linear and entropy methods, such as mean FHR (mFHR), low (LF), high (HF) and movement frequency (MF), approximate, sample and multiscale entropy. Sensitivities and specificities were estimated using Fisher linear discriminant analysis and the leave-one-out method. RESULTS: Overall, IUGR fetuses presented significantly lower mFHR and entropy compared with controls. However, gender-specific analysis showed that significantly lower mFHR was only evident in IUGR males and lower entropy in IUGR females. In addition, lower LF/(MF+HF) was evident in IUGR females compared with controls, but not in males. Rather high sensitivities and specificities were achieved in the detection of the FHR recordings related to IUGR male fetuses, when gender-specific analysis was performed at gestational ages less than 34 weeks. CONCLUSIONS: Severe IUGR fetuses present gender-specific linear and entropy FHR changes, compared with controls, characterized by a significantly lower entropy and sympathetic-vagal balance in females than in males. These findings need to be considered in order to achieve better diagnostic results.

Concepts: Pregnancy, Male, Embryo, Fetus, Obstetrics, Gender, Gestational age, Linear discriminant analysis
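
The evaluation scheme named in the IUGR abstract, Fisher linear discriminant analysis with leave-one-out estimation of sensitivity and specificity, can be sketched in a few lines. This is an illustrative two-feature version with a midpoint decision threshold (an equal-priors assumption); the function names and any data passed to them are hypothetical and not taken from the study.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def fit_fisher_lda(class0, class1):
    """Return (w, threshold) for a two-feature Fisher discriminant."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    # Pooled within-class scatter matrix S_w (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Fisher direction: w = S_w^{-1} (m1 - m0), using the 2x2 inverse.
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    # Assumed decision rule: threshold at the projected midpoint of the means.
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]


def predict(model, x):
    w, threshold = model
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


def leave_one_out(class0, class1):
    """Return (sensitivity, specificity), treating class 1 as positive."""
    tp = tn = 0
    for i, x in enumerate(class1):
        # Refit without the held-out case, then classify it.
        model = fit_fisher_lda(class0, class1[:i] + class1[i + 1:])
        tp += predict(model, x) == 1
    for i, x in enumerate(class0):
        model = fit_fisher_lda(class0[:i] + class0[i + 1:], class1)
        tn += predict(model, x) == 0
    return tp / len(class1), tn / len(class0)
```

Refitting the discriminant for every held-out recording keeps each test case out of its own training set, which matters for small samples such as the 15 IUGR fetuses and 18 controls described above.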