Concept: Decision trees
The aim of this study was to develop a new data-mining model to predict axillary lymph node (AxLN) metastasis in primary breast cancer. To achieve this, we used a decision tree-based prediction method: the alternating decision tree (ADTree).
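Unlike an ordinary decision tree, an alternating decision tree scores a case by summing the prediction values of every rule whose precondition it satisfies, and classifies by the sign of that sum. A minimal sketch of this scoring scheme follows; the rules, feature names, and numeric values are invented for illustration and are not the fitted model from the study.

```python
# Minimal sketch of how an alternating decision tree (ADTree) scores a case.
# All node values, features, and thresholds below are hypothetical examples,
# not the AxLN model described in the abstract.

def adtree_score(case):
    """Sum the prediction values of every rule whose precondition holds.

    A case can satisfy several rules at once; the final score is the sum of
    their prediction values, and its sign gives the predicted class.
    """
    score = 0.5  # root prediction value (baseline)
    # Each rule: (precondition, condition, value_if_true, value_if_false)
    rules = [
        (lambda c: True, lambda c: c["tumor_size_mm"] > 20, 0.8, -0.4),
        (lambda c: True, lambda c: c["lymphovascular_invasion"], 0.6, -0.3),
        (lambda c: c["tumor_size_mm"] > 20, lambda c: c["grade"] >= 3, 0.5, -0.1),
    ]
    for precond, cond, v_true, v_false in rules:
        if precond(case):
            score += v_true if cond(case) else v_false
    return score

case = {"tumor_size_mm": 25, "lymphovascular_invasion": False, "grade": 3}
print(adtree_score(case))  # 1.5 -> positive, so metastasis would be predicted
```

The additive structure is what distinguishes the ADTree: evidence from several weak rules accumulates, and the score's magnitude doubles as a confidence measure.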
While associations between specific risk factors and subsequent suicidal thoughts or behaviours have been widely examined, there is limited understanding of the interplay between risk factors in the development of suicide risk. This study used a decision tree approach to develop individual models of suicide risk and identify the risk factors for suicidality that are important for different subpopulations.
Epilepsy is a global disease with considerable incidence, characterized by recurrent unprovoked seizures. These seizures can be noninvasively diagnosed using the electroencephalogram (EEG), a measure of neuronal electrical activity in the brain recorded along the scalp. The EEG is highly nonlinear, nonstationary, and non-Gaussian in nature. Nonlinear adaptive models such as empirical mode decomposition (EMD) provide an intuitive understanding of the information present in these signals. In this study, a novel methodology is proposed to automatically classify the EEG of normal, inter-ictal, and ictal subjects using EMD. EMD decomposes the EEG into a few intrinsic mode functions (IMFs), which are amplitude- and frequency-modulated (AM and FM) waves. The Hilbert transform of these IMFs provides the AM and FM frequencies. Features such as spectral peaks, spectral entropy, and spectral energy are extracted from each IMF and fed to a decision tree classifier for automated diagnosis. In this work, we compared the classification performance of two types of decision trees: (i) the classification and regression tree (CART) and (ii) C4.5. We obtained the highest average accuracy of 95.33%, average sensitivity of 98%, and average specificity of 97% using the C4.5 decision tree classifier. The developed methodology is ready for clinical validation on large databases and can be deployed for mass screening.
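The two tree families compared in this abstract differ chiefly in their split criterion: CART grows splits by Gini impurity decrease, while C4.5 uses entropy-based information gain. A small self-contained sketch of both criteria, on toy labels standing in for the normal/inter-ictal/ictal classes:

```python
import math

# Illustrative contrast between the split criteria behind the two tree types
# compared in the study: CART uses Gini impurity, C4.5 uses entropy-based
# information gain. The labels are toy stand-ins, not real EEG features.

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def split_quality(parent, left, right, impurity):
    """Impurity decrease achieved by splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

parent = ["ictal"] * 4 + ["normal"] * 4
left, right = ["ictal"] * 4, ["normal"] * 4  # a perfect split on some feature

print(split_quality(parent, left, right, gini))     # 0.5  (CART criterion)
print(split_quality(parent, left, right, entropy))  # 1.0  (C4.5 criterion)
```

Both criteria rank a perfect split highest; they differ mainly in how they weigh partially impure children, which is one reason the two trees can select different features from the same IMF spectral features.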
Demonstrate the application of decision trees, namely classification and regression trees (CARTs) and their cousins, boosted regression trees (BRTs), to understanding structure in missing data.
Machine learning is becoming a popular and important approach in medical research. In this study, we investigate the relative performance of several machine learning methods, such as the Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree, and Random Forest, for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data from 32,555 patients free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. By the end of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensemble-based predictive model using 13 attributes that were selected based on their clinical importance, multiple linear regression, and information gain ranking. The negative effect of class imbalance on the constructed model was handled by the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive classifier was improved by an ensemble machine learning approach using the Vote method with three decision trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree), achieving high prediction accuracy (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
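The core idea of the SMOTE step mentioned above is simple: each synthetic minority-class sample is interpolated between a real minority sample and one of its nearest minority-class neighbours. A stdlib-only sketch, with toy two-dimensional points rather than the FIT-project attributes:

```python
import random

# Minimal sketch of the SMOTE idea: synthesize a new minority-class point by
# interpolating between a real minority sample and one of its k nearest
# minority-class neighbours. The feature vectors are invented toy data.

def nearest_neighbors(x, samples, k):
    """k minority-class neighbours of x by Euclidean distance (excluding x)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return sorted((s for s in samples if s != x), key=lambda s: dist(x, s))[:k]

def smote(minority, n_synthetic, k=3, rng=random.Random(0)):
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(x, minority, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)))
    return synthetic

minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_samples = smote(minority, n_synthetic=4)
print(new_samples)  # four points between existing minority samples
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE densifies the minority region instead of merely duplicating records, which is what lets the downstream tree ensemble learn a less biased decision boundary.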
- Injury Prevention: Journal of the International Society for Child and Adolescent Injury Prevention
The profession of a horse-racing jockey is a dangerous one. We developed a decision tree model quantifying the effects of implementing different safety strategies on jockey fall and injury rates and their associated costs.
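A decision tree of the kind described here is a decision-analytic one: each safety strategy is a branch whose expected cost combines the strategy's own cost with the probability-weighted cost of falls and injuries. A hedged sketch, in which every probability and cost figure is invented purely for illustration:

```python
# Toy decision-analytic tree comparing safety strategies by expected cost per
# ride: strategy cost + P(fall) * P(injury | fall) * injury cost.
# All strategies, probabilities, and costs are invented, not the study's data.

strategies = {
    "no change":     {"cost": 0.0, "p_fall": 0.020, "p_injury": 0.40},
    "safety vest":   {"cost": 1.5, "p_fall": 0.020, "p_injury": 0.25},
    "track upgrade": {"cost": 4.0, "p_fall": 0.012, "p_injury": 0.40},
}
INJURY_COST = 5000.0  # hypothetical cost per injury

def expected_cost(s):
    """Expected cost per ride for one strategy branch of the tree."""
    return s["cost"] + s["p_fall"] * s["p_injury"] * INJURY_COST

for name, s in sorted(strategies.items(), key=lambda kv: expected_cost(kv[1])):
    print(f"{name}: expected cost per ride = {expected_cost(s):.2f}")
```

Rolling the tree back in this way makes the trade-off explicit: a strategy with a nonzero upfront cost can still dominate "no change" once averted injury costs are folded in.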
BACKGROUND: Internet-based interventions are seen as attractive for harmful users of alcohol and lead to desirable clinical outcomes; however, some participants will not achieve the desired results. In this study, harmful users of alcohol were partitioned into subgroups with a low, intermediate, or high probability of positive treatment outcome, using recursive partitioning classification tree analysis. METHODS: Data were obtained from a randomized controlled trial assessing the effectiveness of two Internet-based alcohol interventions. The main outcome variable was treatment response, a dichotomous measure of treatment success. Candidate predictors for the classification analysis were first selected using univariate regression. Next, a decision tree model to classify participants into categories with a low, intermediate, or high probability of treatment response was constructed using recursive partitioning software. RESULTS: Based on a literature review, 46 potentially relevant baseline predictors were identified. Five variables were selected by univariate regression as candidate predictors for the classification analysis. Two variables were found most relevant for classification and selected for the decision tree model: ‘living alone’ and ‘interpersonal sensitivity’. A sensitivity analysis supported the robustness of the decision tree model. CONCLUSIONS: Harmful alcohol users in a shared living situation, with high interpersonal sensitivity, have a significantly higher probability of treatment response. The resulting decision tree model may be used as part of a decision support system but is on its own insufficient as a screening algorithm with satisfactory clinical utility. Trial registration: Netherlands Trial Register (Cochrane Collaboration): NTR-TC1155.
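Recursive partitioning of the kind described works by repeatedly splitting the cohort on the predictor that best separates responders from non-responders, yielding leaves with different response probabilities. A stdlib-only sketch using the two predictor names from the abstract; the records and response rates are invented, not the trial data:

```python
# Toy sketch of recursive partitioning: split on the binary predictor that
# maximises the response-rate gap, recurse, and report leaf subgroups with
# their response probabilities. Predictor names follow the abstract; the
# records themselves are invented for illustration.

def response_rate(records):
    return sum(r["response"] for r in records) / len(records)

def best_split(records, predictors):
    """Predictor whose split maximises the response-rate gap between branches."""
    def gap(p):
        yes = [r for r in records if r[p]]
        no = [r for r in records if not r[p]]
        if not yes or not no:
            return -1.0
        return abs(response_rate(yes) - response_rate(no))
    return max(predictors, key=gap)

def partition(records, predictors, depth=2, path=()):
    if depth == 0 or not predictors or len(set(r["response"] for r in records)) == 1:
        return [(path, round(response_rate(records), 2), len(records))]
    p = best_split(records, predictors)
    rest = [q for q in predictors if q != p]
    leaves = []
    for val in (True, False):
        branch = [r for r in records if r[p] == val]
        if branch:
            leaves += partition(branch, rest, depth - 1, path + ((p, val),))
    return leaves

data = (
    [{"living_alone": False, "interpersonal_sensitivity": True, "response": 1}] * 6
    + [{"living_alone": False, "interpersonal_sensitivity": True, "response": 0}] * 2
    + [{"living_alone": False, "interpersonal_sensitivity": False, "response": 1}] * 3
    + [{"living_alone": False, "interpersonal_sensitivity": False, "response": 0}] * 3
    + [{"living_alone": True, "interpersonal_sensitivity": True, "response": 1}] * 1
    + [{"living_alone": True, "interpersonal_sensitivity": False, "response": 0}] * 5
)
leaves = partition(data, ["living_alone", "interpersonal_sensitivity"])
for path, rate, n in leaves:
    print(path, rate, n)
```

In this toy data the shared-living, high-sensitivity leaf has the highest well-populated response rate, mirroring the direction of the study's conclusion.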
OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. 
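The validation scheme above (a random 60/40 derivation/validation split, with the train/test accuracy gap inspected against a 5% criterion) can be sketched in a few lines. Everything below is simulated: the one-split "model" thresholding an IL-6-like value is a toy stand-in for the fitted CART model, not the authors' tree or their patient data.

```python
import random

# Sketch of a derivation/validation split as described above: 548 simulated
# patients are split 60/40, a toy one-split model is evaluated on both sets,
# and the train/test accuracy gap is reported. All values are simulated.

rng = random.Random(42)

# Simulated cohort of (IL-6 on day 1, survived-day-1 flag) pairs; survivors
# tend to have lower IL-6, mirroring the differentiating factor in the study.
cohort = [(rng.gauss(100, 30), 1) for _ in range(400)] + \
         [(rng.gauss(180, 30), 0) for _ in range(148)]
rng.shuffle(cohort)

train, test = cohort[:329], cohort[329:]  # 60/40 split as in the study

def predict(il6, threshold=140.0):
    """Toy one-split tree: predict day-1 survival if IL-6 is below threshold."""
    return 1 if il6 < threshold else 0

def accuracy(data):
    return sum(predict(il6) == y for il6, y in data) / len(data)

gap = abs(accuracy(train) - accuracy(test))
print(f"train={accuracy(train):.3f} test={accuracy(test):.3f} gap={gap:.3f}")
```

Keeping the validation cohort untouched during model building is what makes the reported train/test gap a meaningful check against overfitting.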
The decision tree generated here raises awareness of the early systemic stress response, which appears pertinent for prognostication.
To keep pace with the rapid development of nanomaterials, an efficient approach to their risk assessment is needed. Grouping concepts developed for chemicals are now being explored for their applicability to nanomaterials. One recently proposed grouping system is the DF4nanoGrouping scheme. In this study, we developed three structure-activity relationship classification tree models to support this system by identifying the structural features of nanomaterials mainly responsible for surface activity. We used data from 19 nanomaterials that were synthesized and characterized extensively in previous studies. Subsets of these materials have been used in other studies (short-term inhalation, protein carbonylation, and intrinsic oxidative potential), resulting in a unique data set for modeling. Out of a large set of 285 possible descriptors, we demonstrated that only three (size, specific surface area, and the quantum-mechanically calculated property ‘lowest unoccupied molecular orbital’) are needed to predict the endpoints investigated. The maximum number of descriptors finally selected by the classification trees (CTs) was very low: one for intrinsic oxidative potential, two for protein carbonylation, and three for NOAEC. This suggests that the models were well constructed and not over-fitted. Various statistical measures and the applicability domains of our models further indicate their robustness. We therefore conclude that CTs can be a useful tool within the previously proposed DF4nanoGrouping scheme.
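A classification tree limited to three descriptors compiles down to a very small set of nested rules, which is what makes such models easy to audit within a grouping scheme. The sketch below shows the general shape of such a rule set; the three descriptor names come from the abstract, but every threshold and class label is invented for illustration and is not the fitted model.

```python
# Hypothetical illustration of the compact rule set a three-descriptor
# classification tree can encode. Descriptors (size, specific surface area,
# LUMO energy) follow the abstract; thresholds and labels are invented.

def classify_noaec(size_nm, ssa_m2_per_g, lumo_ev):
    """Toy three-descriptor tree assigning a high/low NOAEC class."""
    if size_nm <= 25.0:
        if lumo_ev <= -3.0:
            return "low NOAEC (higher hazard)"
        return "high NOAEC (lower hazard)"
    if ssa_m2_per_g > 150.0:
        return "low NOAEC (higher hazard)"
    return "high NOAEC (lower hazard)"

print(classify_noaec(size_nm=20.0, ssa_m2_per_g=180.0, lumo_ev=-3.5))
```

The entire model fits in a handful of human-readable branches, which supports the abstract's point that small, non-over-fitted trees are well suited to transparent grouping decisions.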
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this paper, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is used to periodically revise the split variables and split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, updating the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct predictive scores for the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared with existing methods. We also demonstrate an application of our method to a real clinical trial.
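The model decomposition described above separates a prognostic effect (present for everyone) from a biomarker-by-treatment interaction (present only under treatment), and the benefiting subgroup is where the interaction score is positive. A toy sketch of that decomposition, with step functions standing in for the fitted additive trees and interaction tree; all names and values are invented:

```python
# Sketch of the decomposition: outcome = prognostic effect of biomarkers
# + treatment indicator * biomarker-by-treatment effect. Both components are
# toy step functions standing in for the fitted trees; biomarker names and
# cutoffs are invented for illustration.

def prognostic(x):
    """Stand-in for the sum of the additive regression trees."""
    return 1.0 if x["biomarker_a"] > 0.5 else 0.2

def interaction(x):
    """Stand-in for the single biomarker-by-treatment interaction tree."""
    return 0.8 if x["biomarker_b"] > 0.3 else -0.1

def expected_outcome(x, treated):
    return prognostic(x) + (1 if treated else 0) * interaction(x)

patients = [{"biomarker_a": a, "biomarker_b": b}
            for a in (0.2, 0.9) for b in (0.1, 0.6)]
subgroup = [p for p in patients if interaction(p) > 0]  # treatment superior here
print(len(subgroup))  # 2 of the 4 toy patients benefit from treatment
```

Separating the two components is the key design choice: prognostic structure, however strong, cannot masquerade as a treatment benefit, because only the interaction term is switched on by the treatment indicator.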