Concept: Negative predictive value
Background Bronchoscopy is frequently nondiagnostic in patients with pulmonary lesions suspected to be lung cancer. This often results in additional invasive testing, although many lesions are benign. We sought to validate a bronchial-airway gene-expression classifier that could improve the diagnostic performance of bronchoscopy. Methods Current or former smokers undergoing bronchoscopy for suspected lung cancer were enrolled at 28 centers in two multicenter prospective studies (AEGIS-1 and AEGIS-2). A gene-expression classifier was measured in epithelial cells collected from the normal-appearing mainstem bronchus to assess the probability of lung cancer. Results A total of 639 patients in AEGIS-1 (298 patients) and AEGIS-2 (341 patients) met the criteria for inclusion. A total of 43% of bronchoscopic examinations were nondiagnostic for lung cancer, and invasive procedures were performed after bronchoscopy in 35% of patients with benign lesions. In AEGIS-1, the classifier had an area under the receiver-operating-characteristic curve (AUC) of 0.78 (95% confidence interval [CI], 0.73 to 0.83), a sensitivity of 88% (95% CI, 83 to 92), and a specificity of 47% (95% CI, 37 to 58). In AEGIS-2, the classifier had an AUC of 0.74 (95% CI, 0.68 to 0.80), a sensitivity of 89% (95% CI, 84 to 92), and a specificity of 47% (95% CI, 36 to 59). The combination of the classifier plus bronchoscopy had a sensitivity of 96% (95% CI, 93 to 98) in AEGIS-1 and 98% (95% CI, 96 to 99) in AEGIS-2, independent of lesion size and location. In 101 patients with an intermediate pretest probability of cancer, the negative predictive value of the classifier was 91% (95% CI, 75 to 98) among patients with a nondiagnostic bronchoscopic examination. Conclusions The gene-expression classifier improved the diagnostic performance of bronchoscopy for the detection of lung cancer. 
In intermediate-risk patients with a nondiagnostic bronchoscopic examination, a negative classifier score provides support for a more conservative diagnostic approach. (Funded by Allegro Diagnostics and others; AEGIS-1 and AEGIS-2 ClinicalTrials.gov numbers, NCT01309087 and NCT00746759.)
BACKGROUND: The objective of this study was to compare celiac disease (CD)-specific antibody tests to determine if they could replace jejunal biopsy in patients with a high pretest probability of CD. METHODS: This retrospective study included sera from 149 CD patients and 119 controls, all with intestinal biopsy. All samples were analyzed for IgA and IgG antibodies against native gliadin (ngli) and deamidated gliadin peptides (dpgli), as well as for IgA antibodies against tissue transglutaminase and endomysium. RESULTS: dpgli were superior to ngli for IgG antibody determination: 68% vs. 92% specificity and 79% vs. 85% sensitivity for ngli and dpgli, respectively. Positive (76% vs. 93%) and negative (72% vs. 83%) predictive values were also higher for dpgli than for ngli. Regarding IgA gliadin antibody determination, sensitivity improved from 61% to 78% with dpgli, while specificity and positive predictive value remained at 97% (P < 0.00001). A combination of four tests (IgA anti-dpgli, IgG anti-dpgli, IgA anti-tissue transglutaminase, and IgA anti-endomysium) yielded positive and negative predictive values of 99% and 100%, respectively, and a likelihood ratio positive of 86 with a likelihood ratio negative of 0.00. Omitting the endomysium antibody determination still yielded positive and negative predictive values of 99% and 98%, respectively, and a likelihood ratio positive of 87 with a likelihood ratio negative of 0.01. CONCLUSION: dpgli yielded superior results compared with ngli. A combination of three or four antibody tests including IgA anti-tissue transglutaminase and/or IgA anti-endomysium permitted diagnosis or exclusion of CD without intestinal biopsy in a high proportion of patients (78%). Jejunal biopsy would be necessary in patients with discordant antibody results (22%). With this two-step procedure, only patients with no CD-specific antibodies would be missed.
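The likelihood ratios and predictive values quoted in this abstract are linked by simple arithmetic. The sketch below assumes the standard definitions (LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity) and applies them to the IgG anti-dpgli figures. The function names are illustrative, and using the study's case/control mix as the pretest probability is an assumption made here, not a statement of the authors' method.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# IgG anti-dpgli figures from the abstract: 85% sensitivity, 92% specificity.
lr_pos, lr_neg = likelihood_ratios(0.85, 0.92)

# Assumption: the study mix (149 CD patients, 119 controls) serves as the
# pretest probability of disease.
pre_test = 149 / (149 + 119)

# PPV is the probability of disease after a positive result;
# NPV is the probability of no disease after a negative result.
ppv = post_test_probability(pre_test, lr_pos)
npv = 1.0 - post_test_probability(pre_test, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Under these assumptions the computed predictive values round to the 93% and 83% reported for IgG anti-dpgli, which suggests the reported figures reflect the study's own case/control mix rather than a population prevalence.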
BACKGROUND: Tuberculosis (TB) in children is rarely confirmed due to the lack of effective diagnostic tools; only 10 to 15% of pediatric TB is smear positive due to paucibacillary samples and the difficulty of obtaining high-quality specimens from children. We evaluate here the accuracy of Xpert MTB/RIF in comparison with the microscopic observation drug susceptibility (MODS) assay for diagnosis of TB in children using samples stored during a previously reported evaluation of the MODS assay. METHODS: Ninety-six eligible children presenting with suspected TB were recruited consecutively at Pham Ngoc Thach Hospital in Ho Chi Minh City, Viet Nam, between May and December 2008 and tested by Ziehl-Neelsen smear, MODS and Mycobacteria Growth Indicator Tube (MGIT, Becton Dickinson) culture. All samples sent by the treating clinician for testing were included in the analysis. An aliquot of processed sample deposit was stored at -20 °C and tested in the present study by Xpert MTB/RIF test. 183 samples from 73 children were available for analysis by Xpert. Accuracy measures of MODS and Xpert were summarized. RESULTS: The sensitivity (%) in detecting children with a clinical diagnosis of TB for smear, MODS and Xpert were 37.9 [95% CI 25.5; 51.6], 51.7 [38.2; 65.0] and 50.0 [36.6; 63.4], respectively (per patient analysis). Xpert was significantly more sensitive than smear (P=0.046). Testing of additional samples did not increase case detection for MODS while testing of a second sputum sample by Xpert detected only two additional cases. The positive and negative predictive values (%) of Xpert were 100.0 [88.0; 100.0] and 34.1 [20.5; 49.9], respectively, while those of MODS were 96.8 [83.3; 99.9] and 33.3 [19.6; 49.5]. CONCLUSION: MODS culture and Xpert MTB/RIF test have similar sensitivities for the detection of pediatric TB. Xpert MTB/RIF is able to detect tuberculosis and rifampicin resistance within two hours.
MODS allows isolation of cultures for further drug susceptibility testing but requires approximately one week to become positive. Testing of multiple samples by Xpert detected only two additional cases and the benefits must be considered against costs in each setting. Further research is required to evaluate the optimal integration of Xpert into pediatric testing algorithms.
BACKGROUND: Healthcare claims databases have been used in several studies to characterize the risk and burden of chemotherapy-induced febrile neutropenia (FN) and effectiveness of colony-stimulating factors against FN. The accuracy of methods previously used to identify FN in such databases has not been formally evaluated. METHODS: Data comprised linked electronic medical records from Geisinger Health System and healthcare claims data from Geisinger Health Plan. Subjects were classified into subgroups based on whether or not they were hospitalized for FN per the presumptive “gold standard” (ANC <1.0 × 10^9/L, and body temperature ≥38.3 °C or receipt of antibiotics) and claims-based definition (diagnosis codes for neutropenia, fever, and/or infection). Accuracy was evaluated principally based on positive predictive value (PPV) and sensitivity. RESULTS: Among 357 study subjects, 82 (23%) met the gold standard for hospitalized FN. For the claims-based definition including diagnosis codes for neutropenia plus fever in any position (n=28), PPV was 100% and sensitivity was 34% (95% CI: 24–45). For the definition including neutropenia in the primary position (n=54), PPV was 87% (78–95) and sensitivity was 57% (46–68). For the definition including neutropenia in any position (n=71), PPV was 77% (68–87) and sensitivity was 67% (56–77). CONCLUSIONS: Patients hospitalized for chemotherapy-induced FN can be identified in healthcare claims databases, with an acceptable level of misclassification, using diagnosis codes for neutropenia, or neutropenia plus fever.
Bronchoscopy is frequently used for the evaluation of suspicious pulmonary lesions found on computed tomography, but its sensitivity for detecting lung cancer is limited. Recently, a bronchial genomic classifier was validated to improve the sensitivity of bronchoscopy for lung cancer detection, demonstrating a high sensitivity and negative predictive value among patients at intermediate risk (10-60%) for lung cancer with an inconclusive bronchoscopy. Our objective for this study was to determine if a negative genomic classifier result that down-classifies a patient from intermediate risk to low risk (<10%) for lung cancer would reduce the rate at which physicians recommend more invasive testing among patients with an inconclusive bronchoscopy.
The accurate diagnosis of asbestos-related diseases is important because of past and current asbestos exposures. This study evaluated the reliability of clinical diagnoses of asbestos-related diseases in former mineworkers using autopsies as the reference standard. Sensitivity, specificity, positive predictive value and negative predictive value were calculated. The 149 cases identified had clinical examinations 0.3-7.4 years before death. More asbestos-related diseases were diagnosed at autopsy rather than clinically: 77 versus 52 for asbestosis, 27 versus 14 for mesothelioma and 22 versus 3 for lung cancer. Sensitivity and specificity values for clinical diagnoses were 50.6% and 81.9% for asbestosis, 40.7% and 97.5% for mesothelioma, and 13.6% and 100.0% for lung cancer. False-negative diagnoses of asbestosis were more likely using radiographs of acceptable (versus good) quality and in cases with pulmonary tuberculosis at autopsy. The low sensitivity values are indicative of the high proportion of false-negative diagnoses. It is unlikely that these were the result of disease manifestation between the last clinical assessment and autopsy. Where clinical features suggest asbestos-related diseases but the chest radiograph is negative, more sophisticated imaging techniques or immunohistochemistry for asbestos-related cancers should be used. Autopsies are useful for the detection of previously undiagnosed and misdiagnosed asbestos-related diseases, and for monitoring clinical practice and delivery of compensation.
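The abstract reports sensitivity and specificity but not the predictive values it says were calculated. As a rough illustration, the 2x2 table for asbestosis can be reconstructed from the published counts. The sketch assumes that rounding 50.6% of 77 gives the true-positive count and that all 149 cases were assessed both ways; the resulting PPV and NPV are derived here and are not figures reported by the authors.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (sensitivity, specificity, PPV, NPV) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return sensitivity, specificity, ppv, npv


# Counts from the abstract: 149 cases, 77 with asbestosis at autopsy,
# 52 with a clinical diagnosis of asbestosis, sensitivity 50.6%.
total, autopsy_positive, clinical_positive = 149, 77, 52

# Reconstructed cells (an assumption, not reported data).
tp = round(0.506 * autopsy_positive)   # clinical and autopsy both positive
fn = autopsy_positive - tp             # missed clinically, found at autopsy
fp = clinical_positive - tp            # diagnosed clinically, absent at autopsy
tn = total - autopsy_positive - fp     # negative on both

sens, spec, ppv, npv = predictive_values(tp, fp, fn, tn)
print(f"sens={sens:.1%} spec={spec:.1%} PPV={ppv:.1%} NPV={npv:.1%}")
```

The reconstructed specificity agrees with the reported 81.9%, which supports the cell counts. On that basis a negative clinical diagnosis would have been correct in roughly six of ten cases.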
Earlier detection of colorectal cancer greatly improves prognosis, largely through surgical excision of neoplastic polyps. These include benign adenomas which can transform over time to malignant adenocarcinomas. This progression may be associated with changes in full blood count indices. An existing risk algorithm derived in Israel stratifies individuals according to colorectal cancer risk using full blood count data, but has not been validated in the UK. We undertook a retrospective analysis using the Clinical Practice Research Datalink. Patients aged over 40 with full blood count data were risk-stratified and followed up for a diagnosis of colorectal cancer over a range of time intervals. The primary outcome was the area under the receiver operating characteristic curve for the 18-24-month interval. We also undertook a case-control analysis (matching for age, sex, and year of risk score), and a cohort study of patients undergoing full blood count testing during 2012, to estimate predictive values. We included 2,550,119 patients. The area under the curve for the 18-24-month interval was 0.776 [95% confidence interval (CI): 0.771, 0.781]. Performance improves as the time interval reduces. The area under the curve for the age-matched case-control analysis was 0.583 [0.574, 0.591]. For the population risk-scored in 2012, the positive predictive value at 99.5% specificity was 8.8% with negative predictive value 99.6%. The algorithm offers an additional means of identifying risk of colorectal cancer, and could support other approaches to early detection, including screening and active case finding.
Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier’s performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
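The point about imbalance can be made concrete with a small calculation. In the sketch below the operating point (80% true-positive rate, 10% false-positive rate) and the class sizes are hypothetical values chosen for illustration, not figures from the study. The ROC point is identical in both scenarios, yet the fraction of true positives among positive predictions differs sharply.

```python
def precision_at(tpr, fpr, n_pos, n_neg):
    """Expected precision (PPV) at a given TPR/FPR for the given class sizes."""
    true_pos = tpr * n_pos
    false_pos = fpr * n_neg
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point: the same ROC coordinates in both scenarios.
tpr, fpr = 0.80, 0.10

# Balanced data: 1,000 positives and 1,000 negatives.
balanced = precision_at(tpr, fpr, n_pos=1_000, n_neg=1_000)

# Imbalanced data: 1,000 positives and 100,000 negatives.
imbalanced = precision_at(tpr, fpr, n_pos=1_000, n_neg=100_000)

print(f"precision, balanced:   {balanced:.3f}")
print(f"precision, imbalanced: {imbalanced:.3f}")
```

Because the false-positive rate is a fraction of the negatives, a modest rate still produces many false positives when negatives dominate. This is the effect a precision/recall plot exposes and a ROC plot hides.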
Background The ratio of soluble fms-like tyrosine kinase 1 (sFlt-1) to placental growth factor (PlGF) is elevated in pregnant women before the clinical onset of preeclampsia, but its predictive value in women with suspected preeclampsia is unclear. Methods We performed a prospective, multicenter, observational study to derive and validate a ratio of serum sFlt-1 to PlGF that would be predictive of the absence or presence of preeclampsia in the short term in women with singleton pregnancies in whom preeclampsia was suspected (24 weeks 0 days to 36 weeks 6 days of gestation). Primary objectives were to assess whether low sFlt-1:PlGF ratios (at or below a derived cutoff) predict the absence of preeclampsia within 1 week after the first visit and whether high ratios (above the cutoff) predict the presence of preeclampsia within 4 weeks. Results In the development cohort (500 women), we identified an sFlt-1:PlGF ratio cutoff of 38 as having important predictive value. In a subsequent validation study among an additional 550 women, an sFlt-1:PlGF ratio of 38 or lower had a negative predictive value (i.e., no preeclampsia in the subsequent week) of 99.3% (95% confidence interval [CI], 97.9 to 99.9), with 80.0% sensitivity (95% CI, 51.9 to 95.7) and 78.3% specificity (95% CI, 74.6 to 81.7). The positive predictive value of an sFlt-1:PlGF ratio above 38 for a diagnosis of preeclampsia within 4 weeks was 36.7% (95% CI, 28.4 to 45.7), with 66.2% sensitivity (95% CI, 54.0 to 77.0) and 83.1% specificity (95% CI, 79.4 to 86.3). Conclusions An sFlt-1:PlGF ratio of 38 or lower can be used to predict the short-term absence of preeclampsia in women in whom the syndrome is suspected clinically. (Funded by Roche Diagnostics.).
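A very high negative predictive value alongside moderate sensitivity and specificity usually reflects a low event rate. The sketch below applies Bayes' rule to the reported 80.0% sensitivity and 78.3% specificity across a range of prevalences. The prevalence values are hypothetical, since the abstract does not state the one-week event rate, and the function name is illustrative.

```python
def npv_from_prevalence(sensitivity, specificity, prevalence):
    """Negative predictive value via Bayes' rule; all inputs are fractions."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Sensitivity and specificity reported for a ratio of 38 or lower
# (no preeclampsia within 1 week).
sens, spec = 0.800, 0.783

# Hypothetical prevalences, chosen only to show the trend.
prevalences = (0.01, 0.03, 0.10, 0.30)
npv_by_prevalence = {p: npv_from_prevalence(sens, spec, p) for p in prevalences}

for p, value in npv_by_prevalence.items():
    print(f"prevalence {p:.0%}: NPV {value:.1%}")
```

NPV falls as prevalence rises, so a value above 99% should be expected to hold only in populations where the short-term event rate is similarly low.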
Assessing the value of the zebrafish conditioned place preference model for predicting human abuse potential
- The Journal of pharmacology and experimental therapeutics
- Published almost 4 years ago
Regulatory agencies recommend that centrally-active drugs are tested for abuse potential prior to approval. Standard preclinical assessments are conducted in rats or non-human primates (NHPs). This study evaluated the ability of the zebrafish conditioned place preference (CPP) model to predict human abuse outcomes. Twenty-seven compounds from a variety of pharmacological classes were tested in zebrafish CPP, categorized as positive or negative, and analysed using standard diagnostic tests of binary classification to determine the likelihood that zebrafish correctly predict robust positive signals in human subjective effects studies (+HSE) and/or DEA drug scheduling. Results were then compared with those generated for rat self-administration and CPP using this same set of compounds. The findings reveal that zebrafish concordance and sensitivity values were not significantly different from chance for both +HSE and scheduling. While significant improvements in specificity and negative predictive values were observed for zebrafish relative to +HSE, specificity without sensitivity provides limited value. Moreover, assessments in zebrafish provided no added value for predicting scheduling. By contrast, rat models generally possessed significantly improved concordance, sensitivity, and positive predictive values for both clinical measures. While there may be predictive value with compounds from specific pharmacological classes (e.g. µ-opioid receptor agonists, CNS stimulants) for zebrafish CPP, altogether these data highlight that using the current methodology, the zebrafish CPP model does not add value to the preclinical assessment of abuse potential.