SciCombinator

Concept: Biostatistics

425

Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.

Concepts: Scientific method, Statistics, Mathematics, Machine learning, Time series, Biostatistics, Applied mathematics, Formal sciences
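
The abstract above does not name its two accuracy measures. As an illustration only, the sketch below computes two measures often used in forecasting comparisons, sMAPE and MASE. The choice of measures, the function names, and the toy numbers are assumptions, not details taken from the paper.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Undefined when an actual value and its forecast are both zero.
    """
    terms = [200.0 * abs(a - f) / (abs(a) + abs(f))
             for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)


def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error.

    The out-of-sample MAE is divided by the in-sample MAE of a naive
    forecast that repeats the value observed m periods earlier
    (m=1: plain naive; m=12: seasonal naive for monthly data).
    """
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Toy data (hypothetical): a short history and a two-step holdout.
history = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
holdout = [13.0, 15.0]
naive_forecast = [history[-1]] * len(holdout)  # repeat the last observation

print("sMAPE:", smape(holdout, naive_forecast))
print("MASE:", mase(history, holdout, naive_forecast))
```

A MASE below 1 means the forecast beats the in-sample naive benchmark on average; sMAPE is scale-free, which makes it easier to average over many series.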

298

Several studies in the new field of cognitive epidemiology have shown that higher intelligence predicts longer lifespan. This positive correlation might arise from socioeconomic status influencing both intelligence and health; intelligence leading to better health behaviours; and/or some shared genetic factors influencing both intelligence and health. Distinguishing among these hypotheses is crucial for medicine and public health, but can only be accomplished by studying a genetically informative sample.

Concepts: Health care, Public health, Genetics, Health, Epidemiology, Biology, Population health, Biostatistics

171

We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a standalone application.

Concepts: DNA, Algorithm, Bioinformatics, Computer, Computer program, Computer science, Biostatistics

169

BACKGROUND: There is an increasing need for processing and understanding relevant information generated by the systematic collection of public health data over time. However, the analysis of those time series usually requires advanced modeling techniques, which are not necessarily mastered by staff, technicians and researchers working on public health and epidemiology. Here a user-friendly tool, EPIPOI, is presented that facilitates the exploration and extraction of parameters describing trends, seasonality and anomalies that characterize epidemiological processes. It also enables the inspection of those parameters across geographic regions. Although the visual exploration and extraction of relevant parameters from time series data is crucial in epidemiological research, until now it had been largely restricted to specialists. METHODS: EPIPOI is freely available software developed in Matlab (The Mathworks Inc) that runs both on PC and Mac computers. Its friendly interface guides users intuitively through useful comparative analyses including the comparison of spatial patterns in temporal parameters. RESULTS: EPIPOI is able to handle complex analyses in an accessible way. A prototype has already been used to assist researchers in a variety of contexts from didactic use in public health workshops to the main analytical tool in published research. CONCLUSIONS: EPIPOI can assist public health officials and students to explore time series data using a broad range of sophisticated analytical and visualization tools. It also provides an analytical environment where even advanced users can benefit by enabling a higher degree of control over model assumptions, such as those associated with detecting disease outbreaks and pandemics.

Concepts: Public health, Health, Epidemiology, Time series, Biostatistics, Outbreak, The MathWorks, MATLAB

167

Background: Tibiotalocalcaneal (TTC) arthrodesis using a nail has been shown to be an effective salvage technique; however, there is a risk of major amputation. A better understanding of the relative risk of amputation after TTC fusion and the factors that influence this could help the preoperative consultation and guide discussion on the economics of limb salvage. Methods: One hundred seventy-nine limbs were treated with TTC fusion with an intramedullary nail. A comprehensive chart and radiographic review was pulled from our intramedullary nail database. Patients were divided into those who went on to eventual amputation and those with successful salvage of their limb. Variables from the database were used to build a statistical model with a biostatistician. Final results were presented, and a formula to determine probability of amputation was created. Results: There were 21 limbs that were eventually treated with major amputation. This represents an overall salvage rate of 88.2% (158/179 patients). Age was a factor in amputation risk, and the highest risk factor for amputation was diabetes with an odds ratio of 7.01 and 95% confidence, P = .0019. The odds of amputation were 6.2 times and 3 times greater for patients undergoing revisions and those with preoperative ulcers, respectively. The probability of amputation could be found preoperatively by using the derived equation e^x/(1 + e^x), where x is a factor of age, diabetes, revision, and ulceration. Conclusion: TTC arthrodesis with a retrograde intramedullary nail has a high rate of limb salvage across a wide range of indications and medical comorbidities. In this patient cohort, diabetes was the most notable risk for amputation, followed by revision surgery, preoperative ulceration, and age. A model has been built to help predict the risk of amputation. Level of Evidence: Level II, prognostic.

Concepts: Epidemiology, Statistics, Medical statistics, Risk, Surgery, Relative risk, Odds ratio, Biostatistics
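
The expression e^x/(1 + e^x) in the abstract above is the logistic function, which turns a linear predictor x (the log-odds) into a probability. The sketch below shows the mechanics only. It assumes the reported odds ratios can be used as exp(coefficient) in a single model, uses a made-up intercept, and leaves out age because the abstract gives no coefficient for it, so the printed probabilities are not the authors' predictions.

```python
import math


def logistic(x: float) -> float:
    """Map log-odds x to a probability via e^x / (1 + e^x)."""
    return math.exp(x) / (1.0 + math.exp(x))


# Odds ratios quoted in the abstract. Treating them as coefficients of one
# joint model is an assumption made for illustration.
ODDS_RATIOS = {"diabetes": 7.01, "revision": 6.2, "ulcer": 3.0}

# HYPOTHETICAL intercept: the abstract does not report one.
INTERCEPT = -4.0


def amputation_probability(diabetes: bool, revision: bool, ulcer: bool) -> float:
    """Illustrative probability from the hypothetical intercept and reported ORs."""
    flags = {"diabetes": diabetes, "revision": revision, "ulcer": ulcer}
    x = INTERCEPT
    for name, present in flags.items():
        if present:
            # A binary risk factor adds ln(odds ratio) to the log-odds.
            x += math.log(ODDS_RATIOS[name])
    return logistic(x)


print(amputation_probability(False, False, False))
print(amputation_probability(True, False, False))
print(amputation_probability(True, True, True))
```

Because each factor adds to the log-odds, the odds multiply: a patient with diabetes has 7.01 times the odds of the otherwise identical patient without it, whatever the intercept is.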

142

INTRODUCTION: There are no widely accepted standards of diagnosis of sarcoidosis. OBJECTIVES: The aim of the study was to assess the relative diagnostic yield of endobronchial ultrasound needle aspiration (EBUS-NA) and endoscopic ultrasound needle aspiration (EUS-NA), and to compare them with the standard diagnostic techniques, i.e. endobronchial biopsy (EBB), transbronchial lung biopsy (TBLB), transbronchial needle aspiration (TBNA) and mediastinoscopy. PATIENTS AND METHODS: A prospective randomized study including consecutive patients with a clinical diagnosis of stage I or II sarcoidosis. In all patients EBB, TBLB and TBNA were performed initially. Subsequently, patients were randomized to group A (EBUS-NA) or group B (EUS-NA). Next, a crossover control test was performed: all patients with negative results in group A underwent EUS-NA and all patients with negative results in group B underwent EBUS-NA. If sarcoidosis was not confirmed, mediastinoscopy was performed. RESULTS: There were 106 patients enrolled, and 100 were available for the final analysis. Overall sensitivity and accuracy of standard endoscopic methods were both 64%. When analyzing each of the standard endoscopic methods separately, diagnosis was confirmed with EBB in 12 patients (12%), TBLB in 42 patients (42%) and TBNA in 44 patients (44%). The accuracy and sensitivity of each endosonography technique were statistically significantly higher than those of EBB+TBLB+TBNA (P = 0.0112 and 0.0134). CONCLUSIONS: Sensitivity and accuracy of EBUS-NA and EUS-NA are significantly higher than those of the standard endoscopic methods (P < 0.01). Sensitivity and accuracy of EUS-NA are higher than those of EBUS-NA, but the difference is not statistically significant.

Concepts: Statistics, Biopsy, Pathology, Type I and type II errors, Medical tests, Standardization, Biostatistics, Statistical theory

115

Earlier detection of colorectal cancer greatly improves prognosis, largely through surgical excision of neoplastic polyps. These include benign adenomas which can transform over time to malignant adenocarcinomas. This progression may be associated with changes in full blood count indices. An existing risk algorithm derived in Israel stratifies individuals according to colorectal cancer risk using full blood count data, but has not been validated in the UK. We undertook a retrospective analysis using the Clinical Practice Research Datalink. Patients aged over 40 with full blood count data were risk-stratified and followed up for a diagnosis of colorectal cancer over a range of time intervals. The primary outcome was the area under the receiver operating characteristic curve for the 18-24-month interval. We also undertook a case-control analysis (matching for age, sex, and year of risk score), and a cohort study of patients undergoing full blood count testing during 2012, to estimate predictive values. We included 2,550,119 patients. The area under the curve for the 18-24-month interval was 0.776 [95% confidence interval (CI): 0.771, 0.781]. Performance improves as the time interval reduces. The area under the curve for the age-matched case-control analysis was 0.583 [0.574, 0.591]. For the population risk-scored in 2012, the positive predictive value at 99.5% specificity was 8.8% with negative predictive value 99.6%. The algorithm offers an additional means of identifying risk of colorectal cancer, and could support other approaches to early detection, including screening and active case finding.

Concepts: Cancer, Positive predictive value, Negative predictive value, Type I and type II errors, Colorectal cancer, Benign tumor, Receiver operating characteristic, Biostatistics
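
Positive and negative predictive values, as reported in the abstract above, depend on prevalence as well as on sensitivity and specificity. The sketch below shows that relationship through Bayes' rule. The function name and the input values are hypothetical; only the 99.5% specificity comes from the abstract, and the sensitivity and prevalence used here are placeholders, not study results.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative result)
    return ppv, npv


# Placeholder inputs: 99.5% specificity as in the abstract, with an assumed
# sensitivity and an assumed prevalence chosen only for illustration.
ppv, npv = predictive_values(sensitivity=0.10, specificity=0.995, prevalence=0.005)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With a rare outcome, even a very specific test yields a modest PPV, while the NPV stays high mostly because few people have the disease to begin with.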

86

To investigate the impact of smoking and smoking cessation on cardiovascular mortality, acute coronary events, and stroke events in people aged 60 and older, and to calculate and report risk advancement periods for cardiovascular mortality in addition to traditional epidemiological relative risk measures.

Concepts: Cohort study, Epidemiology, Atherosclerosis, Medical statistics, Actuarial science, Relative risk, Smoking cessation, Biostatistics
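
A risk advancement period (RAP), mentioned in the abstract above, expresses an exposure's effect as the number of years of ageing that would raise risk by the same amount. It is commonly computed as the log relative risk for the exposure divided by the log relative risk per year of age, both taken from the same model. The sketch below assumes that definition; the function name and input values are hypothetical and are not results from the study.

```python
import math


def risk_advancement_period(rr_exposure: float, rr_per_year_of_age: float) -> float:
    """Years of ageing equivalent to the exposure's relative risk.

    Assumes both relative risks come from one multiplicative model in which
    risk rises with age (rr_per_year_of_age > 1).
    """
    return math.log(rr_exposure) / math.log(rr_per_year_of_age)


# Hypothetical inputs: an exposure that doubles risk, and a risk that grows
# by 10% with each additional year of age.
print(risk_advancement_period(rr_exposure=2.0, rr_per_year_of_age=1.10))
```

A RAP of about seven years, for example, would mean that exposed people reach a given risk level roughly seven years earlier than unexposed people of the same age.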

70

U.S. medical students have staged “white coat die-ins” in support of the #BlackLivesMatter movement, but should the medical community do more? Should health professionals be accountable for fighting the racism that contributes to poor health in the first place?

Concepts: Health care, Medicine, Universal health care, Health, Race, Population health, Health science, Biostatistics

61

Numerous studies demonstrating that statistical errors are common in basic science publications have led to calls to improve statistical training for basic scientists. In this article, we sought to evaluate statistical requirements for PhD training and to identify opportunities for improving biostatistics education in the basic sciences. We provide recommendations for improving statistics training for basic biomedical scientists, including: 1. Encouraging departments to require statistics training, 2. Tailoring coursework to the students' fields of research, and 3. Developing tools and strategies to promote education and dissemination of statistical knowledge. We also provide a list of statistical considerations that should be addressed in statistics education for basic scientists.

Concepts: Scientific method, Regression analysis, Statistics, Mathematics, Type I and type II errors, Science, Biostatistics, Baseball statistics