De Winter and Happee examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that “selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective” (p. 4).
Objective: To examine the effect of surgeon sex on postoperative outcomes of patients undergoing common surgical procedures. Design: Population based, retrospective, matched cohort study from 2007 to 2015. Setting: Population based cohort of all patients treated in Ontario, Canada. Participants: Patients undergoing one of 25 surgical procedures performed by a female surgeon were matched by patient age, patient sex, comorbidity, surgeon volume, surgeon age, and hospital to patients undergoing the same operation by a male surgeon. Interventions: Sex of treating surgeon. Main outcome measure: The primary outcome was a composite of death, readmission, and complications. We compared outcomes between groups using generalised estimating equations. Results: 104 630 patients were treated by 3314 surgeons, 774 female and 2540 male. Before matching, patients treated by female doctors were more likely to be female and younger but had similar comorbidity, income, rurality, and year of surgery. After matching, the groups were comparable. Fewer patients treated by female surgeons died, were readmitted to hospital, or had complications within 30 days (5810 of 52 315, 11.1%, 95% confidence interval 10.9% to 11.4%) than those treated by male surgeons (6046 of 52 315, 11.6%, 11.3% to 11.8%; adjusted odds ratio 0.96, 0.92 to 0.99, P=0.02). Patients treated by female surgeons were less likely to die within 30 days (adjusted odds ratio 0.88; 0.79 to 0.99, P=0.04), but there was no significant difference in readmissions or complications. Stratified analyses by patient, physician, and hospital characteristics did not significantly modify the effect of surgeon sex on outcome. A retrospective analysis showed no difference in outcomes by surgeon sex in patients who had emergency surgery, where patients do not usually choose their surgeon. Conclusions: After accounting for patient, surgeon, and hospital characteristics, patients treated by female surgeons had a small but statistically significant decrease in 30 day mortality and similar surgical outcomes (length of stay, complications, and readmission), compared with those treated by male surgeons. These findings support the need for further examination of the surgical outcomes and mechanisms related to physicians and the underlying processes and patterns of care to improve mortality, complications, and readmissions for all patients.
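As a rough sanity check on the composite outcome figures reported above, the event counts can be turned back into proportions and a crude odds ratio. The sketch below is only arithmetic on the published counts: it is not the authors' analysis, which used generalised estimating equations on matched data, and the names `proportion` and `crude_odds_ratio` are illustrative.

```python
# Counts reported in the abstract for the matched cohort
# (52 315 patients per group; composite of death, readmission, complications).
FEMALE_EVENTS = 5810
MALE_EVENTS = 6046
GROUP_SIZE = 52315


def proportion(events: int, total: int) -> float:
    """Share of patients in a group who had the composite outcome."""
    return events / total


def crude_odds_ratio(events_a: int, events_b: int, total: int) -> float:
    """Unadjusted odds ratio of group A versus group B (equal group sizes).

    This ignores matching and clustering, so it is only an approximation
    of the adjusted odds ratio quoted in the abstract.
    """
    odds_a = events_a / (total - events_a)
    odds_b = events_b / (total - events_b)
    return odds_a / odds_b


female_rate = proportion(FEMALE_EVENTS, GROUP_SIZE)
male_rate = proportion(MALE_EVENTS, GROUP_SIZE)
crude_or = crude_odds_ratio(FEMALE_EVENTS, MALE_EVENTS, GROUP_SIZE)
absolute_difference = male_rate - female_rate

print(f"female surgeons: {female_rate:.1%}")
print(f"male surgeons:   {male_rate:.1%}")
print(f"crude odds ratio: {crude_or:.2f}")
print(f"absolute difference: {absolute_difference * 100:.2f} percentage points")
```

The crude proportions round to the 11.1% and 11.6% quoted in the abstract, and the unadjusted odds ratio rounds to 0.96. The absolute difference is under half a percentage point, which is why the abstract describes the effect as small.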
In the USA, the relationship between the legal availability of guns and the firearm-related homicide rate has been debated. It has been argued that unrestricted gun availability promotes the occurrence of firearm-induced homicides. It has also been pointed out that gun possession can protect potential victims when attacked. This paper provides a first mathematical analysis of this tradeoff, with the goal of steering the debate towards arguing about assumptions, statistics, and scientific methods. The model is based on a set of clearly defined assumptions, which are supported by available statistical data, and is formulated axiomatically such that results do not depend on arbitrary mathematical expressions. According to this framework, two alternative scenarios can minimize the gun-related homicide rate: a ban of private firearms possession, or a policy allowing the general population to carry guns. Importantly, the model identifies the crucial parameters that determine which policy minimizes the death rate, and thus serves as a guide for the design of future epidemiological studies. The parameters that need to be measured include the fraction of offenders that illegally possess a gun, the degree of protection provided by gun ownership, and the fraction of the population who take up their right to own a gun and carry it when attacked. Limited data available in the literature were used to demonstrate how the model can be parameterized, and this preliminary analysis suggests that a ban of private firearm possession, or possibly a partial reduction in gun availability, might lower the rate of firearm-induced homicides. This should not be seen as a policy recommendation, because the data available to inform and parameterize the model are limited. The model does, however, clearly define what needs to be measured, and provides a basis for a scientific discussion about assumptions and data.
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
Received academic wisdom holds that human judgment is characterized by unrealistic optimism, the tendency to underestimate the likelihood of negative events and overestimate the likelihood of positive events. With recent questions being raised over the degree to which the majority of this research genuinely demonstrates optimism, attention to possible mechanisms generating such a bias becomes ever more important. New studies have now claimed that unrealistic optimism emerges as a result of biased belief updating with distinctive neural correlates in the brain. On a behavioral level, these studies suggest that, for negative events, desirable information is incorporated into personal risk estimates to a greater degree than undesirable information (resulting in a more optimistic outlook). However, using task analyses, simulations, and experiments we demonstrate that this pattern of results is a statistical artifact. In contrast with previous work, we examined participants' use of new information with reference to the normative, Bayesian standard. Simulations reveal the fundamental difficulties that would need to be overcome by any robust test of optimistic updating. No such test presently exists, so that the best one can presently do is perform analyses with a number of techniques, all of which have important weaknesses. Applying these analyses to five experiments shows no evidence of optimistic updating. These results clarify the difficulties involved in studying human ‘bias’ and cast additional doubt over the status of optimism as a fundamental characteristic of healthy cognition.
What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology-Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect size, prospective power and model estimation is the prevalent statistical practice in articles published in Nature (89%), followed by articles published in Science (42%). By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpreted these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve the statistical practices in journals with high or low impact factors.
Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Sole-source business models for genetic testing can create private databases containing information vital to interpreting the clinical significance of human genetic variations. But incomplete access to those databases threatens to impede the clinical interpretation of genomic medicine. National health systems and insurers, regulators, researchers, providers and patients all have a strong interest in ensuring broad access to information about the clinical significance of variants discovered through genetic testing. They can create incentives for sharing data and interpretive algorithms in several ways, including: promoting voluntary sharing; requiring laboratories to share as a condition of payment for or regulatory approval of laboratory services; establishing - and compelling participation in - resources that capture the information needed to interpret the data independent of company policies; and paying for sharing and interpretation in addition to paying for the test itself. US policies have failed to address the data-sharing issue. The entry of new and established firms into the European genetic testing market presents an opportunity to correct this failure. European Journal of Human Genetics advance online publication, 14 November 2012; doi:10.1038/ejhg.2012.217.
Public confidence in genetically modified (GM) crop studies is tenuous at best in many countries, particularly those of the European Union. A lack of information about the effects of ties between academic research and industry might stretch this confidence to the breaking point. We therefore performed an analysis on a large set of research articles (n = 672) focusing on the efficacy or durability of GM Bt crops and ties between the researchers carrying out these studies and the GM crop industry. We found that ties between researchers and the GM crop industry were common, with 40% of the articles considered displaying conflicts of interest (COI). In particular, we found that, compared to the absence of COI, the presence of a COI was associated with a 50% higher frequency of outcomes favorable to the interests of the GM crop company. Using our large dataset, we were able to propose possible direct and indirect mechanisms behind this statistical association. They might notably include changes of authorship or funding statements after the results of a study have been obtained and a choice in the topics studied driven by industrial priorities.
Moderate-intensity exercise has attracted considerable attention because of its safety and many health benefits. Tai Chi, a form of mind-body exercise that originated in ancient China, has been gaining popularity. Practicing Tai Chi may improve overall health and well-being; however, to our knowledge, no study has evaluated its relationship with mortality. We assessed the associations of regular exercise and specifically participation in Tai Chi, walking, and jogging with total and cause-specific mortality among 61,477 Chinese men in the Shanghai Men’s Health Study (2002-2009). Information on exercise habits was obtained at baseline using a validated physical activity questionnaire. Deaths were ascertained through biennial home visits and linkage with a vital statistics registry. During a mean follow-up of 5.48 years, 2,421 deaths were identified. After adjustment for potential confounders, men who exercised regularly had a hazard ratio for total mortality of 0.80 (95% confidence interval: 0.74, 0.87) compared with men who did not exercise. The corresponding hazard ratios were 0.80 (95% confidence interval: 0.72, 0.89) for practicing Tai Chi, 0.77 (95% confidence interval: 0.69, 0.86) for walking, and 0.73 (95% confidence interval: 0.59, 0.90) for jogging. Similar inverse associations were also found for cancer and cardiovascular mortality. The present study provides the first evidence that, like walking and jogging, practicing Tai Chi is associated with reduced mortality.
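To put the figures in this abstract on a common scale, the sketch below derives an approximate crude mortality rate and restates each hazard ratio as a percentage reduction in hazard. It assumes total person-years can be approximated as cohort size times mean follow-up, which the abstract does not state explicitly. The helper name `percent_lower_hazard` is illustrative, and none of this reproduces the study's adjusted survival models.

```python
# Figures quoted in the abstract.
COHORT_SIZE = 61477
MEAN_FOLLOW_UP_YEARS = 5.48
DEATHS = 2421

# Hazard ratios versus men who did not exercise, as reported.
HAZARD_RATIOS = {
    "any regular exercise": 0.80,
    "Tai Chi": 0.80,
    "walking": 0.77,
    "jogging": 0.73,
}

# Assumption: person-years is approximated by cohort size x mean follow-up.
person_years = COHORT_SIZE * MEAN_FOLLOW_UP_YEARS
crude_rate_per_1000 = DEATHS / person_years * 1000


def percent_lower_hazard(hazard_ratio: float) -> float:
    """Express a hazard ratio below 1 as a percentage reduction in hazard."""
    return (1 - hazard_ratio) * 100


print(f"approx. person-years: {person_years:,.0f}")
print(f"crude mortality rate: {crude_rate_per_1000:.1f} per 1000 person-years")
for activity, hr in HAZARD_RATIOS.items():
    print(f"{activity}: HR {hr:.2f} -> {percent_lower_hazard(hr):.0f}% lower hazard")
```

Under that assumption the crude rate comes out at roughly 7 deaths per 1000 person-years. A hazard ratio of 0.80 corresponds to a 20% lower hazard of death relative to non-exercisers, conditional on the confounders the authors adjusted for.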