Concept: Effect size
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
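The signature of p-hacking described above can be sketched with a simple check: under a smooth p-value distribution, two equal-width bins just below 0.05 should hold roughly balanced counts, whereas p-hacking predicts an excess in the bin closest to 0.05. A minimal Python illustration (the bin edges and the exact binomial test are one common choice, not necessarily the authors' precise procedure, and the data are made up):

```python
from math import comb

def excess_near_threshold(p_values, lo=0.040, mid=0.045, hi=0.050):
    """One-sided exact binomial test for an excess of p-values in the
    upper bin [mid, hi) relative to the lower bin [lo, mid)."""
    lower = sum(lo <= p < mid for p in p_values)
    upper = sum(mid <= p < hi for p in p_values)
    n = lower + upper
    # P(X >= upper) under Binomial(n, 0.5), i.e. no pile-up near 0.05
    return sum(comb(n, i) for i in range(upper, n + 1)) / 2 ** n

# Hypothetical data: 8 of 10 borderline p-values sit just below 0.05
ps = [0.041, 0.043,
      0.0455, 0.046, 0.047, 0.0475, 0.048, 0.0485, 0.049, 0.0495]
```

A small p-value from this test indicates more results crowding the significance threshold than a smooth distribution would produce.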
This randomized controlled trial was performed to investigate whether placebo effects in chronic low back pain could be harnessed ethically by adding open-label placebo (OLP) treatment to treatment as usual (TAU) for 3 weeks. Pain severity was assessed on three 0- to 10-point Numeric Rating Scales, scoring maximum pain, minimum pain, and usual pain, and a composite, primary outcome, total pain score. Our other primary outcome was back-related dysfunction, assessed on the Roland-Morris Disability Questionnaire. In an exploratory follow-up, participants on TAU received placebo pills for 3 additional weeks. We randomized 97 adults reporting persistent low back pain for more than 3 months' duration and diagnosed by a board-certified pain specialist. Eighty-three adults completed the trial. Compared to TAU, OLP elicited greater pain reduction on each of the three 0- to 10-point Numeric Rating Scales and on the 0- to 10-point composite pain scale (P < 0.001), with moderate to large effect sizes. Pain reduction on the composite Numeric Rating Scales was 1.5 (95% confidence interval: 1.0-2.0) in the OLP group and 0.2 (-0.3 to 0.8) in the TAU group. Open-label placebo treatment also reduced disability compared to TAU (P < 0.001), with a large effect size. Improvement in disability scores was 2.9 (1.7-4.0) in the OLP group and 0.0 (-1.1 to 1.2) in the TAU group. After being switched to OLP, the TAU group showed significant reductions in both pain (1.5, 0.8-2.3) and disability (3.4, 2.2-4.5). Our findings suggest that OLP pills presented in a positive context may be helpful in chronic low back pain.
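The "moderate to large effect sizes" reported in trials like this one are standardized mean differences, most commonly Cohen's d: the between-group difference in means divided by the pooled sample standard deviation. A minimal sketch (the sample scores are invented for illustration, not data from the trial):

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled sample SD."""
    n1, n2 = len(group_a), len(group_b)
    s1, s2 = variance(group_a), variance(group_b)  # sample variances (ddof=1)
    pooled_sd = sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Illustrative pain-reduction scores for two groups:
# cohens_d([2, 4, 6], [1, 3, 5])  -> 0.5 (a "medium" effect by convention)
```

By the usual rule of thumb, d around 0.2 is small, 0.5 medium, and 0.8 large.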
What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues, published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology: Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect sizes, prospective power, or model estimation is the prevalent statistical practice in articles published in Nature (89%), followed by articles published in Science (42%). By contrast, in all other journals, both with high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpret these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means of improving statistical practice in journals with high or low impact factors.
Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a “near significant p-value” to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called “fiddling”) in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000.
Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
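The criterion "47% of original effect sizes were in the 95% confidence interval of the replication effect size" can be checked mechanically: build the replication's interval from its point estimate and standard error, then test containment. A sketch under a normal approximation (the project applied study-specific methods; the numbers below are hypothetical):

```python
Z95 = 1.959964  # two-sided 95% normal quantile

def ci_covers(original_effect, replication_effect, replication_se):
    """True if the original effect lies inside the replication's 95% CI."""
    lo = replication_effect - Z95 * replication_se
    hi = replication_effect + Z95 * replication_se
    return lo <= original_effect <= hi

# e.g. a replication estimate of 0.20 with SE 0.10 gives a CI of
# roughly (0.004, 0.396), so an original effect of 0.45 falls outside it
```

Note that failing this check does not by itself establish that the original finding was false; it is one of several replication criteria the project used.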
Here we show that constructal-law physics unifies the design of animate and inanimate movement by requiring that larger bodies move farther and that their movement on the landscape last longer. The life span of mammals must scale as the body mass (M) raised to the power ¼, and the distance traveled during the lifetime must increase with body size. The same size effect on life span and distance traveled holds for the other flows that move mass on earth: atmospheric and oceanic jets and plumes, river basins, animals and human-operated vehicles. The physics is the same for all flow systems on the landscape: the scaling rules of "design" are expressions of the natural tendency of all flow systems to generate designs that facilitate flow access. This natural tendency is the constructal law of design and evolution in nature. Larger bodies are more efficient movers of mass on the landscape.
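The claimed M^¼ scaling has a concrete arithmetic consequence: a 16-fold increase in body mass predicts a doubling of life span, since 16^0.25 = 2. A one-line sketch of the quarter-power relation:

```python
def lifespan_ratio(mass_ratio):
    """Predicted life-span ratio under quarter-power scaling, t ~ M**0.25."""
    return mass_ratio ** 0.25

# lifespan_ratio(16)  -> 2.0: sixteen times the mass, twice the life span
```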
BACKGROUND: To evaluate the effect of lifestyle modifications on metabolic syndrome (MetS) as assessed by its resolution and improved values for its components. METHODS: This was a systematic review and meta-analysis. Searches were performed of MEDLINE and the Cochrane Database from January 1966 to October 2011 to identify randomized controlled trials (RCTs) related to the study objective. The included studies were RCTs restricted to the English language, with a follow-up period of 6 months or more, which reported overall resolution of MetS or values of MetS components (fasting blood glucose, waist circumference, high-density lipoprotein (HDL), triglycerides, and systolic and diastolic blood pressure (SBP, DBP)). Two investigators independently assessed study eligibility. The effect sizes were the relative proportion of patients with resolved MetS and mean differences in MetS component values from baseline to 1-year follow-up in a lifestyle-modification intervention (LMI) group versus a control (conventional lifestyle education or no treatment) group. Meta-analyses were conducted using a random-effects model. RESULTS: Eleven interventions in eight RCTs were used for the meta-analyses. The relative proportion of patients with resolved MetS was approximately 2.0 (95% CI 1.5 to 2.7) times greater in the intervention group than in the control group (7 interventions, n = 2,839). LMI (5 interventions, n = 748) significantly reduced mean values for SBP (-6.4 mmHg; 95% CI -9.7 to -3.2), DBP (-3.3 mmHg; 95% CI -5.2 to -1.4), triglycerides (-12.0 mg/dl; 95% CI -22.2 to -1.7), waist circumference (-2.7 cm; 95% CI -4.6 to -0.9), and fasting blood glucose (-11.5 mg/dl; 95% CI -22.4 to -0.6; 5 interventions), but the change was not significant for HDL (1.3 mg/dl; 95% CI -0.6 to 3.1).
CONCLUSIONS: The LMI was effective in resolving MetS and reducing the severity of related abnormalities (fasting blood glucose, waist circumference, SBP and DBP, and triglycerides) in subjects with MetS.
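Random-effects pooling of the kind used in this meta-analysis is very often the DerSimonian-Laird estimator, though the abstract does not name the exact method, so treat this as a generic sketch: estimate the between-study variance tau² from Cochran's Q, then re-weight each study by 1/(v_i + tau²).

```python
from math import sqrt

def dersimonian_laird(effects, variances):
    """Pooled effect, between-study variance, and SE (DerSimonian-Laird)."""
    w = [1 / v for v in variances]                # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                 # truncated at zero
    w_re = [1 / (v + tau2) for v in variances]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return pooled, tau2, se
```

When the studies are homogeneous (Q below its degrees of freedom), tau² is truncated to zero and the result coincides with the fixed-effect estimate.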
Industry-sponsored clinical drug studies are associated with publication of outcomes that favor the sponsor, even when controlling for potential bias in the methods used. However, the influence of sponsorship bias has not been examined in preclinical animal studies. We performed a meta-analysis of preclinical statin studies to determine whether industry sponsorship is associated with increased effect sizes of efficacy outcomes and/or risks of bias in a cohort of published preclinical statin studies. We searched Medline (January 1966-April 2012) and identified 63 studies evaluating the effects of statins on atherosclerosis outcomes in animals. Two coders independently extracted study design criteria aimed at reducing bias, results for all relevant outcomes, sponsorship source, and investigator financial ties. The I² statistic was used to examine heterogeneity. We calculated the standardized mean difference (SMD) for each outcome and pooled data across studies to estimate the pooled average SMD using random effects models. In a priori subgroup analyses, we assessed statin efficacy by outcome measured, sponsorship source, presence or absence of financial conflict information, use of an optimal time window for outcome assessment, accounting for all animals, inclusion criteria, blinding, and randomization. The effect of statins was significantly larger for studies sponsored by nonindustry sources (-1.99; 95% CI -2.68, -1.31) versus studies sponsored by industry (-0.73; 95% CI -1.00, -0.47) (p value<0.001). Statin efficacy did not differ by disclosure of financial conflict information, use of an optimal time window for outcome assessment, accounting for all animals, inclusion criteria, blinding, and randomization. Possible reasons for the differences between nonindustry- and industry-sponsored studies, such as selective reporting of outcomes, require further study.
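The reported subgroup contrast (nonindustry SMD -1.99 vs. industry -0.73, p < 0.001) can be reproduced approximately with a z-test on the difference between the two pooled estimates, backing out each standard error from its 95% CI. This is a standard check, not necessarily the authors' exact test:

```python
from math import erf, sqrt

Z95 = 1.959964  # two-sided 95% normal quantile

def se_from_ci(lo, hi):
    """Standard error implied by a normal-theory 95% CI."""
    return (hi - lo) / (2 * Z95)

def subgroup_z_test(est1, ci1, est2, ci2):
    """Two-sided p-value for the difference between two pooled estimates."""
    se = sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = (est1 - est2) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Using the abstract's numbers for the two sponsorship subgroups:
p = subgroup_z_test(-1.99, (-2.68, -1.31), -0.73, (-1.00, -0.47))
```

Plugging in the abstract's estimates gives z of about -3.4 and p under 0.001, consistent with the reported contrast.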
Do interventions to promote walking in groups increase physical activity? A systematic literature review with meta-analysis
- The International Journal of Behavioral Nutrition and Physical Activity
OBJECTIVE: Walking groups are increasingly being set up, but little is known about their efficacy in promoting physical activity. The present study aims to assess the efficacy of interventions that promote walking in groups at increasing physical activity among adults, and to explore potential moderators of this efficacy. METHOD: Systematic literature searches were conducted using multiple databases. A random-effects model was used for the meta-analysis, with sensitivity analysis. RESULTS: The effect of the interventions (19 studies, 4,572 participants) on physical activity was of medium size (d = 0.52), statistically significant (95% CI 0.32 to 0.71, p < 0.0001), and with a large fail-safe N of 753. Moderator analyses showed that lower-quality studies had larger effect sizes than higher-quality studies; studies reporting outcomes over six months had larger effect sizes than studies reporting outcomes up to six months; studies that targeted both genders had larger effect sizes than studies that targeted only women; and studies that targeted older adults had larger effect sizes than studies that targeted younger adults. No significant differences were found between studies delivered by professionals and those delivered by lay people. CONCLUSION: Interventions to promote walking in groups are efficacious at increasing physical activity. Despite low homogeneity of results and limitations (e.g., a small number of studies using objective measures of physical activity, publication bias), which might have influenced the findings, the large fail-safe N suggests these findings are robust. Possible explanations for heterogeneity between studies are discussed, and the need for more investigation of this is highlighted.
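The "fail-safe N = 753" is Rosenthal's statistic: the number of unpublished null studies that would have to exist to drag the combined result below significance. A sketch of the classic formula (it takes per-study z-scores, which the abstract does not report, so the values below are made up):

```python
def failsafe_n(z_scores, z_crit=1.6449):
    """Rosenthal's fail-safe N: hidden null studies needed to bring the
    Stouffer-combined z below the one-tailed 0.05 cutoff (z_crit)."""
    s = sum(z_scores)
    return (s * s) / (z_crit ** 2) - len(z_scores)

# e.g. five studies each with z = 2.5 tolerate roughly 53 hidden nulls
```

Large values relative to the number of observed studies are read, as in this review, as evidence that publication bias alone is unlikely to explain the pooled effect.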
We have empirically assessed the distribution of published effect sizes and estimated power by analyzing 26,841 statistical records from 3,801 cognitive neuroscience and psychology papers published recently. The reported median effect size was D = 0.93 (interquartile range: 0.64-1.46) for nominally statistically significant results and D = 0.24 (0.11-0.42) for nonsignificant results. Median power to detect small, medium, and large effects was 0.12, 0.44, and 0.73, reflecting no improvement through the past half-century. This is so because sample sizes have remained small. Assuming similar true effect sizes in both disciplines, power was lower in cognitive neuroscience than in psychology. Journal impact factors negatively correlated with power. Assuming a realistic range of prior probabilities for null hypotheses, false report probability is likely to exceed 50% for the whole literature. In light of our findings, the recently reported low replication success in psychology is realistic, and worse performance may be expected for cognitive neuroscience.
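The false report probability claim follows from Bayes' rule: among significant results, the share that are false positives depends on the prior probability of a true effect and the achieved power. A sketch using the paper's medium-effect power estimate (the 10% prior below is one illustrative value from a "realistic range", chosen here for the example):

```python
def false_report_probability(prior_true, power, alpha=0.05):
    """P(null is true | significant result) for a given prior and power."""
    false_pos = alpha * (1 - prior_true)  # significant results from true nulls
    true_pos = power * prior_true         # significant results from real effects
    return false_pos / (false_pos + true_pos)

# With a 10% prior of a true effect and power 0.44 (medium effects),
# more than half of significant findings are expected to be false.
```

The same formula shows why the problem recedes with better-powered studies and better-grounded hypotheses: at a 50% prior and power 0.8, the false report probability drops below 10%.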