
Journal: Statistics in Medicine


When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G-formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. In contrast, propensity score methods require the correct specification of an exposure model. Double-robust methods only require correct specification of either the outcome or the exposure model. Targeted maximum likelihood estimation is a semiparametric double-robust method that improves the chances of correct model specification by allowing for flexible estimation using (nonparametric) machine-learning methods. It therefore requires weaker assumptions than its competitors. We provide a step-by-step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where assumptions about correct model specification and positivity (i.e., that no study participant has zero probability of receiving the treatment) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R-code is provided in easy-to-read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in Appendix S1 and at the following GitHub repository:

Concepts: Scientific method, Estimation theory, Maximum likelihood, Propensity score, Likelihood function, Propensity score matching, Fisher information, Method of moments
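The double-robustness property described in the abstract above can be illustrated with a small sketch. This is not the article's R code: it uses only the Python standard library, a single binary confounder W, made-up cell counts, and hypothetical function names. The initial outcome model is deliberately misspecified (it ignores W) while the exposure model is correct, so the targeting step has something to repair.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def make_data():
    # Hypothetical cell counts: (W, A, subjects in cell, subjects with Y = 1).
    cells = [(0, 1, 20, 8), (0, 0, 80, 16), (1, 1, 80, 56), (1, 0, 20, 10)]
    rows = []
    for w, a, n, events in cells:
        rows += [(w, a, 1)] * events + [(w, a, 0)] * (n - events)
    return rows

def naive_difference(rows):
    # Unadjusted risk difference, confounded by W.
    return (mean(y for (_, a, y) in rows if a == 1)
            - mean(y for (_, a, y) in rows if a == 0))

def g_formula_ate(rows):
    # Standardisation with a saturated outcome model E[Y | A, W].
    q = {(a, w): mean(y for (wi, ai, y) in rows if ai == a and wi == w)
         for a in (0, 1) for w in (0, 1)}
    return mean(q[1, w] - q[0, w] for (w, _, _) in rows)

def tmle_ate(rows, tol=1e-12, max_iter=100):
    # Step 1: initial outcome model, deliberately misspecified (ignores W).
    q0 = {a: mean(y for (_, ai, y) in rows if ai == a) for a in (0, 1)}
    # Step 2: exposure model g(W) = P(A = 1 | W), saturated in W.
    g = {w: mean(a for (wi, a, _) in rows if wi == w) for w in (0, 1)}

    def clever(a, w):
        # Clever covariate H(A, W).
        return 1.0 / g[w] if a == 1 else -1.0 / (1.0 - g[w])

    # Step 3: fit the fluctuation parameter epsilon in
    # logit Q*(A, W) = logit Q0(A) + epsilon * H(A, W) by Newton's method.
    eps = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for w, a, y in rows:
            h = clever(a, w)
            p = expit(logit(q0[a]) + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < tol:
            break

    # Step 4: plug the updated predictions into the substitution estimator.
    def q_star(a, w):
        return expit(logit(q0[a]) + eps * clever(a, w))

    return mean(q_star(1, w) - q_star(0, w) for (w, _, _) in rows)

rows = make_data()
print("naive     :", round(naive_difference(rows), 4))
print("g-formula :", round(g_formula_ate(rows), 4))
print("TMLE      :", round(tmle_ate(rows), 4))
```

In this toy data set the naive risk difference is about 0.38, whereas both the standardised estimate and the TMLE estimate are about 0.20: the targeting step recovers the adjusted effect even though the initial outcome model omits the confounder. In practice both models would be fitted flexibly, as the abstract describes, rather than by stratum means.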


Much progress has been made over the past decade with the development of novel methods for addressing increasingly complex multiplicity problems arising in confirmatory Phase III clinical trials. This includes traditional problems with a single source of multiplicity, for example, analysis of multiple endpoints or dose-placebo contrasts. In addition, more advanced problems with several sources of multiplicity have attracted attention in clinical drug development. These problems include two or more families of objectives such as multiple endpoints evaluated at multiple dose levels or in multiple patient populations. This paper provides a review of concepts that play a central role in defining and solving multiplicity problems (error rate definitions) and introduces the main classes of multiple testing procedures widely used in clinical trials (nonparametric, semiparametric, and parametric procedures). The paper also presents recent advances in multiplicity research, including gatekeeping procedures for clinical trials with multiple sets of objectives. The concepts and methods introduced in the paper are illustrated using several case studies on the basis of real clinical trials. Software implementation of commonly used multiple testing and gatekeeping procedures is discussed. Copyright © 2012 John Wiley & Sons, Ltd.

Concepts: Pharmacology, Clinical trial, Pharmaceutical industry, Drug discovery, Drug development, John Wiley & Sons
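As a minimal sketch of what familywise error rate control looks like in code, the following computes Bonferroni and Holm adjusted p-values. The abstract above does not name specific procedures; these two are used here on the assumption that they are typical of the distribution-free class it mentions. The function names and the p-values are hypothetical.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values: multiply each by the number of tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1);
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03]  # hypothetical p-values for three endpoints
print("Bonferroni:", bonferroni_adjust(raw))
print("Holm      :", holm_adjust(raw))
```

A hypothesis is rejected at familywise level alpha when its adjusted p-value is at most alpha. Holm adjusted p-values are never larger than the Bonferroni ones, so Holm rejects at least as many hypotheses at the same level.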


Growing interest in personalised medicine and targeted therapies is leading to an increase in the importance of subgroup analyses. If it is planned to view treatment comparisons in both a predefined subgroup and the full population as co-primary analyses, it is important that the statistical analysis controls the familywise type I error rate. Spiessens and Debois (Cont. Clin. Trials, 2010, 31, 647-656) recently proposed an approach specific to this setting, which incorporates an assumption about the correlation based on the known sizes of the different groups, and showed that this is more powerful than generic multiple comparisons procedures such as the Bonferroni correction. If recruitment is slow relative to the length of time taken to observe the outcome, it may be efficient to conduct an interim analysis. In this paper, we propose a new method for an adaptive clinical trial with co-primary analyses in a predefined subgroup and the full population based on the conditional error function principle. The methodology is generic in that we assume test statistics can be taken to be normally distributed rather than making any specific distributional assumptions about individual patient data. In a simulation study, we demonstrate that the new method is more powerful than previously suggested analysis strategies. Furthermore, we show how the method can be extended to situations when the selection is not based on the final but on an early outcome. We use a case study of a targeted therapy in oncology to illustrate the use of the proposed methodology with non-normal outcomes. Copyright © 2012 John Wiley & Sons, Ltd.

Concepts: Scientific method, Clinical trial, Statistics, Evaluation methods, Normal distribution, Multiple comparisons, Bonferroni correction, Familywise error rate
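The gain over Bonferroni that comes from using the known correlation can be sketched as follows. This is not the procedure of Spiessens and Debois or of the paper above. It assumes two one-sided z-tests (subgroup and full population) on standardised means with a common known variance, for which the correlation is the square root of the subgroup fraction, and it assumes the same critical value for both tests. Function names and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def subgroup_correlation(n_subgroup, n_full):
    # Correlation between the subgroup and full-population z-statistics
    # when the subgroup is nested in the full population.
    return math.sqrt(n_subgroup / n_full)

def joint_cdf(c, rho, steps=2000):
    # P(Z1 <= c, Z2 <= c) for a standard bivariate normal with correlation
    # rho (|rho| < 1), via Simpson's rule applied to
    # integral of phi(z) * Phi((c - rho * z) / sqrt(1 - rho^2)) over z <= c.
    lower = -8.0  # the normal tail below -8 is negligible
    h = (c - lower) / steps
    scale = math.sqrt(1.0 - rho * rho)

    def f(z):
        return STD.pdf(z) * STD.cdf((c - rho * z) / scale)

    total = f(lower) + f(c)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lower + k * h)
    return total * h / 3.0

def critical_value(alpha, rho):
    # Common critical value c with P(Z1 > c or Z2 > c) = alpha under the
    # global null hypothesis, found by bisection.
    lo, hi = 0.0, 5.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if 1.0 - joint_cdf(mid, rho) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.025
rho = subgroup_correlation(100, 200)  # subgroup is half of the population
print("correlation            :", round(rho, 4))
print("Bonferroni critical z  :", round(STD.inv_cdf(1 - alpha / 2), 4))
print("correlation-adjusted z :", round(critical_value(alpha, rho), 4))
```

Because the two statistics are positively correlated, the adjusted critical value lies between the unadjusted one-sided value and the Bonferroni value, which is where the extra power comes from.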


In this paper, we propose a class of multivariate random effects models allowing for the inclusion of study-level covariates to carry out meta-analyses. As existing algorithms for computing maximum likelihood estimates often converge poorly or may not converge at all when the random effects are multi-dimensional, we develop an efficient expectation-maximization algorithm for fitting multi-dimensional random effects regression models. We also develop a new methodology for carrying out variable selection with study-level covariates. We examine the performance of the proposed methodology via a simulation study. We apply the proposed methodology to analyze metadata from 26 studies involving statins as a monotherapy and in combination with ezetimibe. In particular, we compare the low-density lipoprotein cholesterol-lowering efficacy of monotherapy and combination therapy on two patient populations (patients naïve and non-naïve to statin monotherapy at baseline), controlling for aggregate covariates. The proposed methodology is quite general and can be applied in any meta-analysis setting for a wide range of scientific applications and therefore offers new analytic methods of clinical importance. Copyright © 2012 John Wiley & Sons, Ltd.

Concepts: Estimation theory, Atherosclerosis, Statin, Niacin, Mevalonate pathway, Maximum likelihood, Machine learning, Ezetimibe


The goal of mediation analysis is to identify and explicate the mechanism that underlies a relationship between a risk factor and an outcome via an intermediate variable (mediator). In this paper, we consider the estimation of mediation effects in zero-inflated (ZI) models intended to accommodate ‘extra’ zeros in count data. Focusing on the ZI negative binomial models, we provide a mediation formula approach to estimate the (overall) mediation effect in the standard two-stage mediation framework under a key sequential ignorability assumption. We also consider a novel decomposition of the overall mediation effect for the ZI context using a three-stage mediation model. Estimation of the components of the overall mediation effect requires an assumption involving the joint distribution of two counterfactuals. Simulation study results demonstrate low bias of mediation effect estimators and close-to-nominal coverage probability of confidence intervals. We also modify the mediation formula method by replacing ‘exact’ integration with a Monte Carlo integration method. The method is applied to a cohort study of dental caries in very low birth weight adolescents. For overall mediation effect estimation, a sensitivity analysis was conducted to quantify the degree to which the key assumption must be violated to reverse the original conclusion. Copyright © 2012 John Wiley & Sons, Ltd.

Concepts: Statistics, Estimator, Monte Carlo, Normal distribution, Dental caries, Estimation, Statistical inference, Binomial proportion confidence interval


Problems common to many longitudinal HIV/AIDS, cancer, vaccine, and environmental exposure studies are the presence of a lower limit of quantification of an outcome with skewness and time-varying covariates with measurement errors. Relatively little published work deals with these features of longitudinal data simultaneously. In particular, left-censored data falling below a limit of detection may sometimes have a proportion larger than expected under a usually assumed log-normal distribution. In such cases, alternative models, which can account for a high proportion of censored data, should be considered. In this article, we present an extension of the Tobit model that incorporates a mixture of true undetectable observations and those values from a skew-normal distribution for an outcome with possible left censoring and skewness, and covariates with substantial measurement error. To quantify the covariate process, we offer a flexible nonparametric mixed-effects model within the Tobit framework. A Bayesian modeling approach is used to assess the simultaneous impact of left censoring, skewness, and measurement error in covariates on inference. The proposed methods are illustrated using real data from an AIDS clinical study. Copyright © 2013 John Wiley & Sons, Ltd.

Concepts: Regression analysis, Measurement, Error, Model, Normal distribution, Covariate, Observational error, Tobit model


In spatiotemporal analysis, the effect of a covariate on the outcome usually varies across areas and time. The spatial configuration of the areas may potentially depend on not only the structured random intercept but also spatially varying coefficients of covariates. In addition, the normality assumption on the distribution of spatially varying coefficients could bias the estimates. In this article, we proposed a Bayesian semiparametric space-time model where the spatially-temporally varying coefficient is decomposed as fixed, spatially varying, and temporally varying coefficients. We nonparametrically modeled the spatially varying coefficients of space-time covariates by using the area-specific Dirichlet process prior with weights transformed via a generalized transformation. We modeled the temporally varying coefficients of covariates through the dynamic model. We also took into account the uncertainty of inclusion of the spatially-temporally varying coefficients via a variable selection procedure that determines the probabilities of different effects for each covariate. The proposed semiparametric approach improves on the Bayesian spatial-temporal models with a normality assumption on spatial random effects and on the Bayesian model with the Dirichlet process prior on the random intercept. We presented a simulation example to evaluate the performance of the proposed approach against the competing models. We used an application to low birth weight data in South Carolina as an illustration. Copyright © 2013 John Wiley & Sons, Ltd.

Concepts: Time, Mathematics, Physics, Probability theory, Polynomial, Universe, Statistical theory, Statistical models


Previous authors have proposed the sequential parallel comparison design (SPCD) to address the issue of high placebo response rate in clinical trials. The original use of SPCD focused on binary outcomes, but its use has since been extended to continuous outcomes that arise more naturally in many fields, including psychiatry. Analytic methods proposed to date for analysis of SPCD trial continuous data include methods based on seemingly unrelated regression and ordinary least squares. Here, we propose a repeated measures linear model that uses all outcome data collected in the trial and accounts for data that are missing at random. An appropriate contrast formulated after the model has been fit can be used to test the primary hypothesis of no difference in treatment effects between study arms. Our extensive simulations show that when compared with the other methods, our approach preserves the type I error even for small sample sizes and offers adequate power and the smallest mean squared error under a wide variety of assumptions. We recommend consideration of our approach for analysis of data coming from SPCD trials. Copyright © 2013 John Wiley & Sons, Ltd.

Concepts: Regression analysis, Linear regression, Clinical trial, Statistics, Sample size, Placebo, Seemingly unrelated regression


The “#MeToo” movement has been instrumental in delineating the prevalence of alleged sexual harassment complaints in the workplace. In this article, we propose controlled scientific methods for statisticians and credibility assessment experts to collaborate with human resource staff and/or attorneys to help evaluate claims by a class of accusers against an alleged serial harasser. When an accused falsely denies claims as lies, s/he may be guilty of libel/defamation. Hence, even if statutes of limitations for criminal prosecution may have expired, a timely civil suit could be mounted. It is critically important that these claims be scientifically evaluated to protect the accused from a conspiracy. Using a properly monitored controlled study and the latest credibility assessment methods, it is clear that even with a small number of accusers, a high-powered study is feasible to contribute civil level evidence for or against the accusations. However, if there are false accusers in the class, power would be greatly diminished, making exoneration a likely outcome. We illustrate a hypothetical example in which six honest accusers against an alleged serial harasser, X, and 16 controls yield 95% power at an exact one-sided P-value of 0.03, using Barnard’s test (vastly superior to Fisher’s exact conditional test).


Demonstrating bioequivalence of several pharmacokinetic (PK) parameters, such as AUC and Cmax, that are calculated from the same biological sample measurements is in fact a multivariate problem, even though this is neglected by most practitioners and regulatory bodies, who typically settle for separate univariate analyses. We believe, however, that a truly multivariate evaluation of all PK measures simultaneously is clearly more adequate. In this paper, we review methods to construct joint confidence regions around multivariate normal means and investigate their usefulness in simultaneous bioequivalence problems via simulation. Some of them work well for idealised scenarios but break down when faced with real-data challenges such as unknown variance and correlation among the PK parameters. We study the shapes of the confidence regions resulting from different methods, discuss how marginal simultaneous confidence intervals for the individual PK measures can be derived, and illustrate the application to data from a trial on ticlopidine hydrochloride. An R package is available.

Concepts: Statistics, Sample size, Confidence interval, Normal distribution, Prediction interval, Student's t-distribution, Univariate, Confidence region
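For orientation, one classical joint confidence region for a multivariate normal mean is the Hotelling T² ellipsoid. It is given here as a standard textbook example, on the assumption that it is representative of the kind of region the abstract above refers to; the abstract itself does not list the methods reviewed. With n independent observations of dimension p, sample mean x̄, and sample covariance matrix S (all notation introduced here), the level 1 − α region is:

```latex
\[
  \left\{ \mu \in \mathbb{R}^{p} :
    n \, (\bar{x} - \mu)^{\top} S^{-1} (\bar{x} - \mu)
    \;\le\; \frac{p\,(n-1)}{n-p} \, F_{p,\,n-p;\,1-\alpha}
  \right\}
\]
```

Here F_{p, n-p; 1-α} denotes the 1 − α quantile of the F distribution with p and n − p degrees of freedom. The region is an ellipsoid centred at x̄ whose orientation reflects the estimated correlation between the components, for example between log AUC and log Cmax.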