In this paper, we summarize “The Prevention and Treatment of Missing Data in Clinical Trials,” a report by the National Research Council, with a focus on prevention. Specifically, we explain the following topics: the concept of the estimand, which defines what is to be estimated in a clinical trial; trial designs that minimize dropouts; continued data collection for subjects who drop out; and strategies that trial sponsors, investigators, and site personnel can use to reduce the frequency of missing data.
Missing data can seriously compromise statistical inference from clinical trials. Mixed-effects models for repeated measures (MMRM) provide a useful approach for analyzing incomplete longitudinal data. MMRM analysis is increasingly common in biomedical research and is frequently used as the primary analysis in these trials. We introduce the basic ideas, model structure, and properties of MMRM. In addition, we give an overview of parameter estimation and statistical inference for MMRM analysis. Finally, we discuss some important considerations regarding MMRM analysis.
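For orientation, the model structure commonly meant by MMRM can be written as below. This is a generic textbook form, not necessarily the exact parameterization used in the paper; the symbols $Y_i$, $X_i$, $\beta$, $\Sigma$, and $n_i$ are notation assumed here for illustration.

```latex
% Subject i contributes the n_i outcomes actually observed across scheduled visits.
\[
  Y_i = X_i \beta + \varepsilon_i, \qquad
  \varepsilon_i \sim N\!\left(0, \Sigma_i\right), \qquad i = 1, \dots, N,
\]
% X_i typically holds treatment, visit, treatment-by-visit interaction, and baseline terms.
% \Sigma_i is the n_i x n_i submatrix of a common (often unstructured) covariance matrix
% \Sigma, taken over the visits at which subject i was observed.
```

Because each subject enters the likelihood only through the observed visits, likelihood-based inference from this model is valid under the MAR assumption when the mean and covariance models are correctly specified.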
In most observational and experimental studies, missing data are almost inevitable, and they must be handled appropriately to prevent bias and loss of efficiency in statistical inference. Moreover, missingness generally occurs in multiple variables, with patterns that differ across subjects. Although valid statistical inference methods are needed in these situations, most existing methods require complicated statistical models and computations. Multiple imputation by chained equations (MICE) is an effective method that can be applied in these situations, and it has been widely used in observational and experimental studies in recent years. In addition, useful packages implementing it have recently been developed for standard statistical software. In this article, we provide a gentle tutorial on the MICE methodology with concrete applications to an ovarian cancer clinical study (Clark and Altman, 2003; J. Clin. Epidemiol. 56, 28-37).
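The chained-equations idea can be sketched in a few lines. The following is a minimal illustration, not the tutorial's own implementation: it assumes only two continuous variables, uses simple linear regression for each conditional model, and adds residual noise without drawing the regression parameters from their posterior, so it is a simplified ("improper") version of MICE. All function names are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(e * e for e in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd

def impute_once(rows, rng, n_cycles=10):
    """One chained-equations run over two columns; None marks a missing value."""
    miss = [[v is None for v in row] for row in rows]
    # Start from mean imputation, then refine each column in turn.
    means = [statistics.fmean(r[j] for r in rows if r[j] is not None) for j in (0, 1)]
    data = [[means[j] if row[j] is None else row[j] for j in (0, 1)] for row in rows]
    for _ in range(n_cycles):
        for target in (0, 1):
            pred = 1 - target
            # Fit on rows where the target was observed, using current predictor values.
            obs = [i for i in range(len(data)) if not miss[i][target]]
            a, b, sd = fit_line([data[i][pred] for i in obs],
                                [data[i][target] for i in obs])
            # Replace each missing target value by prediction plus random noise.
            for i in range(len(data)):
                if miss[i][target]:
                    data[i][target] = a + b * data[i][pred] + rng.gauss(0.0, sd)
    return data

def mice(rows, m=5, seed=0):
    """Return m completed copies of rows."""
    rng = random.Random(seed)
    return [impute_once(rows, rng) for _ in range(m)]

def pool_mean(datasets, col):
    """Rubin's rules for the mean of one column: (estimate, total variance)."""
    m = len(datasets)
    ests = [statistics.fmean(r[col] for r in d) for d in datasets]
    within = statistics.fmean(
        statistics.variance([r[col] for r in d]) / len(d) for d in datasets)
    between = statistics.variance(ests)
    return statistics.fmean(ests), within + (1 + 1 / m) * between
```

Calling `mice(rows, m=5)` on a list of `[x, y]` pairs with `None` for missing entries returns five completed datasets, and `pool_mean` combines the per-dataset estimates. Established packages additionally draw the model parameters and support mixed variable types, which this sketch omits.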
Missing data problems are common in medical and epidemiologic studies. If responders and non-responders differ systematically, the analysis method must be chosen to handle the missing data appropriately. When the missing at random (MAR) assumption is valid, methods based on the observed-data likelihood or on multiple imputation are often applied in practice. These methods are parametric and can be sensitive to deviations of the assumed model from the true model. A natural alternative is a semiparametric approach. However, semiparametric methods for incomplete data are less popular in practice, partly because of their complexity. In this paper, we explain the methodological issues in semiparametric inference based on incomplete data, focusing on a simple pretest-posttest study scenario in order to emphasize practical application.
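One widely used semiparametric estimator in the pretest-posttest setting is the augmented inverse probability weighted (AIPW) estimator of the posttest mean. The sketch below is illustrative and is not taken from the paper. It assumes the pretest is always observed and the posttest is MAR given the pretest. The working response model is a hypothetical two-stratum response rate split at `cutoff`, and the working outcome model is a least-squares line fitted to completers. Every stratum is assumed to contain at least one responder.

```python
import statistics

def aipw_posttest_mean(pre, post, cutoff):
    """AIPW estimate of E[posttest]; post[i] is None when the posttest is missing."""
    r = [0 if v is None else 1 for v in post]
    # Working response model: observed response rate within each pretest stratum.
    strata = [p > cutoff for p in pre]
    rate = {s: statistics.fmean(ri for ri, si in zip(r, strata) if si == s)
            for s in set(strata)}
    pi = [rate[s] for s in strata]
    # Working outcome model: least-squares line of posttest on pretest among completers.
    xo = [p for p, ri in zip(pre, r) if ri]
    yo = [v for v in post if v is not None]
    mx, my = statistics.fmean(xo), statistics.fmean(yo)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xo, yo))
             / sum((a - mx) ** 2 for a in xo))
    m = [my + slope * (p - mx) for p in pre]
    # AIPW term: model prediction plus an inverse-probability-weighted residual
    # for subjects whose posttest was observed.
    terms = [mi + (yi - mi) / p if ri else mi
             for yi, ri, p, mi in zip(post, r, pi, m)]
    return statistics.fmean(terms)
```

The estimator is doubly robust: it is consistent if either the response model or the outcome model is correctly specified, which is the main practical appeal over a purely parametric analysis.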
This paper gives an overview of statistical methods used under the Missing Not At Random (MNAR) assumption, with a particular focus on those utilized in drug development. As the missing data mechanism is generally not testable from the observed data, even when we adopt the Missing At Random (MAR) assumption in the primary analysis, we cannot rule out the possibility of the true mechanism being MNAR. Furthermore, the effect of model misspecification may be greater for analyses under the MNAR assumption than those under MAR. Thus careful consideration is needed before analysis under MNAR is specified as the primary analysis. In this paper, we review statistical models and parameter estimation methods under the MNAR assumption from the viewpoint of both primary analysis and sensitivity analysis. For the statistical models, the Selection Model (SM), Shared Parameter Model (SPM), and Pattern-Mixture Model (PMM) are introduced; for the parameter estimation, methods based on maximization of the observed likelihood and those based on Multiple Imputation (MI) techniques are reviewed. In addition to the theoretical descriptions, we also introduce selected references to procedures and publicly available macro programs for the statistical software SAS.
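As a rough illustration of an MI-based sensitivity analysis within the pattern-mixture framework, the sketch below shifts imputed values by a fixed amount `delta` to represent a departure from MAR. It is a deliberately simplified, hypothetical example and is unrelated to the SAS procedures and macros the paper references. Missing outcomes are drawn from a normal distribution fitted to the observed outcomes, with no covariates and no parameter uncertainty, and the shift is then applied to the imputed values only.

```python
import random
import statistics

def delta_adjusted_mean(y, delta, m=20, seed=0):
    """Pooled mean of y over m imputations; None marks a missing outcome.

    Imputed values are drawn as if missingness were ignorable and then
    shifted by delta, so delta = 0 reproduces the unshifted analysis.
    """
    rng = random.Random(seed)
    obs = [v for v in y if v is not None]
    mu, sd = statistics.fmean(obs), statistics.stdev(obs)
    estimates = []
    for _ in range(m):
        # Observed values are kept; each missing value gets a shifted random draw.
        completed = [v if v is not None else rng.gauss(mu, sd) + delta for v in y]
        estimates.append(statistics.fmean(completed))
    return statistics.fmean(estimates)

def tipping_point_scan(y, deltas, m=20, seed=0):
    """Pooled mean for each candidate delta, using the same random draws."""
    return {d: delta_adjusted_mean(y, d, m=m, seed=seed) for d in deltas}
```

Scanning a grid of `delta` values shows how far the missing outcomes would have to differ from the MAR prediction before the conclusion changes. A real analysis would condition the imputation model on treatment arm, baseline, and earlier visits.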