Abstract
The two-stage case-control study is a common means of reducing the cost of covariate measurement in epidemiologic studies. Under this design, complete covariate data are collected only on cases and controls randomly sampled in the second stage. In many applications, certain covariates are readily measured on the entire first-stage sample, and surrogate measurements of the expensive covariates may also be available. The relative risk estimators can be substantially improved by using the covariate data collected outside the second-stage sample. In this study, we propose applying multiple imputation, a well-established method for incomplete-data analysis. Multiple imputation is now available in many standard software packages and is familiar to practitioners in epidemiology. In addition, it uses all the available data and approximates the fully efficient maximum likelihood estimator. Simulation studies demonstrated that the multiple imputation estimators had greater precision than many existing estimators in realistic settings. An illustration with data taken from Wilms’ tumor studies is provided.