BACKGROUND
Large electronic databases have been widely used in recent years; however, they are susceptible to bias arising from incomplete information. To address this, validation studies are conducted to assess the accuracy of disease diagnoses defined in such databases. These studies, however, can be limited by potential misclassification in the reference standard and by dependence between diagnoses drawn from the same data source.
METHODS
This study employs latent class modeling with Bayesian inference to estimate the sensitivity, specificity, and positive and negative predictive values of different diagnostic definitions. Four models are defined by crossing two assumptions, whether the reference is a gold standard and whether diagnoses are conditionally independent given true disease status, and are compared using data from a breast cancer study as a motivating example. Additionally, simulations generating data under various true values are used to compare model performance in terms of bias, Pearson-type goodness-of-fit statistics, and the widely applicable information criterion (WAIC).
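To make the setting concrete, the following is a minimal sketch of a two-test latent class model fit by Bayesian inference under the conditional-independence assumption. The abstract does not specify the implementation; PyMC, the cross-classified counts, and the names (counts, ppv_A) are assumptions for illustration only. Note that a single-population two-test model of this form generally needs informative priors or additional constraints to be identifiable.

```python
import numpy as np
import pymc as pm

# Hypothetical 2x2 cross-classification of two diagnostic definitions
# applied to the same patients: [++, +-, -+, --]. Counts are illustrative.
counts = np.array([60, 15, 20, 905])
n = int(counts.sum())

with pm.Model() as lca:
    # Priors on prevalence and on each definition's sensitivity/specificity
    pi = pm.Beta("prevalence", 1, 1)
    se = pm.Beta("sensitivity", 1, 1, shape=2)
    sp = pm.Beta("specificity", 1, 1, shape=2)

    def cell(a, b):
        # Probability of observing results (a, b) from definitions A and B,
        # mixing over the latent true disease status; the product form
        # encodes conditional independence given true status.
        p_dis = pi * (se[0] ** a) * ((1 - se[0]) ** (1 - a)) \
                   * (se[1] ** b) * ((1 - se[1]) ** (1 - b))
        p_non = (1 - pi) * ((1 - sp[0]) ** a) * (sp[0] ** (1 - a)) \
                         * ((1 - sp[1]) ** b) * (sp[1] ** (1 - b))
        return p_dis + p_non

    probs = pm.math.stack([cell(1, 1), cell(1, 0), cell(0, 1), cell(0, 0)])
    pm.Multinomial("obs", n=n, p=probs, observed=counts)

    # Positive predictive value of definition A via Bayes' theorem
    pm.Deterministic("ppv_A", pi * se[0] / (pi * se[0] + (1 - pi) * (1 - sp[0])))

    idata = pm.sample(2000, tune=1000)
```

Relaxing the conditional-independence assumption, as in the dependence models compared in this study, would replace the product terms inside cell() with a joint distribution carrying covariance parameters between the two definitions within each latent class.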
RESULTS
In the analysis of the motivating example data, the model assuming conditional dependence and a non-gold-standard reference showed the best predictive performance among the four models. Its estimated disease prevalence was slightly higher than previously reported, and its estimated sensitivities were markedly lower than those of the other models. In the bias evaluation, the Bayesian models with stronger assumptions and the frequentist model performed better under the simulated true-value conditions, whereas the Bayesian model with fewer assumptions performed well in terms of goodness of fit and WAIC.
CONCLUSIONS
Current approaches to outcome validation can introduce bias. The proposed approach can be adopted broadly as a valuable method for validation studies.