1997 Volume 6 Issue 2 Pages 153-159
For quality control in screening mammography, we investigated the mammography classification performance of medical doctors in an image-reading study. To evaluate their ability to distinguish breast cancer from benign tumor or normal breast on mammography, 25 medical doctors with varying levels of experience rated each mammogram for their confidence that breast cancer was present. They interpreted the mammograms of 49 patients with breast cancer and 251 patients with benign tumor or normal breast. We conducted this image-reading study twice, at an interval of 3 months. Receiver operating characteristic (ROC) analysis and Brier score analysis showed statistically significant differences in performance among the doctors; doctors who had interpreted fewer mammograms performed significantly worse than those who had interpreted more (p<0.05).
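As an illustration of the two performance measures, the following is a minimal sketch in Python, assuming each reader's ratings are rescaled to an estimated probability of cancer in [0, 1]; the use of scikit-learn's roc_auc_score and all variable names are our illustration, not taken from the paper.

import numpy as np
from sklearn.metrics import roc_auc_score

def brier_score(truth, prob):
    # Mean squared difference between the reader's estimated
    # probability of cancer and the true outcome
    # (0 = benign/normal, 1 = cancer). Lower is better.
    truth = np.asarray(truth, dtype=float)
    prob = np.asarray(prob, dtype=float)
    return np.mean((prob - truth) ** 2)

# Hypothetical usage for one reader over the 300 cases:
# truth = np.array([1, 0, 0, ...])        # case truth
# prob = ratings / ratings.max()          # ratings rescaled to [0, 1]
# print(brier_score(truth, prob))         # calibration + accuracy
# print(roc_auc_score(truth, prob))       # discrimination (ROC area)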
Therefore, this difference in mammogram interpretation among medical doctors was attributed mainly to experience in reading mammograms. Furthermore, we investigated interobserver agreement in judging mammographic findings by kappa analysis; the findings tested were calcification, circumscribed lesion, stellate lesion, and architectural distortion. Interobserver agreement was significantly highest for calcification (p<0.05) and very low for architectural distortion.
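For reference, pairwise Cohen's kappa for a binary finding (1 = present, 0 = absent) can be sketched as below; this is only an assumed formulation, since the abstract does not specify whether a pairwise or a multi-rater kappa (e.g., Fleiss' kappa) was computed.

import numpy as np

def cohen_kappa(a, b):
    # Agreement between two readers beyond chance:
    # kappa = (p_o - p_e) / (1 - p_e), where p_o is observed
    # agreement and p_e is the agreement expected by chance.
    a = np.asarray(a)
    b = np.asarray(b)
    p_o = np.mean(a == b)                                 # observed agreement
    labels = np.union1d(a, b)
    p_e = sum(np.mean(a == k) * np.mean(b == k)           # chance agreement from
              for k in labels)                            # each reader's marginals
    return (p_o - p_e) / (1 - p_e)                        # undefined if p_e == 1

# Hypothetical usage: judgments of "calcification present" by two readers.
# reader1 = np.array([1, 0, 1, 1, 0]); reader2 = np.array([1, 0, 0, 1, 0])
# print(cohen_kappa(reader1, reader2))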
From the above results, we confirmed that the diagnostic accuracy of mammography varies among medical doctors, and that their performance must be improved to increase the efficiency of screening mammography.