This study examines labor-saving methods for creating training data when building a deep learning model for fruit detection. The study first examined the relationship between the number of training images and detection accuracy under normal training for peaches and grapes. Using the results of normal training as a reference, it then evaluated alternative methods: training across different fruit species and training with images of fruit models (replicas). The effect of the presence or absence of background on accuracy was also compared. Under normal training, sufficient accuracy was achieved with around 200 images, and further increases had limited effect. In cross-species training, a model trained on a different species achieved high detection accuracy when a small number of images of the target fruit species was available. Combining training on fruit replica images with fine-tuning on a small number of real images achieved high accuracy while reducing dependence on the real environment. These results contribute to labor-saving fruit detection using deep learning.