Intelligence, Informatics and Infrastructure
Online ISSN: 2758-5816
Classification of external defects on soybean seeds using multi-input convolutional neural networks with color and UV-induced fluorescence image inputs
Yoshito SAITO, Riku MIYAKAWA, Takumi MURAI, Kenta ITAKURA

2024 Volume 5 Issue 1 Pages 135-140

Abstract

Since accurate soybean seed sorting is a crucial but time-consuming and labor-intensive process in soybean production, there is a need for an inexpensive and simple sorting method. The objective of this study was to classify external defects of soybean seeds with multi-input convolutional neural network (CNN) models using two types of images: color images and UV-induced fluorescence images. Color and fluorescence images of soybean seeds were captured under white LED illumination and 365 nm UV LED illumination, respectively, and visually labeled into four categories: normal, wrinkled, peeled, and defect. For classification, multi-input CNN models were constructed using three pre-trained networks: AlexNet, ResNet-18, and EfficientNet. The classification accuracy of each model was evaluated on test data consisting of 20% of the total data. As a result, the multi-input CNN models generally showed higher classification accuracy than models with only color images or only fluorescence images as input. Furthermore, the highest classification accuracy was 93.9%, obtained by the multi-input CNN model using ResNet-18, which exceeded the single-input models by more than 6.0 pt. These results demonstrate that a multi-input CNN model combining conventional color images with fluorescence images has potential for soybean external defect classification.

1. INTRODUCTION

Soybeans (Glycine max (L.) Merr.), one of the world's most important agricultural crops, have been attracting wide attention as the world population grows. Soybeans are a typical field crop, and to ensure both yield and global environmental conservation, it is necessary to reduce losses at the production stage due to diseases and pests as much as possible.

One of the factors hindering the stable supply of soybeans is loss at the pre-harvest stage due to weeds, diseases, and insects. It has been reported that about 37% of the potential soybean production is lost in the field during vegetation due to competition with weeds1). In recent years, from the viewpoint of the environmental load of agricultural production, narrow-row culture of soybeans has been proposed2). In this method, the crop canopy must cover the ground completely, leaving no gaps for weeds. Therefore, an accurate sorting method that eliminates soybean seeds with mechanical damage, mold, viral infection, and other defects is crucial before sowing.

Automated soybean seed sorting is essential because manual sorting is extremely laborious and time-consuming. Currently, two types of sorting are commonly used for soybean seeds: mechanical sorting based on the shape and size of the seed, and color sorting based on the color information of the seed coat. In mechanical sorting, foreign matter can be removed by sieving at a certain size, and broken and/or defective grains that do not roll on a conveyor belt can be eliminated as well. The grains are finally sorted into large, medium, and small sizes by passing them through sieves of different sizes. On the other hand, color sorting is used to remove grains damaged by viruses, diseases, and insects, which cannot be sorted mechanically. These defects are characterized by color information such as lesions or texture features appearing on the seed coat. Contamination by virus-infected or diseased soybeans must be thoroughly prevented, especially for seed soybean production. Currently, color sorters are large and expensive, and have been installed in only a few facilities in each farming region. Farmers often transport their soybeans to a facility where a color sorter is installed, sort them, and bring them back to their farms. Therefore, there is a need to develop a compact and inexpensive seed soybean sorter that can be introduced at each farmhouse.

In recent years, dramatic advances in spectroscopy, image processing, machine learning, and deep learning have enabled automatic sorting technologies for agricultural products3). For soybeans, external defect classification has been conducted using color images coupled with machine learning4) and deep learning5). Another previous study on soybean defect classification used transmitted-light images with background illumination6). Furthermore, not only visible color images but also near-infrared (NIR) multispectral images have been used to detect stone beans7).

In addition to NIR spectroscopy and imaging technologies, fluorescence induced by ultraviolet (UV) excitation has been intensively investigated as a simple and sensitive sensing method. Fluorescence can reveal fluorescent substances that are not visible in color images, as well as minute defects and scratches on the surface of agricultural crops. For example, Li et al. (2019)8) reported the prediction of soybean seedling germination rate by hyperspectral fluorescence imaging with UV excitation. Autofluorescence spectral imaging has also been used to assess soybean seed quality based on various chemical attributes9). Hyperspectral imaging is rich in spectroscopic information because images are captured at fine wavelength intervals of 10-20 nm, but it is not suitable for implementation by small farmers because the camera is expensive and the amount of data is huge. On the other hand, it has been reported that UV-excited fluorescence images can be analyzed as ordinary RGB color images to identify defects on the surface of citrus fruits and to predict the freshness of strawberries and fish10)-13). Thus, RGB images of UV-excited fluorescence, in addition to ordinary color images, are expected to discriminate defect types at lower cost and computational load than hyperspectral images. However, there have been no reports on the classification of soybean external defects using UV-excited fluorescence images.

To classify the external defects of soybeans based on both color and fluorescence images, utilizing deep learning models is promising. A previous study established multi-input convolutional neural network (CNN) models to classify tree species using images of tree trunks and leaves14). In that study, feature extraction was performed on the images of both tree trunks and leaves using a pre-trained CNN, and the extracted features were used for training with support vector machines. The classification accuracy improved compared to networks trained with only tree trunks or only leaves, which showed the potential of using two types of images for CNN models.

Therefore, the objective of this study was to classify soybean external defects using both color and fluorescence images coupled with multi-input CNN models. First, harvested Japanese soybeans were manually sorted based on human inspection. Second, color and fluorescence images of the soybeans were captured using an imaging system consisting of a color camera, white LEDs, and UV LEDs with a wavelength of 365 nm. Then, new multi-input CNN classification models were constructed using pre-trained networks, and their classification accuracy was investigated.

2. MATERIALS AND METHODS

(1) Soybean samples

In this study, Japanese soybeans (cultivar: 'Toyokomachi'), harvested in October 2022 at the Field Science Education and Research Center (Muramatsu Station) of Niigata University, were used. After harvesting, foreign matter such as pods, branches, and dust was removed, and the seeds were dried at room temperature to a moisture content of approximately 15%. The total weight of the seeds was about 800 g.

After harvest, the soybeans were visually sorted into four categories: normal, wrinkled, peeled, and defect. The defect category includes disease, mold, and insect damage.

(2) Color and fluorescence imaging

The color and fluorescence images of soybeans were taken by the imaging system shown in Fig. 1. Schematic views of the imaging system from the front and side are shown in Fig. 1(a) and Fig. 1(b), respectively.

The white LEDs (LDL2-80X16SW2, CCS) were used to capture color images, and the UV LEDs (LDL-71X12UV2-365-N, CCS) with a wavelength of 365 nm were used to capture fluorescence images. A polarizing filter was placed in front of each white LED and camera lens to remove specular reflection from the soybean surface. A UV-cut filter with a cutoff wavelength of 390 nm was attached to the camera lens to prevent the detection of UV reflected light in the fluorescent images. A color camera (EOS kiss x7, Canon) and macro lens (DG MACRO, SIGMA) with a focal length of 70 mm were used to achieve high image resolution.

In each image, a total of 24 soybeans (6 vertically by 4 horizontally) were uniformly arranged within the LED illumination area. The F-number and ISO sensitivity were set to 5.6 and 800, respectively, and the shutter speed was 1/30 s for the color image and 1/20 s for the fluorescence image. The original image size was 5184 × 3456 pixels.

(3) Image preprocessing

Representative original color and fluorescence images are shown in Fig. 2(a) and Fig. 2(b), respectively, and the flowchart of image preprocessing is shown in Fig. 2(c). MATLAB (R2023a, MathWorks, USA) and a laptop (LEVEL-15FR170-i7-TARX, iiyama, Japan) were used for the analysis in this study. The CPU was an 11th Gen Intel Core i7-11800H (Intel, USA), and the GPU was a GeForce RTX 3070 (NVIDIA, USA).

After capturing the two types of images shown in Fig. 2(a) and Fig. 2(b), a binary image was obtained from the color image using Otsu's thresholding15). After removing background noise other than soybeans, soybean regions were identified by filling holes and concatenating regions. A bounding box was obtained for each grain, and the image was cropped for each soybean. The cropped soybean images were labeled into the four categories (normal, wrinkled, peeled, and defect) and saved so that the color and fluorescence images corresponded to the same soybean sample.
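This pipeline can be sketched in MATLAB roughly as follows. This is a minimal illustration only: the file names, the minimum object area of 500 pixels, and the output folders are assumptions for this sketch, not values from the paper.

    % Minimal sketch of the preprocessing pipeline (file names, the minimum
    % object area, and output folders are illustrative assumptions).
    rgb  = imread('color_01.png');        % color image under white LED
    fluo = imread('fluorescence_01.png'); % fluorescence image under 365 nm UV

    bw = imbinarize(rgb2gray(rgb));       % global Otsu threshold by default
    bw = bwareaopen(bw, 500);             % remove small background noise (area assumed)
    bw = imfill(bw, 'holes');             % fill holes inside each seed region

    stats = regionprops(bw, 'BoundingBox');   % one bounding box per grain
    for k = 1:numel(stats)
        box = stats(k).BoundingBox;
        imwrite(imcrop(rgb,  box), sprintf('rgb/seed_%02d.png',  k));  % same crop for
        imwrite(imcrop(fluo, box), sprintf('fluo/seed_%02d.png', k));  % both modalities
    end

Cropping both modalities with the same bounding box is what keeps each color/fluorescence pair aligned to the same soybean sample.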

(4) Classification using multi-input convolutional neural networks

In this study, pretrained networks including AlexNet16), ResNet-1817), and EfficientNet18) were utilized as backbones to construct CNNs for classification. The overview of the network is shown in Fig. 3.

As shown in Fig. 3, the two types of images, a color image and a fluorescence image, were input. Each input image was passed through one of the aforementioned backbone networks. The features obtained from the series of convolutions were concatenated, and classification was then performed using fully connected and softmax layers. The concatenation method for these features was based on a previous study19). The initial weights and biases of the backbones were set to their pretrained values. The initial weights of the fully connected layer after concatenation of the two inputs were initialized using Glorot (also known as Xavier) initialization, and the biases were initialized to 0.
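As a concrete illustration, the classification head of such a two-branch network can be assembled with the Deep Learning Toolbox roughly as follows. This is a minimal sketch, assuming a layerGraph lg that already contains the two ResNet-18 backbones with layers renamed to avoid name clashes; the feature-layer names 'rgb_pool5' and 'fluo_pool5' are hypothetical.

    % Minimal sketch of the head after the two backbones ('lg', 'rgb_pool5',
    % and 'fluo_pool5' are assumed names, not taken from the paper).
    head = [
        depthConcatenationLayer(2, 'Name', 'concat')  % 512 + 512 channels -> 1024
        fullyConnectedLayer(4, 'Name', 'fc', ...      % four output classes
            'WeightsInitializer', 'glorot', ...       % Glorot (Xavier) initialization
            'BiasInitializer', 'zeros')               % biases initialized to 0
        softmaxLayer('Name', 'softmax')
        classificationLayer('Name', 'output')];

    lg = addLayers(lg, head);
    lg = connectLayers(lg, 'rgb_pool5',  'concat/in1');  % color-branch features
    lg = connectLayers(lg, 'fluo_pool5', 'concat/in2');  % fluorescence-branch features

Concatenating the two pooled feature maps before a single fully connected layer lets the classifier weigh evidence from both modalities jointly, which is the core idea of the multi-input design.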

The configuration of these networks was performed using the Deep Network Designer app in MATLAB. The networks created within this app were exported and saved for training and inference in this study. The Deep Learning Toolbox in MATLAB was mainly used for training these networks.

As shown in Fig. 2(c), the input images were resized to match the input size of each network, and the available images were randomly divided into training, validation, and testing sets at a ratio of 7:1:2. Identical training, validation, and testing data were used for each model.
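A paired 7:1:2 split can be sketched as follows; the folder layout and the fixed random seed are assumptions for this sketch. The key point is that a single permutation is applied to both modalities so that each color/fluorescence pair lands in the same subset.

    % Minimal sketch of the paired 7:1:2 split (folder layout and seed assumed).
    imdsRGB  = imageDatastore('rgb',  'IncludeSubfolders', true, 'LabelSource', 'foldernames');
    imdsFluo = imageDatastore('fluo', 'IncludeSubfolders', true, 'LabelSource', 'foldernames');

    rng(0);                               % fixed seed for a reproducible split
    n   = numel(imdsRGB.Files);
    idx = randperm(n);                    % one permutation shared by both modalities
    nTr = round(0.7*n);  nVa = round(0.1*n);

    trainRGB = subset(imdsRGB, idx(1:nTr));          trainFluo = subset(imdsFluo, idx(1:nTr));
    valRGB   = subset(imdsRGB, idx(nTr+1:nTr+nVa));  valFluo   = subset(imdsFluo, idx(nTr+1:nTr+nVa));
    testRGB  = subset(imdsRGB, idx(nTr+nVa+1:end));  testFluo  = subset(imdsFluo, idx(nTr+nVa+1:end));

    % Each pair is then resized to the backbone input size (224 x 224 for
    % ResNet-18) and merged with its label into one multi-input datastore.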

The accuracy of the constructed deep learning models was evaluated on the test data. A confusion matrix was created from the true and predicted labels to assess the classification accuracy. The hyperparameters used for training were explored and optimized through grid search. Three initial learning rates, namely 1.0×10⁻⁴, 5.0×10⁻⁵, and 1.0×10⁻⁵, were employed, along with mini-batch sizes of 8 and 16 and epoch numbers of 5 and 10, giving 12 hyperparameter combinations in total. Training was conducted for all combinations, and the model with the highest accuracy on the validation data was selected. This selected model was then used for inference on the test data to obtain the classification results. Adam was used as the optimizer.
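The grid search can be sketched as follows; trainDS, valDS, and testDS are assumed to be the multi-input datastores built above, lgraph the assembled two-branch network, and valLabels/testLabels the corresponding label vectors, none of which are specified in the paper.

    % Minimal sketch of the grid search over the 12 combinations (datastore
    % and label variables are assumed to be prepared as described above).
    lrGrid = [1e-4 5e-5 1e-5];  mbGrid = [8 16];  epGrid = [5 10];

    bestAcc = 0;
    for lr = lrGrid
        for mb = mbGrid
            for ep = epGrid
                opts = trainingOptions('adam', ...
                    'InitialLearnRate', lr, ...
                    'MiniBatchSize',    mb, ...
                    'MaxEpochs',        ep, ...
                    'ValidationData',   valDS, ...
                    'Shuffle',          'every-epoch', ...
                    'Verbose',          false);
                net = trainNetwork(trainDS, lgraph, opts);
                acc = mean(classify(net, valDS) == valLabels);  % validation accuracy
                if acc > bestAcc
                    bestAcc = acc;  bestNet = net;              % keep the best model
                end
            end
        end
    end

    pred = classify(bestNet, testDS);   % inference on the held-out test data
    confusionchart(testLabels, pred);   % confusion matrix of true vs. predicted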

3. RESULTS AND DISCUSSION

(1) Classification result using multi-input network

The classification results obtained from all models are shown in Table 1. "Color image only" and "Fluorescence image only" represent the results obtained when performing classification using color images or fluorescence images alone, respectively. The results of the multi-input CNN models are indicated as "Both color and fluorescence images." For each model, the table reports the test accuracy of the hyperparameter combination with the highest validation accuracy.

Among these results, the method utilizing both RGB and fluorescence images achieved the highest test accuracy, 93.9%, when using ResNet-18. Furthermore, regardless of the network used, the multi-input models showed improved test accuracy compared to using RGB images only or fluorescence images only. In particular, with ResNet-18 the accuracy reached 93.9%, a 6.5 pt improvement over the same network trained with color images alone. Thus, training a deep learning network with both RGB and fluorescence images as inputs was found to contribute to accuracy improvement.

Fig. 4 shows the confusion matrices obtained from the classification of soybeans using each model. The vertical and horizontal axes show the actual and predicted classes, respectively. As shown in Fig. 4, classification accuracy for "Defect" was generally high among all the models. As an example, typical images of the "Defect" and "Normal" labels are shown in Fig. 5. As shown in Fig. 5, samples with the "Normal" label showed bluish-white fluorescence under excitation at a wavelength of 365 nm, whereas those with the "Defect" label showed a black texture with little fluorescence emission. The bluish-white fluorescence in the "Normal" label is thought to derive from oxidation products, which have been observed on the soybean surface under excitation at a wavelength of 375 nm20). The absence of fluorescence in the defective area shown in Fig. 5(a) might be due to physiological damage caused by a virus, which inhibits the production of fluorescent substances inherent to soybeans.

On the other hand, misclassification was frequently observed between the "Normal" and "Wrinkle" labels and between the "Peeled" and "Wrinkle" labels. The misclassified images are discussed in the next section.

(2) Examples of misclassified images

Fig. 6 shows examples of misclassified images. Fig. 6(a) shows images labeled as "Wrinkle" but classified as "Normal," whereas Fig. 6(b) shows images labeled as "Wrinkle" but classified as "Peeled." As the confusion matrices in Fig. 4 show, these misclassification patterns occurred frequently.

The error pattern shown in Fig. 6(a) probably occurred because the wrinkles appeared around the edges of the soybean and were not enhanced in the fluorescence images, which made classification difficult. On the other hand, the pattern shown in Fig. 6(b) occurred because types of external defects other than "Wrinkle," such as "Peeled," were simultaneously present on the soybean surface.

We confirmed that accurate classification was difficult when two or more types of external defects coexisted on a single soybean, or when the targeted defects were not highlighted in the fluorescence images. Furthermore, one disadvantage of this method is the network size and training time, because the model receives two inputs. Compared to a single-input model using only color images or only fluorescence images, the network is about twice as large, so it is desirable to make the network lighter. Sarma et al. (2022)21) used two inputs, RGB frames and an optical-flow-guided motion template (OFMT), from videos of hand gestures to classify the gestures. There, video was used instead of still images, providing rich information from various angles. In our study, since each soybean was imaged only from the top, classification of soybean external defects would be more accurate if each sample were captured from different angles, such as from the bottom. Therefore, as a future study, it would be possible to take video of soybeans on a grading machine and classify them based on the video, which includes various scenes.

4. CONCLUSIONS

In this study, new multi-input convolutional neural network (CNN) models with two types of image inputs, color images and fluorescence images, were constructed for classifying external defects on soybeans. Using the imaging system, fluorescence images of soybeans were taken at an excitation wavelength of 365 nm, which showed that bluish-white fluorescence was observed on soybeans with the "Normal" label, while seeds with the "Defect" label emitted little fluorescence and appeared as dark patterns. Multi-input CNN models were constructed using three types of pre-trained networks, AlexNet, ResNet-18, and EfficientNet, as backbones, and compared with models using only the color image or only the fluorescence image as input. The results showed that the multi-input CNN models generally achieved higher classification accuracy than single-image-input models. The highest accuracy on the test data was 93.9%, obtained by ResNet-18, which was more than 6.0 pt higher than the single-input models. These results indicate that deep learning with two inputs, a color image and a fluorescence image, has potential for the classification of external defects on soybeans.

ACKNOWLEDGMENT

This work was supported by JSPS KAKENHI Grant Numbers JP22K20600 and JP23K14044, and the Mazda Foundation.

References
 
© 2024 Japan Society of Civil Engineers