2025 Volume 6 Issue 2 Pages 13-22
To accurately discriminate between male-sterile and male-fertile Cryptomeria japonica (C. japonica), this study aimed to construct a discrimination model using fluorescence spectroscopy. The male strobili of male-fertile and male-sterile C. japonica were cut in half, and their internal fluorescence properties were measured as an excitation-emission matrix (EEM). A total of 10 data sets were derived from the EEM with different preprocessing methods, and each was subjected to principal component analysis to construct a classification model based on a support vector machine (SVM). The data set with the highest F1 score (the harmonic mean of precision and recall) was the second derivative synchronous fluorescence spectra with Δλ = 80 nm and a window size of 11 nm, which showed a score of 98.5%. These spectra were deemed to give high accuracy because they captured the fluorescence peaks specific to the male-fertile and male-sterile strobili. An additional data splitting analysis indicated that the kernel function with k = 1 in the SVM was optimal, resulting in 100% precision for the test data. The results suggest the potential of fluorescence for distinguishing between male-sterility and male-fertility in C. japonica.
In Japan, Japanese cedar (Cryptomeria japonica D. Don) has been planted to meet postwar demand for lumber. According to data provided by the Forestry Agency, 4.44 million hectares of C. japonica had been planted as of 2017. Although planted C. japonica provides numerous benefits to the public, including the conservation of water resources, the prevalence of pollinosis induced by C. japonica pollen is on the rise and was reported to be 38.8% in 2019 1). C. japonica pollinosis has been linked to a range of symptoms affecting the nasal mucosa, such as sneezing, runny nose, and nasal congestion. These symptoms can have a significant impact on an individual’s work productivity2).
One strategy for mitigating the effects of C. japonica pollinosis is to expand the planting of male-sterile C. japonica, which does not release pollen. Male-sterile C. japonica was first identified in 1992 3) and is defined by the production of male strobili without the subsequent shedding of pollen. This property has been demonstrated to be regulated by a single male-sterility gene4). Five distinct male-sterility genes, MS1, MS2, MS3, MS4 and MS5, have been identified through microscopic observation of the male strobili and test crossings4-8). Each gene exhibits a unique expression pattern during C. japonica pollen maturation. Of these genes, MS1 is the most widely utilized in breeding and nursery production due to its extensive population. Male-sterile seedlings are produced using a male-sterile C. japonica as the maternal tree and a C. japonica heterozygous for the male-sterility gene as the pollen parent. Because approximately half of the resulting seedlings are male-sterile and the remaining half are male-fertile, a technology capable of distinguishing between the two is necessary.
At seedling production sites, male strobili are currently collected, crushed in a bag, and examined for the presence of pollen using a stereomicroscope9). Although this method has a very high accuracy of approximately 98%, it requires brief training for operators and must be carried out by a group of several operators. Moreover, misjudgments are more prevalent in inclement weather. Recently, a DNA-based discrimination method has been developed10). Although this method is highly effective because it relies on genetic data, it is expensive and requires complicated analytical tools. Therefore, there is a need for a discrimination method that is more rapid and straightforward than conventional methods.
A rapid and simple method for discriminating male-sterility from male-fertility using near-infrared (NIR) spectroscopy has been reported11). This method enables the non-destructive identification of male-fertility and male-sterility in male strobili based on the amount of light absorption, and an accuracy of approximately 85% was achieved for an independent data set. However, this is still below the accuracy of 95% or higher required for practical applications, and further improvements are necessary for highly accurate discrimination.
Recently, fluorescence spectroscopy has gained considerable attention in agricultural applications12), offering a spectroscopic approach that is more sensitive to trace components than NIR spectroscopy. Fluorescence is defined as the emission of light following the absorption of ultraviolet or visible light by a fluorophore13). Fluorescence spectroscopy is a prevalent technique in agricultural research, with numerous studies investigating its applications in seed sorting14-15). In the field of forestry, chlorophyll fluorescence of leaves has been employed to assess the condition of oak forests16). Additionally, in a previous study, fluorescence spectroscopy was used to investigate the fluorescence response of C. japonica pollen17). However, fluorescence spectroscopy has yet to be applied to discriminate male-sterility in C. japonica male strobili. It is anticipated that the utilization of pollen fluorescence will facilitate the development of an accurate technique for discriminating between male-fertility and male-sterility in C. japonica male strobili.
The objective of this study was to discriminate between male-sterility and male-fertility in C. japonica using fluorescence spectroscopy. First, the internal fluorescence characteristics of C. japonica male strobili sampled in December 2023, January 2024, and February 2024 were characterized by measuring the excitation emission matrix (EEM) with a spectrofluorometer (FP-8350, JASCO). Subsequently, the EEM data were employed to develop a classification model for discriminating between male-fertile and male-sterile strobili. A set of 10 input data sets derived from the EEM was used: the full wavelength range of the EEM, the EEM within the pollen peak region, three types of synchronous fluorescence spectra and their second derivative spectra, and the excitation spectra at the emission wavelength of 685 nm and their second derivative spectra. In accordance with a previous study11), a combined model of principal component analysis (PCA) and a support vector machine (SVM), PCA-SVM, was employed to construct the classification model. By comparing the accuracy of the models, the optimal combination of EEM data, sampling month, and principal components for discrimination was selected. Furthermore, the accuracy of the models constructed using the training data was evaluated on the test data to validate the models’ accuracy for unknown data.
(1) Materials
Table 1 lists the samples of C. japonica male strobili used in this study. All samples were collected in Niigata Prefecture, Japan, and were labeled with the population names A, B, C, D, G and I according to the sampling location. The individuals in Groups A, G and I are independent and unrelated within each population. Groups B, C and D were obtained by artificial crossing; Group B consists of various individuals with the same maternal and paternal grandmothers. The trees of Groups C and D are families, and both populations have the same father. The samples were taken once a month for three months, from December 2023 to February 2024. However, Group G was sampled only in December 2023 due to the lack of male flowering. Groups E and F were not utilized in the study due to a high mortality rate. Group H was also excluded because a sufficient number of samples could not be obtained.
(2) Excitation emission matrix (EEM) measurement
To characterize the internal fluorescence properties of fertile and sterile male strobili, the EEM was measured with a spectrofluorometer (FP-8350, JASCO). The EEM is a three-dimensional data set comprising excitation wavelength, emission wavelength and fluorescence intensity, and it provides a comprehensive view of the fluorescence characteristics of a sample irradiated with excitation light in the ultraviolet to visible wavelength range. Fig. 1(a) and 1(b) show a male strobilus sample and its cross section after being cut in half for EEM measurement, respectively. The samples were positioned in the center of the cell holder, as illustrated in Fig. 1(c), and the EEM was measured. Three male strobili were selected for repeated measurements on each sample. EEMs were measured using the front-face method, as illustrated in Fig. 1(d). The excitation and emission wavelength ranges were 280-700 nm and 290-720 nm, respectively. Both excitation and emission were measured with a bandwidth of 5 nm, and the sensitivity of the instrument was set to low to avoid intensity saturation.
To normalize the daily variation in intensity due to the detector, the fluorescence intensity was standardized based on the methodology reported in a previous study18). The standardization was based on the peak area of the Raman scattering observed at emission wavelengths of 371-428 nm when distilled water was excited at 350 nm. The fluorescence intensity obtained in arbitrary units was divided by this peak area, and the resulting value, expressed in Raman units (R.U.), was used for subsequent analysis.
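The Raman normalization described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the peak area is taken with a plain trapezoidal rule, and no baseline subtraction is applied (the text does not say whether one was used).

```python
import numpy as np

def raman_peak_area(em_nm, water_intensity, lo=371.0, hi=428.0):
    """Area of the water Raman band (excitation at 350 nm) between lo and hi nm.

    Assumption: simple trapezoidal integration, no baseline correction.
    """
    em_nm = np.asarray(em_nm, dtype=float)
    water_intensity = np.asarray(water_intensity, dtype=float)
    mask = (em_nm >= lo) & (em_nm <= hi)
    x, y = em_nm[mask], water_intensity[mask]
    # Trapezoidal rule written out explicitly
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def to_raman_units(eem_au, area):
    """Convert intensities in arbitrary units to Raman units (R.U.)."""
    return np.asarray(eem_au, dtype=float) / area

# Synthetic example: a flat water scan of intensity 2.0 on a 1 nm grid
em = np.arange(360.0, 441.0, 1.0)
water = np.full_like(em, 2.0)
area = raman_peak_area(em, water)            # 2.0 * (428 - 371)
eem_ru = to_raman_units([[114.0, 228.0]], area)
```

Dividing every EEM recorded on a given day by that day's water Raman area puts measurements from different days on a common intensity scale.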
(3) Classification model
a) All input analysis
In this study, PCA was employed to reduce the dimensionality of the EEM spectral data. The first three principal components (PC1, PC2 and PC3) were obtained through PCA and subsequently employed to develop a classification model using the SVM. The combination patterns of the principal components were PC1 and PC2, PC1 and PC3, PC2 and PC3, and all three components (PC1, PC2 and PC3). The initial step was an all-input analysis, in which all data were used for both training and testing to identify the most suitable spectral data to be employed as input.
The all input analysis was conducted as illustrated in Fig. 2. Following EEM data acquisition, the emission wavelengths within a range of +20 nm of the excitation wavelengths were removed because of elastic scattering. A total of 10 different patterns of the EEM data were obtained: (1) the full wavelength range of the EEM, (2) the EEM peak region of the pollen, the synchronous fluorescence spectra with Δλ = (3) 70, (4) 80 and (5) 90 nm, the second derivatives of the synchronous fluorescence spectra with Δλ = (6) 70, (7) 80 and (8) 90 nm, (9) the excitation spectra at the emission wavelength of 685 nm, and (10) the second derivative of the excitation spectra at the emission wavelength of 685 nm. In this study, the second derivative calculation was employed to eliminate the baseline effect from the spectra. The symbol Δλ represents the difference between the emission wavelength and the excitation wavelength. The three values of Δλ for the synchronous fluorescence spectra were selected to capture the dominant fluorescence peaks observed in the EEMs of male-fertile and male-sterile samples. Furthermore, six window sizes, 3, 5, 7, 9, 11 and 13 nm, were applied for the second derivative of the synchronous fluorescence (SDSF) spectra. The second derivatives (6), (7), (8) and (10) were calculated using the Savitzky-Golay method19). The window size represents the range over which the spectra are smoothed by weighted averaging; generally, larger window sizes result in smoother spectra. Additionally, the excitation spectra at the emission wavelength of 685 nm were selected to extract the peak region of chlorophyll observed in the EEM.
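The preprocessing steps above (scatter removal, extraction of a synchronous spectrum at fixed Δλ, and a Savitzky-Golay second derivative) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the EEM is assumed to be an array indexed as `eem[excitation, emission]` on regular grids, the window is given in number of points, the polynomial order (2) is an assumption because the text does not state it, and only the interior points of the derivative are returned. In practice a library routine such as `scipy.signal.savgol_filter` with `deriv=2` could be used instead.

```python
import numpy as np

def mask_elastic_scatter(eem, ex_nm, em_nm, width=20.0):
    """Set to NaN every point whose emission is at most `width` nm above excitation.

    This also blanks the region where emission <= excitation.
    """
    ex = np.asarray(ex_nm, dtype=float)[:, None]
    em = np.asarray(em_nm, dtype=float)[None, :]
    masked = np.array(eem, dtype=float)
    masked[(em - ex) <= width] = np.nan
    return masked

def synchronous_spectrum(eem, ex_nm, em_nm, delta):
    """Pick the EEM points where emission = excitation + delta (nm)."""
    eem = np.asarray(eem, dtype=float)
    em_nm = np.asarray(em_nm, dtype=float)
    out_em, out_int = [], []
    for i, ex in enumerate(np.asarray(ex_nm, dtype=float)):
        j = np.where(np.isclose(em_nm, ex + delta))[0]
        if j.size:  # skip excitations whose partner emission is off the grid
            out_em.append(em_nm[j[0]])
            out_int.append(eem[i, j[0]])
    return np.array(out_em), np.array(out_int)

def savgol_second_derivative(y, window, polyorder=2, step=1.0):
    """Savitzky-Golay second derivative (interior points only).

    A polynomial of degree `polyorder` is least-squares fitted in each
    window; twice its quadratic coefficient is the second derivative
    at the window center.
    """
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    y = np.asarray(y, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1) * step
    design = np.vander(offsets, polyorder + 1, increasing=True)
    coeffs = 2.0 * np.linalg.pinv(design)[2]
    # Sliding dot product of the coefficients with the signal
    return np.correlate(y, coeffs, mode="valid")

# Synthetic check: the second derivative of x**2 is 2 everywhere
x = np.arange(0.0, 150.0, 5.0)
d2 = savgol_second_derivative(x ** 2, window=11, step=5.0)
```

Because a second derivative removes any constant or linearly sloping background, it suppresses baseline offsets while keeping the curvature associated with peaks.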
PCA was conducted on each of the input data sets to obtain PC1, PC2 and PC3 to construct SVM classification models. Furthermore, three types of SVM kernel functions were adopted: the first-order kernel (k = 1), the second-order kernel (k = 2), and the third-order kernel (k = 3).
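The text does not give the functional form of the k-th order kernel. One common reading is a polynomial kernel of degree k, sketched below; the offset `coef0` and the absence of any scaling factor are assumptions made for illustration only.

```python
import numpy as np

def polynomial_kernel(X, Y, k, coef0=1.0):
    """Polynomial kernel K(x, y) = (x . y + coef0) ** k for rows of X and Y.

    Assumption: k is the polynomial degree; coef0 is a hypothetical offset.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return (X @ Y.T + coef0) ** k

# Two samples described by their (PC1, PC2, PC3) scores
a = [[1.0, 2.0, 3.0]]
b = [[1.0, 0.0, 1.0]]
k1 = polynomial_kernel(a, b, k=1)
k2 = polynomial_kernel(a, b, k=2)
k3 = polynomial_kernel(a, b, k=3)
```

Under this reading, k = 1 corresponds to a flat (linear) decision boundary in the principal component space, while k = 2 and k = 3 allow curved boundaries.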
To evaluate the accuracy of the models, the F1 score was employed. The F1 score is given by Equation (1) and is a measure of a model’s performance, whereby a value of 1 indicates optimal performance with a good balance of precision and recall20).
$$F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{1}$$
In this study, precision is defined as the proportion of samples predicted to be male-sterile that were actually male-sterile, and recall is defined as the proportion of actually male-sterile samples that were predicted to be male-sterile.
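As a small worked illustration of these definitions, precision, recall and the F1 score can be computed from confusion-matrix counts with male-sterility as the positive class. The function name and the counts below are made up for the example and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 with male-sterility as the positive class.

    tp: male-sterile samples predicted male-sterile
    fp: male-fertile samples predicted male-sterile
    fn: male-sterile samples predicted male-fertile
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, for illustration only
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

A precision of 1 with a recall below 1 means that every seedling labeled male-sterile really is male-sterile, while some male-sterile seedlings are missed.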
b) Data splitting analysis
After determining the optimal parameters by the all input analysis, the data splitting analysis was performed to verify the accuracy of the model for unknown data and to identify the optimal order (k) of the kernel function in the SVM. As illustrated in Fig. 3, dimensionality reduction was applied to the training data through PCA, and classification models were created by SVM. The principal component coefficients obtained from the training data were then used to calculate the principal components of the test data, which were subsequently classified using the SVM constructed from the training data set.
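The key point of this procedure is that the PCA coefficients are estimated from the training spectra only and then reused, unchanged, to project the test spectra. A minimal sketch of that step is given below; the function names are hypothetical, the data are random placeholders, centering with the training mean is an assumption, and the SVM itself is omitted (a library classifier, for example scikit-learn's `SVC`, would be trained on the training scores).

```python
import numpy as np

def fit_pca(X_train, n_components=3):
    """Estimate PCA coefficients from the training spectra only."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, vt[:n_components], explained[:n_components]

def transform(X, mean, components):
    """Project spectra onto previously fitted principal axes."""
    return (np.asarray(X, dtype=float) - mean) @ components.T

# Placeholder data standing in for training and test spectra
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 6))
X_test = rng.normal(size=(5, 6))

mean, components, explained = fit_pca(X_train)
train_scores = transform(X_train, mean, components)
test_scores = transform(X_test, mean, components)  # no refitting on test data
```

Fitting PCA on the training groups alone keeps information from the test groups out of the model, so the test precision reflects performance on unseen trees.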
The evaluation of the accuracy in the test data was based on the precision of male-sterility since it is the most essential factor in the seedling shipment. The training and test combinations for each set are illustrated in Fig. 3. As illustrated in Table 1, the identical paternal lineage of Groups C and D suggests the existence of a blood relationship between the two groups. Consequently, Groups C and D were used as a single group. For Set 1, the training and test sets consisted of Groups A, B, and I, and Groups C and D, respectively. For Set 2, the training and test sets consisted of Groups C, D, and I, and Groups A and B, respectively.
(1) EEM results
The EEMs of male-fertile and male-sterile C. japonica male strobili are shown in Fig. 4(a) and (b), respectively. The vertical axis, horizontal axis and color represent the excitation wavelength, the emission wavelength and the fluorescence intensity, respectively. Because fluorescence is emitted at wavelengths longer than the excitation wavelength, data are found only in the region below the line drawn from the upper right to the lower left in Fig. 4, and the black region indicates an absence of data. Three peaks, Peak A, B and C, were observed in male-fertile strobili (Fig. 4(a)), while three peaks, Peak A, C and D, were detected in male-sterile strobili (Fig. 4(b)). Peak A is located at excitation/emission wavelengths (Ex./Em.) of 290-350 nm/410-480 nm and is attributed to a type of cinnamic acid21), and Peak B is observed at Ex./Em. of 420-520 nm/480-600 nm and is attributed to flavonoid compounds22). Peak C and Peak D were found at Ex./Em. of 300-675 nm/685 nm and 490-570 nm/550-620 nm, for which the possible fluorophores are chlorophyll23) and lignin24), respectively. In Peak C, which was common to both male-fertile and male-sterile strobili, the male-sterile strobili showed stronger fluorescence intensities at shorter excitation wavelengths than the male-fertile strobili. This may suggest a difference in the amount of chlorophyll between the two. To investigate the fluorescence of the pollen itself, the EEM of pollen alone was measured and is presented in Fig. 4(c). According to this result, Peak B is deemed to be derived from the autofluorescence of the pollen.
(2) Classification accuracy
a) All input analysis
A total of 10 distinct input data sets were constructed using the EEM data. Table 2 lists, for each input data set, the combination of input data and preprocessing parameters that showed the highest F1 score in the model creation process depicted in Fig. 2. All selected patterns came from the January or February 2024 samplings; no December 2023 data were selected. This may be attributed to the exposure of male strobili to low temperatures during the dormant period, which has been observed to gradually weaken dormancy and enhance developmental capacity25). Dormancy begins in early November26), and the male strobili development mechanism may become increasingly active as the days of low temperature accumulate from December to February, so that the difference between male-fertile and male-sterile strobili may become more evident as the pollen matures. This might be one of the reasons why the December data set showed lower F1 scores than the other two months. In all cases, the combination of principal components utilized was PC1, PC2 and PC3. This may be because increasing the number of principal components increases the explained variance of the data, thereby raising the accuracy of the model.
In Table 2, the highest F1 score was 98.5%, which was attained for the SDSF spectra with Δλ of 80 nm and a window size of 11 nm for samples harvested in January 2024. This data set was selected because the synchronous fluorescence spectra captured the main fluorescence components, such as Peak B and Peak D, that clearly differed between male-fertile and male-sterile strobili. In addition, the second derivative removed baseline effects such as light scattering, allowing selective enhancement of the peak differences.
b) Data splitting analysis
The results of the data splitting analysis conducted on the SDSF spectra with Δλ of 80 nm and a window size of 11 nm for samples harvested in January 2024 are presented in Table 3. For Set 1, the training set comprised Groups A, B and I and the test set comprised Groups C and D; for Set 2, the training set comprised Groups C, D and I and the test set comprised Groups A and B. As shown in Table 3, the model generated in Set 1 exhibited higher accuracies than that in Set 2, with a male-sterility precision of 100% for all kernel functions in the test data set. The accuracy, male-sterility recall and F1 score with the kernel functions of k = 1 and k = 3 were identical, while those indicators were lower for k = 2. In this case, k = 1 was identified as giving the optimal decision boundary, given that a lower-order kernel function can be regarded as more versatile. In contrast, Set 2 yielded a male-sterility precision of approximately 90% for the test data, which may be less suitable for practical applications than Set 1. This may be attributed to the kinship between Groups C and D, which resulted in the model overfitting the training data.
The synchronous fluorescence spectra with Δλ of 80 nm and a window size of 11 nm, sampled in January 2024, for the training data of Set 1 (Groups A, B and I) are shown in Fig. 5(a), and the enlarged spectra over the emission wavelengths of 385-685 nm are shown in Fig. 5(b). The vertical axis represents the fluorescence intensity, and the horizontal axis represents the emission wavelength. The equally spaced vertical lines in Fig. 5(a) through 5(d) indicate the standard deviation of the spectra. In Fig. 5(a), Peak C, which may represent the chlorophyll peak, was prominent. Other fluorescence peaks can be seen in Fig. 5(b). Peak A is common to both male-fertility and male-sterility, while Peak B and Peak D are specific to male-fertility and male-sterility, respectively. Peak B, which might be derived from flavonoid compounds, was observed around the emission wavelength of 530 nm, and Peak D, which might be due to lignin, was observed around the emission wavelength of 600 nm. The corresponding SDSF spectra with Δλ of 80 nm and a window size of 11 nm, sampled in January 2024, for the training data of Set 1 (Groups A, B and I) are shown in Fig. 5(c), and the enlarged spectra over the emission wavelengths of 385-650 nm are shown in Fig. 5(d). As shown in Fig. 5(c), the SDSF spectra of male-sterile strobili showed a slightly larger peak than those of male-fertile strobili around the emission wavelength of 685 nm, which corresponds to Peak C. As illustrated in Fig. 5(d), the SDSF spectra exhibit a more pronounced intensity difference between male-fertility and male-sterility at Peak B and Peak D than the spectra in Fig. 5(b). This might have contributed to the high accuracy of discrimination between male-sterility and male-fertility.
The loadings obtained from PCA using the SDSF spectra with Δλ of 80 nm and a window size of 11 nm for samples harvested in January 2024 are shown in Fig. 6. The vertical axis represents the loading score, and the horizontal axis represents the emission wavelength. Loading 1, 2 and 3 represent the loadings obtained from PC1, PC2 and PC3, respectively. As illustrated in Fig. 5(c), Fig. 5(d) and Fig. 6, the loadings of both PC1 and PC2 (Loading 1 and Loading 2) captured Peak C at an emission wavelength of around 685 nm. Conversely, Loading 3, obtained from PC3, captured both the male-fertile and male-sterile features, Peak B and Peak D. This demonstrates that both male-fertile and male-sterile characteristics can be obtained by acquiring SDSF spectra.
The decision boundary of the SVM, as established from the SDSF spectra with Δλ of 80 nm and a window size of 11 nm for samples harvested in January 2024, is illustrated in Fig. 7. The x-axis, y-axis and z-axis represent the values of PC1, PC2 and PC3, respectively, and the blue and red points show the actual PC1, PC2 and PC3 values of the male-sterile and male-fertile samples, respectively. The red surface shows the decision boundary between male-sterility and male-fertility produced by the SVM model.
The PCA results showed that the explained variances of PC1, PC2 and PC3 were 95.4%, 3.28%, and 0.49%, respectively. As shown in Fig. 7, the decision boundary indicates that the classification of male-fertility and male-sterility is predominantly determined by PC1 and PC3. This result suggests that while PC1 provided a general classification based on chlorophyll peak information, incorporating the specific fluorescence characteristics of male-sterility and male-fertility contained in PC3 contributed to more accurate classification.
The confusion matrix for the model constructed from the SDSF spectra with Δλ of 80 nm and a window size of 11 nm for samples harvested in January 2024 is shown in Table 4. In Table 4, four male-sterile samples in the test data set were predicted as male-fertile. The SDSF spectrum of a representative misclassified sample and an enlarged view over the emission wavelengths of 385-635 nm are shown in Fig. 8(a) and (b), respectively. As shown in Fig. 8(a), the fluorescence intensity of the misclassified sample was low at an emission wavelength of around 685 nm, which might be derived from chlorophyll. Additionally, in Fig. 8(b), the misclassified sample showed a relatively small second derivative peak compared to the average male-sterile spectrum in the Peak D region, which may be regarded as the distinct lignin peak of male-sterility. The fluorescence characteristics of these four misclassified samples resembled those of male-fertile samples, which may have resulted in the misclassification.
The objective of this study was to discriminate between male-sterility and male-fertility in C. japonica male strobili using fluorescence spectroscopy. After the fluorescence characteristics of C. japonica male strobili cross sections were acquired as an excitation emission matrix (EEM), a total of 10 input spectra were generated from the EEM data, and classification models were created by principal component analysis (PCA) and support vector machines (SVM). To select the optimum parameters, the combination of input data and preprocessing parameters that showed the highest F1 score for each input data set was determined by the all input analysis. As a result, the second derivative synchronous fluorescence (SDSF) spectra with Δλ of 80 nm and a window size of 11 nm for samples harvested in January 2024 were selected, with an F1 score of 98.5%. In this case, the three principal components PC1, PC2 and PC3 were selected. To verify the accuracy for unknown data, a data splitting analysis was conducted using the same SDSF spectra. The results demonstrated that when the kernel function was k = 1, the accuracy and precision for the test data were 96.7% and 100%, respectively, representing the highest accuracy. The SDSF spectra were identified as giving the highest accuracy because they simultaneously captured the distinctive characteristics of male-fertile and male-sterile strobili. These results suggest that fluorescence spectroscopy has the potential for straightforward and precise discrimination of male-sterility and male-fertility in C. japonica. In future work, a more detailed investigation targeting samples from different sampling years and genotypes is needed to ensure robust model validation. Cross-annual validation is essential to account for the variability in C. japonica maturation rates across different sampling periods. Expanding the geographical locations in the validation data set is also necessary. Including samples harvested not only in Niigata Prefecture but also in other prefectures would significantly enhance the model’s generalizability and ecological applicability. Although the SDSF spectra showed the highest classification accuracy, a simpler and more cost-effective measurement system should be designed for actual application because SDSF spectra require a synchronous scan of the excitation and emission wavelengths.
This study was supported by the Asahi Group Foundation for the Promotion of Science.