2022 Volume 28 Issue 6 Pages 441-452
Strawberries are a high-value fruit with distinctive characteristics, including having a bright red color and juicy texture. The importance of their texture qualities requires the development of non-destructive analytical methods. This study focuses on the use of silicon-based visible–near infrared (Vis-NIR) spectroscopy to predict the texture qualities of strawberries. The highest correlation values (r) of prediction of firmness were 0.81 (transmittance) and 0.78 (reflectance), while those of brittleness were 0.78 (transmittance) and 0.77 (reflectance). It was found that transmittance mode can predict the texture qualities of strawberries better than reflectance mode. Savitzky-Golay filtering improved the prediction accuracy for most characteristics. The results showed that Vis-NIR spectroscopy, combined with partial least square regression analysis and Savitzky-Golay smoothing, can predict the texture qualities of strawberries at moderate to high accuracy. Further studies are needed to reduce the effects of individual sample sizes and improve prediction accuracy.
Strawberries have the distinctive characteristics of bright red color, juicy texture, sweet, sometimes slightly sour taste, and a distinctive aroma. They can be consumed as fresh fruit or prepared into other foods, such as juice, pies, jams, jellies, and cakes, to name a few. Their distinctive aroma is also often used artificially in products, such as perfumes and sanitizers, among others. The important quality characteristics of strawberries are size, productiveness, firmness, flavor, color, and resistance to injuries, such as scratch and puncture damage (Kader, 1991).
Texture is an important quality characteristic of fruits and vegetables, including strawberries. The texture properties of strawberries are related to both palatability and storability. Texture affects mouthfeel, that is, how easy or difficult it is to crush and chew the fruit, and resistance to injury during handling. Texture is also useful for evaluating differences among cultivars (Døving et al., 2005). Two important texture characteristics are firmness and brittleness. These two characteristics and their respective related parameters (i.e., displacement, strain rate, and energy) influence the feel of the fruit in the mouth and are measured together (Furutani et al., 2012; Hirota et al., 2013). Firmness is defined as the first significant peak in a stress test where the force needed to puncture the fruit decreases for the first time. Brittleness is defined as the difference in force between the firmness peak and the next resistance point where the force needed to penetrate the fruit increases again. Factors that affect these two characteristics are fracture deformation, fracture strain rate, fracture energy, brittleness deformation, brittleness strain rate, and brittleness energy. Deformation and strain rate are related to the critical change in the shape of the sample before it breaks, while energy is related to the energy needed to break the sample and form a new crack (Alvarez et al., 2000). Together, these parameters can be used as quality indicators for strawberry texture assessment. While texture parameters are indeed important in determining strawberries' qualities, texture parameters can only be truly measured using destructive methods. Thus, true texture measurements can only be conducted on a few samples in one batch, for example. Non-destructive prediction methods are needed to determine the qualities of individual fruits without loss of products.
Spectroscopy has been used previously to predict the internal characteristics of strawberries and other fruits. Pissard et al. (2013) concluded that near-infrared (NIR) spectroscopy could be used to accurately measure the quality parameters of apples, such as vitamin C, total polyphenol, and sugar content. Another study demonstrated the use of NIR spectroscopy to predict the amount of dry matter and starch in mango (Saranwong et al., 2004) and firmness and sugar content in sweet cherries (Lu, 2001). Camps and Christen used a portable variant of visible-near-infrared (Vis-NIR) spectroscopy to assess the quality of apricot, including soluble solid content, total acidity, and firmness. Several studies have used spectroscopy to predict the internal characteristics of small berries. For strawberries, Vis-NIR spectroscopy was used to predict soluble solid content (Shen et al., 2018; Guo et al., 2013; Sánchez et al., 2012; Shi et al., 2011), acidity (Sánchez et al., 2012; Shao and He, 2008), and firmness (Sánchez et al., 2012; Tallada et al., 2006). It was also used on blueberries to predict total soluble solids (TSS), sugar, organic acid, total anthocyanins (Sinelli et al., 2008; Bai et al., 2014), and hardness (Hu et al., 2018). Studies on predicting texture qualities of strawberries using Vis-NIR spectroscopy are limited. In addition, all of the previously mentioned studies utilized spectrometers based on the InGaAs sensor, except for the study by Shen et al. (2018), which used a combination of silicon and InGaAs sensors. However, the study by Shen et al. (2018) utilized mostly the wavelength in the NIR region captured using the InGaAs sensor and only partially used the spectra from the silicon sensor range. While the InGaAs sensor provides high accuracy, it is costly and its application might not be very realistic in individual farmer situations. Data on the usage of silicon-based spectroscopy for non-destructive analysis of strawberries are limited. 
Regarding the mode used, reflectance mode is often used in the wavelength range of 1100 nm to 2500 nm on low moisture samples, while the transmittance and interactance modes are often used in the wavelength range of 400 to 1100 nm for intact fruit with a high moisture content (Ito et al., 2004). Reflectance mode may be susceptible to variations in surface properties of the fruits, while transmittance mode is less susceptible to surface variations at the cost of requiring more intensive light.
A popular method for modeling the internal characteristics of fruit using spectral data is partial least square (PLS) regression analysis. Unlike other multivariate methods, the principles of PLS regression try to extract latent variables from both the factors and responses. After the latent variables are extracted, latent variables from the factors are used to predict the latent variables extracted from the responses and then converted back into the responses (Tobias, 1995). Derivation could be applied to the spectral data to remove background noise and improve the resolution and discrimination of components residing in a close wavelength range (Owen, 1995).
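As a minimal sketch of the latent-variable idea described above, the following NumPy code implements single-response PLS regression (PLS1) with the NIPALS algorithm. It is illustrative only: the function names `pls1_fit` and `pls1_predict` and the synthetic data are assumptions made for this example and do not come from the study.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) with NIPALS; returns (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # latent scores extracted from the factors
        tt = t @ t
        p = Xk.T @ t / tt              # factor (X) loadings
        c = yk @ t / tt                # response (y) loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Convert the latent-variable model back to coefficients on the original X
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean

def pls1_predict(model, X):
    beta, x_mean, y_mean = model
    return (X - x_mean) @ beta + y_mean

# Synthetic stand-in for spectra: 40 samples x 5 variables (not measured data)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([0.5, -1.0, 2.0, 0.7, 1.5]) + 3.0
model = pls1_fit(X, y, n_components=5)
y_hat = pls1_predict(model, X)
```

With as many components as variables, PLS1 reproduces an ordinary least-squares fit, so the noise-free synthetic response is recovered; with real spectra, far fewer components than wavelengths would be retained.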
A Savitzky-Golay (SG) filter is a digital filter that is used to smooth out data using convolution, a process involving the fitting of successive subsets of adjacent data points with a low-degree polynomial using the linear least-squares method (Savitzky and Golay, 1964). Higher smoothing can be achieved by increasing the convolution size m (the data points used). However, this results in a higher distortion of the data. A balance between noise-removal and minimum distortion of the original spectra should be maintained using the appropriate convolution size.
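The least-squares fitting described above can be sketched in NumPy as follows. The helper names `savgol_coeffs` and `savgol_apply`, the polynomial order of 2, the edge padding, and the synthetic spectrum (including its arbitrary peak position) are assumptions for illustration; the source does not state the polynomial order used.

```python
import math
import numpy as np

def savgol_coeffs(window, polyorder, deriv=0, delta=1.0):
    """Weights giving the deriv-th derivative of a local least-squares
    polynomial fit, evaluated at the centre of an odd-sized window."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    # Design matrix with columns z^0, z^1, ..., z^polyorder
    A = np.vander(z, polyorder + 1, increasing=True)
    # Row k of pinv(A) maps the window values to the fitted coefficient a_k;
    # the k-th derivative of the fit at z = 0 is k! * a_k.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv) / delta**deriv

def savgol_apply(y, window, polyorder=2, deriv=0, delta=1.0):
    """Filter a 1-D signal; edges are padded by repeating the end values."""
    c = savgol_coeffs(window, polyorder, deriv, delta)
    padded = np.pad(np.asarray(y, dtype=float), window // 2, mode="edge")
    return np.correlate(padded, c, mode="valid")

# Synthetic noisy "spectrum" on a 1 nm grid (not measured data)
rng = np.random.default_rng(0)
wavelength = np.arange(500.0, 951.0, 1.0)
clean = np.exp(-((wavelength - 680.0) / 40.0) ** 2)
noisy = clean + rng.normal(scale=0.02, size=wavelength.size)

for m in (21, 31, 41, 51):
    smoothed = savgol_apply(noisy, window=m)
    rms_dev = np.sqrt(np.mean((smoothed - clean) ** 2))
    print(f"m = {m}: RMS deviation from noise-free curve = {rms_dev:.4f}")
```

Comparing the printed deviations for different values of m is one simple way to see the trade-off between noise removal and distortion on a given signal.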
The selection of the number of components is an important, but still problematic, aspect of PLS regression. Cross-validation methods and randomized test methods have been developed to make this process easier. However, both have their advantages and disadvantages; for example, the cross-validation method does not make use of the entire dataset, and randomized tests are prone to false positives due to the existence of non-significant components in the experimental data. Weighted randomized tests (WRT) were developed by Tran et al. (2017) to tackle the disadvantage of the previously used randomized tests. This test uses weighted vectors of the spectrum instead of the covariance between score vectors and response variables for the null test in assessing the significance of a PLS component. The advantage of the WRT method is two-fold: (1) it is less expensive computationally, and (2) it does not suffer from the presence of irrelevant variance. Based on the study of Tran et al. (2017), WRT also often ends up with a smaller number of components compared to other methods, which further decreases the overall computational cost of executing PLS regression. WRT also has the indirect advantage of being automatic, allowing calculations and model building for multiple parameters to be executed without pausing to choose the number of components.
The objectives of this study were: (1) to confirm the potential of a silicon-based spectroscopy measurement system to predict texture qualities of strawberries, (2) to build appropriate prediction models using PLS regression in combination with SG filtering and WRT, and (3) to investigate the capabilities and limitations of the measurement system.
Strawberry samples The samples used in this experiment are the commercial cultivar Tochiotome strawberries obtained from a farm in Oyama, Tochigi Prefecture, Japan. Two sets of 108 strawberries were picked at 70–100 % maturity, indicated by the red color on the fruits’ surface (Kader, 1991; Sánchez et al., 2012), on April 10th and May 23rd, 2021. The samples were harvested and directly put into individual containers to minimize damage from rubbing between samples and transported to our laboratory to be measured on the same day. All samples were kept at room temperature and measured without using any preservation method until the end of the experiments.
Texture properties measurement The texture characteristics data were captured using a creepmeter (Yamaden REII − 33005C, YAMADEN Co., Ltd., Tokyo, Japan) using a 5-mm flat tip plunger, 1 mm/sec penetration speed, and 10 mm penetration depth (Matsumoto et al., 2010). The samples were cut in half on the long side and measured at the thickest part of the fruit to ensure maximum penetrability. Two cuts were made and used as two repetitions of each sample. The collected characteristics data included firmness (fracture force) (N), fracture deformation (mm), fracture energy (J/m3), fracture strain rate (%), brittleness (N), brittleness deformation (mm), brittleness energy (J/m3), and brittleness strain rate (%).
A typical strain test curve is shown in Fig. 1. The first significant peak force (F1) is firmness, and the difference between the first peak force and the first turning point (F2) is brittleness. The fracture strain rate is the strain rate at the first peak (H1), and the fracture deformation is the strain rate multiplied by sample thickness. Correspondingly, the brittleness strain rate is the strain rate between the first peak and first turning point (H2), and brittleness deformation is the brittleness strain rate multiplied by sample thickness. Fracture energy and brittleness energy are defined as the area under the curve from 0 to H1 and from H1 to H2, respectively. The repeatability coefficients (RC) of texture measurements were calculated following Bland and Altman (1986).
Fig. 1. Typical strain test result curve.
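For readers who want to reproduce a repeatability coefficient from duplicate measurements, one commonly used form attributed to Bland and Altman (1986) is RC = 1.96 × sqrt(Σ d² / n), where d is the difference between the two repetitions of a sample and n is the number of samples. The sketch below assumes that form, which may differ in detail from the one used in the study; the function name is hypothetical.

```python
import math

def repeatability_coefficient(rep1, rep2):
    """RC = 1.96 * sqrt(sum(d^2) / n) for paired repetitions (assumed form)."""
    if len(rep1) != len(rep2) or not rep1:
        raise ValueError("rep1 and rep2 must be non-empty and of equal length")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rep1, rep2))
    return 1.96 * math.sqrt(squared_diffs / len(rep1))

# Hypothetical firmness readings (N) from the two halves of four fruits
first_half = [2.9, 1.8, 3.4, 2.2]
second_half = [2.5, 2.1, 3.0, 2.6]
rc = repeatability_coefficient(first_half, second_half)
```

A larger RC means that the two halves of the same fruit tend to differ more from each other.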
Spectral data measurement The prototype spectroscopy measurement device was assembled from individual parts, including a light source (Nippon P·I Co. Ltd., PIS-UHX-NIR, Tokyo, Japan), a sensor (Ocean Optics, USB2000+ Modular Spectrometers, Tokyo, Japan), fiber optic cable, USB cable, a compartment functioning as a dark room, and a personal computer with the measurement software installed (Ocean Optics, Spectrasuite, Tokyo, Japan). The spectral data were captured using both transmittance and reflectance methods. Schematic representations of the measurement system are shown in Fig. 2.
Fig. 2. Schematic diagram of the measurement system: (A) transmittance mode and (B) reflectance mode.
Partial Least Square (PLS) The prediction models were built using PLS regression (Tobias, 1995). Each sample set of 108 strawberries was split in half, with 54 samples used for calibration and 54 for validation. The spectra were pretreated by limiting the wavelength range to 500–950 nm and taking the second derivative (Owen, 1995) before any further analysis. Additional datasets were generated using an SG filter (Savitzky and Golay, 1964) with convolution sizes of 21, 31, 41, and 51 points. A weighted randomized test was used to select the optimum number of components (Tran et al., 2017). SG smoothing was performed using Microsoft Excel, while PLS models were built using MATLAB R2019b and the WRTPLS package (Tran et al., 2017). Models were calculated for all SG filter convolution sizes, and the optimum size for each parameter was recorded for both measurement methods and both experiments. The correlation coefficients (r) between the measured and predicted values and the root mean square error of prediction (RMSEP) of the textural properties were calculated for the calibration and validation sample sets to assess the models.
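The two assessment metrics named above can be written out as a short sketch. It assumes the standard definitions (Pearson correlation between measured and predicted values, and the square root of the mean squared prediction error); the function names and the example values are hypothetical.

```python
import math

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_mp / math.sqrt(s_mm * s_pp)

def rmsep(measured, predicted):
    """Root mean square error of prediction."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

# Hypothetical firmness values (N) for a small validation set
measured = [1.2, 1.9, 2.4, 1.5, 2.8]
predicted = [1.4, 1.7, 2.2, 1.8, 2.5]
r_v = pearson_r(measured, predicted)
error = rmsep(measured, predicted)
```

RMSEP carries the unit of the predicted property, so it can only be compared between models of the same property, whereas r is unitless.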
Captured spectral data Representative examples of spectral curves converted to absorbance using reflectance and transmittance methods are shown in Fig. 3.
Fig. 3. Typical spectral curves obtained using the reflectance and transmittance methods.
The second derivative of each spectrum was calculated, and additional spectra were generated using SG smoothing; typical resulting curves are shown in Fig. 4. The second-derivative spectra were initially rough, with considerable noise obscuring the trends. SG smoothing was applied to the spectra using different convolution sizes (i.e., 21, 31, 41, and 51 points).
Fig. 4. SG filter applied to the second-derivative spectral curve.
As shown in Fig. 4, the application of this filter smoothed out the rough curves, resulting in clearer trends. The smoothness of a spectrum increases with a larger convolution size. However, a larger convolution size also distorts the original spectrum. The spectrum with the largest convolution size lost numerous stray peaks, retaining only several smaller peaks by the end. This is in line with a study by Li et al. (2015), which concluded that the effectiveness of the SG filter is related to the window size m. The optimum window size depends on the features of the spectra and user criteria. In quantitative studies where the height of the peaks is directly correlated to the target parameter, the SG filter should be applied carefully so as not to distort the proportion of the peak heights. Although no specific wavelengths were targeted in the current study, the recorded transmittance/absorbance values for all wavelengths in the range were used to build the PLS models, so overly severe distortion must be avoided. All SG-treated spectra were used to build models and predict the target characteristics. The optimum convolution size for each parameter was recorded and is shown in Table 2.
Table 1. Measured texture properties of the strawberry samples in the April and May experiments (SD, standard deviation; CV, coefficient of variation; RC, repeatability coefficient).
Texture property | April Mean | April SD | April CV | April RC | May Mean | May SD | May CV | May RC |
---|---|---|---|---|---|---|---|---|
firmness (N) | 2.78 | 1.08 | 0.39 | 1.54 | 1.67 | 0.7 | 0.42 | 1.14 |
fracture deformation (mm) | 1.38 | 0.35 | 0.26 | 0.69 | 0.96 | 0.31 | 0.32 | 0.56 |
fracture strain rate (%) | 9.21 | 2.62 | 0.28 | 4.55 | 6.38 | 2.11 | 0.33 | 4.02 |
fracture energy (J/m3) | 9828.15 | 5902.11 | 0.6 | 8330.71 | 4591.97 | 3155.48 | 0.69 | 5664.90 |
brittleness (N) | 2.23 | 0.86 | 0.39 | 1.54 | 1.38 | 0.54 | 0.39 | 0.89 |
brittleness deformation (mm) | 2.75 | 0.65 | 0.24 | 1.67 | 2.39 | 0.67 | 0.28 | 1.71 |
brittleness strain rate (%) | 18.24 | 4.24 | 0.23 | 11.58 | 15.89 | 4.61 | 0.29 | 11.99 |
brittleness energy (J/m3) | 12902.91 | 6403.69 | 0.5 | 10198.28 | 6897.9 | 4146.6 | 0.6 | 8106.50 |
Table 2. PLS regression results for each texture property (rc, correlation coefficient for the calibration set; rv, correlation coefficient for the validation set; RMSEP, root mean square error of prediction; m, SG filter convolution size).
Texture property / metric | April reflectance | April transmittance | May reflectance | May transmittance |
---|---|---|---|---|
firmness (N) | | | | |
no. of components | 3 | 4 | 4 | 6 |
rc | 0.77 | 0.73 | 0.80 | 0.88 |
rv | 0.65 | 0.73 | 0.78 | 0.81 |
RMSEP | 0.70 | 0.66 | 0.43 | 0.41 |
SG filter m | 51 | 41 | 21 | 51 |
fracture deformation (mm) | | | | |
no. of components | 3 | 4 | 3 | 3 |
rc | 0.72 | 0.72 | 0.70 | 0.72 |
rv | 0.60 | 0.68 | 0.71 | 0.77 |
RMSEP | 0.22 | 0.20 | 0.20 | 0.19 |
SG filter m | 41 | 31 | 51 | 21 |
fracture strain rate (%) | | | | |
no. of components | 4 | 5 | 3 | 3 |
rc | 0.72 | 0.75 | 0.67 | 0.69 |
rv | 0.58 | 0.64 | 0.67 | 0.73 |
RMSEP | 1.90 | 1.86 | 1.48 | 1.38 |
SG filter m | 21 | 51 | 51 | 51 |
fracture energy (J/m3) | | | | |
no. of components | 3 | 4 | 3 | 6 |
rc | 0.72 | 0.71 | 0.75 | 0.85 |
rv | 0.64 | 0.69 | 0.71 | 0.76 |
RMSEP | 3829.07 | 3703.64 | 2210.91 | 2060.68 |
SG filter m | 51 | 51 | 51 | 51 |
brittleness (N) | | | | |
no. of components | 3 | 4 | 4 | 6 |
rc | 0.73 | 0.71 | 0.77 | 0.87 |
rv | 0.60 | 0.71 | 0.77 | 0.78 |
RMSEP | 0.57 | 0.54 | 0.33 | 0.33 |
SG filter m | 31 | 51 | 21 | 51 |
brittleness deformation (mm) | | | | |
no. of components | 5 | 4 | 3 | 3 |
rc | 0.42 | 0.64 | 0.47 | 0.41 |
rv | 0.06 | 0.31 | 0.26 | 0.44 |
RMSEP | 0.52 | 0.48 | 0.52 | 0.47 |
SG filter m | 21 | 51 | 51 | no SG |
brittleness strain rate (%) | | | | |
no. of components | 3 | 4 | 3 | 3 |
rc | 0.35 | 0.46 | 0.47 | 0.39 |
rv | 0.13 | 0.31 | 0.22 | 0.40 |
RMSEP | 3.13 | 3.13 | 3.72 | 3.37 |
SG filter m | 51 | 21 | 51 | no SG |
brittleness energy (J/m3) | | | | |
no. of components | 3 | 3 | 4 | 2 |
rc | 0.72 | 0.67 | 0.77 | 0.69 |
rv | 0.64 | 0.72 | 0.75 | 0.74 |
RMSEP | 4334.63 | 4317.02 | 2575.02 | 2486.83 |
SG filter m | no SG | 31 | 21 | 51 |
Texture measurements The measured texture parameters are shown in Table 1. The higher values for the texture characteristics suggested harder strawberries in April compared to May. The April experiments spanned a wider range of measured values as indicated by the higher standard deviation. The repeatability coefficient considers only the difference between the repetitions of each sample's measurements. However, the values are noticeably higher in the April experiments. Since the measurements were done on two halves of each sample, this result may suggest that the strawberries in April have more variations within each individual fruit compared to May samples. These differences between both sample sets might affect the prediction results, as will be discussed in the next section.
General PLS results For each characteristic, method of spectrum capture, and SG-filter convolution size, a model was built using PLS regression to predict the target texture properties, which included firmness (N), fracture deformation (mm), fracture strain rate (%), fracture energy (J/m3), brittleness (N), brittleness deformation (mm), brittleness strain rate (%), and brittleness energy (J/m3). The optimum number of components was selected using the WRT. The correlations (r) between the measured and predicted texture characteristics were calculated for both calibration (rc) and validation (rv) sample sets, and the results are shown in Table 2. The optimum SG convolution size m was selected based on the highest rv values.
The results obtained for all parameters, except brittleness deformation and brittleness strain rate, appeared to be well suited for prediction using this approach, judging from their rv values, especially in the May experiments. The correlations for the calibration set (rc) values were understandably higher, as the calibration sets were used to build the models themselves. Regarding firmness, the rv values from the May experiments are 0.78 and 0.81 for the reflectance and the transmittance methods, with RMSEP values of 0.43 and 0.41, respectively. The rv values showed a strong positive correlation between the measured and predicted values (r > 0.7). The prediction plot is shown in Fig. 5. As shown in the figure, there was a relatively clear trend between the measured and predicted values, with a few anomalies.
Fig. 5. Prediction of (A) firmness and (B) brittleness using the transmittance method in the May experiment.
The results from the April experiments were slightly worse, with rv = 0.65 and RMSEP = 0.70 obtained for the reflectance method, and rv = 0.73 and RMSEP = 0.66 obtained for the transmittance method. The correlation values still fall in the moderately strong range (0.4 < r < 0.7), and the higher RMSEP values suggested worse prediction compared to the May results. Similar results were observed for brittleness: in May, rv = 0.77 and RMSEP = 0.33 were obtained using the reflectance method and rv = 0.78 and RMSEP = 0.33 using the transmittance method, whereas in April, rv = 0.60 and RMSEP = 0.57 were obtained using the reflectance method and rv = 0.66 and RMSEP = 0.54 using the transmittance method. These values fall within the same categorization as those of firmness. Also similar to the firmness results, the RMSEP values in the May experiments were lower than those in the April experiments, indicating smaller error and more accurate prediction. In comparison, Tallada et al. (2006) predicted firmness with correlation values of 0.599 for strawberries in the 70% to fully ripe group and 0.786 in the 50% to fully ripe group. Sánchez et al. (2012), using NIR spectroscopy (1600–2400 nm range), obtained r2 values of 0.35 with MPLS regression and 0.48 with a LOCAL algorithm, which correspond to r values of 0.59 and 0.69, respectively. Although little information is available on the mechanism relating the texture of strawberries to their spectral absorption, both are related to the pigments in strawberry fruits. Pigments in strawberries, namely chlorophyll and anthocyanin, absorb light at certain wavelengths, such as 535 nm (Scott, 2013; Jiang et al., 2016; Yue et al., 2020; Blanke, 2000). The concentration of these pigments changes the color of the fruit (Lancaster et al., 1997) and, in effect, the absorption spectrum. Given et al. (1988) and Blanke (2000) noted the change in the pigment contents and firmness as strawberry fruits ripen. As demonstrated in this study, spectroscopy methods can be used to predict the texture parameters of strawberries, possibly due to the complex inter-relationship between the ripeness, pigment contents, fruit color, and firmness of strawberry fruits. The better results in the May experiments were possibly caused by the lower individual sample size variation (Fig. 6). Due to the nature of our prototype spectroscopy measurement device, all samples were measured from the same height and distance to the center of the fruit. This means that fruits of different sizes might differ in the angle at which the light strikes the sample and in the amount of background noise. This could have led to less accurate spectra and prediction. Apart from the spectral measurement, the repeatability coefficients from the April experiments indicated larger differences between same-sample measurements in the creepmeter test. In this study, the samples were cut in half, so higher same-sample variation could mean that the texture qualities vary more within one fruit. This could also have led to worse prediction accuracy in the April experiments.
Fig. 6. Weight distribution of strawberry samples in the April and May experiments.
In the May experiments, the Vis-NIR spectroscopy combined with PLS regression, SG filtering, and WRT used in the current study yielded higher correlation values than the three-wavelength method used by Tallada et al. (2006) and the MPLS regression and LOCAL algorithm used by Sánchez et al. (2012). Differences in experimental methods and measuring conditions would of course affect the results, and correlation values are not the only criterion for judging prediction methods. Nevertheless, the comparison is notable given that hyperspectral imaging and InGaAs-based NIR spectroscopy have been more commonly used with strawberries.
Other parameters relating to firmness (fracture deformation, fracture strain rate, and fracture energy) and one parameter relating to brittleness (brittleness energy) showed similar results to firmness and brittleness, with rv showing strong correlations in the May experiments and moderate correlation in the April experiments, and lower RMSEP obtained in the May experiments. The resulting correlation and RMSEP results from the transmittance method are also better than those obtained using the reflectance method, which is the same as the results obtained for firmness and brittleness. While these characteristics are seldom cited compared to firmness itself, they still hold value in explaining the texture of strawberries and determining mouthfeel. According to Kohyama et al. (2013), fracture strain rate along with apparent modulus can be used as an indicator of storage quality of strawberry fruit as the fracture strain increases as the fruit softens in storage. Fracture deformation and strain rate are related to the crack length that the fruit can take before fracture and their ability to resist penetration, and fracture energy is related to the critical energy needed to make a new fracture (Alvarez et al., 2000). The strong prediction correlation values observed in this study demonstrated the potential of the Vis-NIR method for predicting fracture deformation, fracture strain rate, fracture energy, and brittleness energy. As the computational cost of this method was not very high due to using a relatively low number of components, it is not unrealistic to calculate and predict these properties in conjunction with firmness and brittleness for a more accurate and detailed depiction of the textural characteristics of strawberries.
For other parameters relating to brittleness (i.e., brittleness deformation and brittleness strain rate), the correlation values were very low, and only the results of the transmittance method from the May experiments reached a low to moderate correlation; that is, rv = 0.44 and RMSEP = 0.47 were obtained for brittleness deformation, and rv = 0.40 and RMSEP = 3.37 were obtained for brittleness strain rate. The other models resulted in low correlations. A possible cause for this behavior is the lower coefficient of variation of deformation and strain rate on the brittleness side of the curve compared to the firmness side. As shown in Table 1, the coefficient of variation, also known as relative standard deviation, of the brittleness deformation and brittleness strain rate was lower than that of fracture deformation, fracture strain rate, and the other parameters. This means that the measured values are clustered closely together, leaving the regression model too little variation to predict these characteristics accurately. This could also lead to issues where the regression models built using one subset of the sample (calibration set) cannot be used to predict the other subset (validation set), also known as overfitting (Harrell, 2015). The higher correlations obtained using the calibration set (rc), with most rc values for these characteristics falling in the moderate range, supported this observation. One possible remedy for building a better model would be to use samples with more intrinsic variation; for example, samples covering a wider, but more controlled, range of ripeness levels. However, the variations in individual sample shape and size and their impacts on spectrum measurements should also be considered so as not to repeat the suboptimal results obtained in the April experiments.
For nearly all parameters, the results from transmittance mode were better than those from reflectance mode. This result could be caused by measurements in transmittance mode picking up spectral information not only from the surface of the fruit but also from inside the fruit. According to Lin and Ying (2009), reflectance mode captures the light reflected by the surface or shallow internal layers of the measured fruits and might not give enough information regarding the internal properties of fruits, as opposed to transmittance mode, which captures the spectrum after the light has passed through the fruit interior. This could be the reason for the higher prediction accuracy of texture qualities using the transmittance mode in this study. Other studies reported the opposite result for blueberries (Leiva-Valenzuela et al., 2014) and kiwi fruit (Schaare and Fraser, 2000), where reflectance mode resulted in better prediction than transmittance mode. However, Leiva-Valenzuela et al. (2014) reported that reflectance mode results varied depending on the orientation of the fruit during measurements. As our study only measured strawberries in one orientation and used a single fixed position for all samples regardless of size, this could have adversely affected the reflectance measurement results in the current study. Schaare and Fraser (2000) also reported that while transmittance mode is expected to yield better results, the small amount of transmitted light resulted in a lower signal-to-noise ratio, thus decreasing its accuracy. In the current study, the light source was adjusted for optimum light capture using actual strawberry fruit before the experiment to improve the signal-to-noise ratio. The different opacity of strawberry and kiwi fruit could also be a factor contributing to the better results obtained using the transmittance mode in the current study.
Another possible explanation is the effect of surface variation, which is more apparent when using the reflectance method as opposed to the transmittance method (Ito et al., 2004). Regarding surface variation, damage on the strawberry surface could also affect the prediction results. The samples in this experiment were harvested, transported, and measured on the same day in the span of several hours for each experiment. While there was little to no contact between samples, there was still contact between the samples and the containers and covers. While surface damage was kept to a minimum, any rubbing between strawberries and the environment before and after harvesting could have an impact on the surface and in effect, the prediction results.
Savitzky-Golay smoothing Regarding the convolution size of the SG filter, the SG filter yielded higher rv values for most characteristics. The best convolution size varies; however, more than half of all optimum results were generated using a convolution size of 51 points. The smoothing of the data using the SG filter was beneficial for the prediction model, and the larger convolution size, in this case, 51 points, provided better prediction. This suggests that the removal of noise was more beneficial for the predictions despite removing more information from the original curve. This result corroborates studies by Pissard et al. (2013) and Liu et al. (2010), who showed that the SG filter in combination with derivation yielded the best results. While smoothing using 51 points may seem large, according to Madden (1978), the smoothing point can reach above 25 points in some cases. In this study, the optimum SG filter convolution size was selected using only the rv values as an indicator. Sturdier methods to assess the optimum convolution size should be considered in future studies to improve the prediction results and computational costs.
Weighted randomized test The number of components was selected using WRT. As shown in Table 2, the optimal number of components resulting from this method is mostly 3–4, with 6 for some parameters, namely firmness, fracture energy, and brittleness. These values are relatively low, so calculating all parameters does not require much computing power or time once the number of components has been determined. In PLS regression, most of the time was used by WRT itself rather than the actual model building. This would not be an issue in the actual application, as WRT is only needed during the development of the models and not during application. However, the most significant advantage of WRT is the ability to select the number of components automatically, as opposed to the cross-validation method, which still requires an observer to check the root mean square error curve and select the number of components manually (Tran et al., 2017). In this study, WRT noticeably accelerated model building as it allowed all parameters to be calculated using a single script without pause.
Impact of individual sample size variation The suboptimal results obtained for the April experiments compared with those of the May experiments were most likely caused by the higher variance in strawberry sizes in the April experiments. The sample weight in the April experiments was 18.48±3.49 g, while that in the May experiments was 19.32±2.51 g. Fig. 6 shows the distribution of strawberry sample weights in the April and May experiments. The weights of the strawberry samples in the April experiments were noticeably more varied, especially at the smaller end.
The samples were sorted by size and split into three groups (S, M, and L). No specific weight criteria were used to assign samples to each group; instead, the samples were sorted by weight and split equally: samples 1 to 36 were put into group S, samples 37 to 72 into group M, and samples 73 to 108 into group L. PLS regressions were conducted on each group separately to confirm the impact of individual fruit size. Due to the small number of samples in each group, each group was split into 30 calibration samples and 6 validation samples, as opposed to the 50:50 split used in the full set experiment. With this method of splitting, the weight still varied within each group, but less than in the full set.
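The grouping procedure described above can be sketched as follows. How the 6 validation samples were chosen within each group is not stated here, so a seeded random selection is assumed; the function name `split_by_weight` and its parameters are hypothetical.

```python
import random


def split_by_weight(weights, n_validation=6, seed=0):
    """Sort sample indices by weight, cut them into equal S/M/L groups, and
    split each group into calibration and validation indices.

    Assumption: validation samples are drawn at random within each group.
    """
    names = ("S", "M", "L")
    if len(weights) % len(names) != 0:
        raise ValueError("number of samples must be divisible by 3")
    size = len(weights) // len(names)
    if not 0 < n_validation < size:
        raise ValueError("n_validation must be between 1 and group size - 1")

    # Indices of the samples ordered from lightest to heaviest.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    rng = random.Random(seed)

    groups = {}
    for g, name in enumerate(names):
        members = order[g * size:(g + 1) * size]
        validation = sorted(rng.sample(members, n_validation))
        held_out = set(validation)
        calibration = [i for i in members if i not in held_out]
        groups[name] = {"calibration": calibration, "validation": validation}
    return groups
```

With 108 samples, this gives 36 samples per group, each divided into 30 calibration and 6 validation samples. Because the cut points depend only on rank, the weight range covered by each group follows from the data rather than from fixed thresholds.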
The PLS results are shown in Table 4. For all characteristics, there are instances where the correlation values are higher and the RMSEP is lower in one of the individual groups than in the full sample set. For example, for the April transmittance measurements, the rv values of firmness and brittleness in the M group are higher than those of the full set, whereas those in the L group, and for firmness also the S group, are lower, suggesting a bias related to the size of the samples. Because the samples were grouped by sorting and splitting rather than by specific weight criteria, the weight range differed between groups. The differing predictability shown here also suggests a bias related to the variation in sample sizes. Due to the nature of our prototype spectroscopy measurement device, all measurements were conducted at fixed positions and distances for all samples. This could have introduced noise from ambient sources, which fluctuated with the individual sample size and shape. Thus, the sample set from the April experiments, which had more size variation, was more strongly affected by this noise, resulting in less-than-ideal predictions. The May results were also affected by the grouping, resulting in higher correlations in the smaller groups for several parameters. However, they generally still yielded better results than the April experiments, possibly because the smaller variation in individual sample size in each group led to less variation in the angle of spectrum capture and in background noise. Comparison of the spectrum capture methods revealed that the reflectance results were more affected by the grouping, as indicated by the larger changes in correlation values and RMSEP, both positive and negative. This corroborates the findings of Leiva-Valenzuela et al. (2014), who concluded that the reflectance mode is more dependent on sample orientation and size than the transmittance mode.
Lin and Ying (2009) also concluded that spectroscopy measurements using whole fruits are highly dependent on the size of individual fruits and that the spectroscopy measurement method should be designed carefully, depending on the fruit size, the thickness of the skin, and the attributes being tested. The correlation values shown in Table 4 cannot be used as a prediction indicator due to the small number of samples in each group, especially in the validation subset. However, these results suggest that the fruit size should be carefully considered in future experiments to avoid such bias.
Group | April mean (g) | April SD (g) | April CV | May mean (g) | May SD (g) | May CV
---|---|---|---|---|---|---
full | 18.48 | 3.49 | 0.19 | 19.33 | 2.51 | 0.13
S | 14.56 | 2.12 | 0.15 | 16.55 | 1.04 | 0.06
M | 18.72 | 0.99 | 0.05 | 19.19 | 0.70 | 0.04
L | 22.17 | 1.32 | 0.06 | 22.23 | 1.02 | 0.05
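The CV column above is consistent with the usual coefficient of variation, that is, the standard deviation divided by the mean. The short sketch below recomputes it from the tabulated means and SDs; the dictionary layout and function name are illustrative only.

```python
# Mean and SD of sample weight (g) per experiment and group, from the table above.
weight_stats = {
    ("April", "full"): (18.48, 3.49),
    ("April", "S"): (14.56, 2.12),
    ("April", "M"): (18.72, 0.99),
    ("April", "L"): (22.17, 1.32),
    ("May", "full"): (19.33, 2.51),
    ("May", "S"): (16.55, 1.04),
    ("May", "M"): (19.19, 0.70),
    ("May", "L"): (22.23, 1.02),
}


def coefficient_of_variation(mean, sd):
    """Relative spread of the weights: SD expressed as a fraction of the mean."""
    return sd / mean


# CV rounded to two decimals, as reported in the table.
cv = {key: round(coefficient_of_variation(mean, sd), 2)
      for key, (mean, sd) in weight_stats.items()}

for (month, group), value in cv.items():
    print(f"{month:5s} {group:4s} CV = {value:.2f}")
```

The recomputed values show the point made in the text: every subgroup has a smaller relative spread in weight than its full set, and the April full set is the most variable of all.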
Texture property | Group | April reflectance r | April reflectance RMSEP | April transmittance r | April transmittance RMSEP | May reflectance r | May reflectance RMSEP | May transmittance r | May transmittance RMSEP
---|---|---|---|---|---|---|---|---|---
firmness (N) | full | 0.65 | 0.70 | 0.73 | 0.66 | 0.78 | 0.43 | 0.81 | 0.41
firmness (N) | S | 0.66 | 1.24 | 0.64 | 1.43 | 0.26 | 0.40 | 0.77 | 0.41
firmness (N) | M | 0.76 | 0.49 | 0.83 | 0.34 | 0.48 | 0.59 | 0.75 | 0.46
firmness (N) | L | 0.85 | 0.96 | 0.60 | 1.23 | −0.28 | 0.47 | 0.86 | 0.35
fracture deformation (mm) | full | 0.60 | 0.22 | 0.68 | 0.20 | 0.71 | 0.20 | 0.77 | 0.19
fracture deformation (mm) | S | 0.44 | 0.38 | 0.75 | 0.35 | 0.60 | 0.14 | 0.25 | 0.21
fracture deformation (mm) | M | 0.37 | 0.17 | 0.72 | 0.12 | 0.08 | 0.25 | 0.72 | 0.17
fracture deformation (mm) | L | 0.92 | 0.30 | 0.53 | 0.43 | −0.11 | 0.12 | 0.62 | 0.26
fracture strain rate (%) | full | 0.58 | 1.90 | 0.64 | 1.86 | 0.67 | 1.48 | 0.73 | 1.38
fracture strain rate (%) | S | 0.46 | 2.93 | 0.66 | 2.93 | 0.58 | 0.88 | 0.23 | 1.62
fracture strain rate (%) | M | 0.03 | 1.40 | 0.35 | 1.19 | 0.09 | 1.51 | 0.68 | 1.17
fracture strain rate (%) | L | 0.93 | 1.26 | 0.65 | 2.02 | 0.01 | 1.00 | 0.66 | 2.00
fracture energy (J/m^3) | full | 0.64 | 3829.07 | 0.69 | 3703.64 | 0.71 | 2210.91 | 0.76 | 2060.68
fracture energy (J/m^3) | S | 0.62 | 7059.53 | 0.55 | 8519.32 | 0.48 | 1443.58 | 0.50 | 2220.98
fracture energy (J/m^3) | M | 0.65 | 2422.71 | 0.86 | 2007.59 | 0.41 | 2406.35 | 0.54 | 2000.39
fracture energy (J/m^3) | L | 0.91 | 4100.04 | 0.60 | 5766.97 | −0.17 | 1526.27 | 0.78 | 1704.75
brittleness (N) | full | 0.60 | 0.57 | 0.66 | 0.54 | 0.77 | 0.33 | 0.78 | 0.33
brittleness (N) | S | 0.64 | 0.84 | 0.72 | 0.86 | 0.36 | 0.34 | 0.79 | 0.38
brittleness (N) | M | 0.87 | 0.34 | 0.84 | 0.32 | 0.39 | 0.48 | 0.70 | 0.40
brittleness (N) | L | 0.80 | 0.80 | 0.58 | 0.95 | −0.32 | 0.38 | 0.92 | 0.22
brittleness deformation (mm) | full | 0.06 | 0.52 | 0.31 | 0.48 | 0.26 | 0.52 | 0.44 | 0.48
brittleness deformation (mm) | S | 0.42 | 0.41 | 0.02 | 0.39 | 0.35 | 0.70 | −0.23 | 0.67
brittleness deformation (mm) | M | 0.59 | 0.56 | 0.60 | 0.62 | 0.45 | 0.38 | −0.78 | 0.46
brittleness deformation (mm) | L | 0.08 | 0.65 | 0.92 | 0.45 | −0.04 | 0.75 | 0.66 | 0.50
brittleness strain rate (%) | full | 0.13 | 3.13 | 0.28 | 3.13 | 0.22 | 3.72 | 0.40 | 3.45
brittleness strain rate (%) | S | 0.69 | 2.37 | 0.20 | 2.85 | 0.54 | 4.18 | −0.30 | 4.51
brittleness strain rate (%) | M | 0.49 | 3.77 | 0.64 | 4.11 | 0.58 | 2.00 | −0.56 | 2.36
brittleness strain rate (%) | L | 0.03 | 3.88 | 0.87 | 2.02 | 0.04 | 6.42 | 0.61 | 3.82
brittleness energy (J/m^3) | full | 0.64 | 4334.63 | 0.72 | 4317.02 | 0.76 | 3043.43 | 0.74 | 2486.83
brittleness energy (J/m^3) | S | 0.65 | 6984.42 | 0.62 | 7559.58 | 0.37 | 2804.68 | 0.46 | 4221.15
brittleness energy (J/m^3) | M | 0.43 | 4283.32 | 0.85 | 2499.89 | 0.82 | 1958.46 | 0.76 | 1800.18
brittleness energy (J/m^3) | L | 0.82 | 5244.38 | 0.67 | 6017.64 | −0.06 | 2165.95 | 0.87 | 1429.95
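The two indicators reported in the table can be computed from measured and predicted values of a validation subset as sketched below. The standard definitions are assumed (Pearson correlation for r, and the root of the mean squared prediction error for RMSEP, without bias correction); the function names and the six example values are hypothetical and are not data from this study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmsep(observed, predicted):
    """Root mean square error of prediction, in the unit of the measured value."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical firmness values (N) for a six-sample validation subset.
measured = [2.1, 2.8, 3.4, 2.5, 3.9, 3.0]
predicted = [2.4, 2.6, 3.1, 2.9, 3.6, 3.2]
print(f"r = {pearson_r(measured, predicted):.2f}, "
      f"RMSEP = {rmsep(measured, predicted):.2f} N")
```

With only six validation samples, a single poorly predicted fruit can shift r considerably, which is one way to read the negative and near-zero correlations that appear in some of the subgroup rows.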
Silicon-based spectroscopy has the potential to predict strawberry texture qualities, namely firmness, fracture deformation, fracture strain rate, fracture energy, brittleness, and brittleness energy, as suggested by the high correlation values (rv) in the May experiments and the moderate correlation values in the April experiments. The current method cannot sufficiently predict brittleness deformation and brittleness strain rate because these two characteristics varied too little compared with the other characteristics. In both the April and May experiments, prediction models using the transmittance mode yielded better predictions than those using the reflectance mode, as indicated by the higher rv values and lower RMSEP. This is possibly because the transmittance mode picks up internal spectral information as well as surface information, whereas the reflectance mode is more affected by the size and orientation of the sample. Smoothing the spectral curves with SG filtering improved the prediction results at varying convolution sizes, with more than half of all measurements benefitting from the larger (51-point) convolution size. Weight-randomized tests could be used as an automatic method for selecting the number of components. Grouping samples by weight suggested a bias in the measurements related to individual sample weight and size. This bias is a possible cause of the suboptimal results obtained from the April experiments. The variation in sample size and measurement conditions should be carefully considered in future studies.
Conflict of interest There are no conflicts of interest to declare.