Analysis of Grouped Growth Patterns in Even-Aged Sugi Forest Stand within the Framework of Mixture Model

Even in the same even-aged forest stand, trees grow differently due to their own growth capacity, relative spatial conditions, and other environmental factors. In this paper, we use a Gaussian mixture model (GMM) to investigate growth patterns in an even-aged forest stand and to identify groups of trees with the same growth characteristics. Assuming the Richards growth function for the growth process, the estimated coefficients are used as new response variables in a multivariate GMM to identify growth patterns. The optimal number of grouped growth patterns is selected by minimizing the cross-validation (CV) criterion. We demonstrate the proposed method using growth data from a sugi (Cryptomeria japonica) sample plot in Hoshino Village, Fukuoka Prefecture, Japan. The model identified three separate growth patterns in our sample plot.


Introduction
Even in the same even-aged forest stand, trees grow differently because of varying growth capacity, relative spatial conditions, and other environmental factors.
From a management perspective, if these growth differences could be captured, it would be beneficial to consider them in growth prediction. In this paper, we use a Gaussian mixture model (GMM; e.g., Everitt and Hand, 1981) to investigate differences in the growth patterns found in an even-aged forest stand.
A clustering method based on GMM has been widely used in several research fields, including forestry (e.g., Zhang et al., 2004). Because tree growth data are repeated measures, the time intervals between measurements can be unequal and the number of measurements can be large. Data collected over unequal time intervals are called "unbalanced data", and data with a large number of data points are called "high dimensional data". Although a multivariate GMM (MGMM) is preferred for clustering, it is inefficient for unbalanced and high dimensional data, mainly because the estimated covariance matrix in the MGMM can become singular, or sometimes unobtainable, for such data.
To overcome the difficulties associated with unbalanced and high dimensional data, we reduce and equate the dimension of all data by assuming the Richards growth function (Richards, 1958) for the target growth data (see Yanagihara and Yoshimoto, 2005a; Yoshimoto et al., 2005). In this approach, for the classification of growth patterns, the coefficients derived from the Richards growth function are regarded as new response variables in the MGMM. Since the cluster of each individual is unknown, we apply the expectation-maximization (EM) algorithm proposed by Dempster et al. (1977) to obtain the maximum likelihood (ML) estimates of the unknown parameters in the MGMM. The optimal number of grouped growth patterns is searched for by minimizing the cross-validation (CV) criterion proposed by Stone (1974); because the number of sampled trees is not large, other information criteria are inappropriate.
The paper is organized as follows. In the next section, we explain our classification method through MGMM and the method used to determine the optimal number of clusters by minimizing the information criterion, followed by an explanation of MGMM applied to the classification of growth patterns. In the third section, we demonstrate our proposed method using data obtained from a forest stand owned by Hoshino Village in Fukuoka Prefecture, Kyushu, Japan.

Classification through MGMM
Let $x_i = (x_{i1}, \ldots, x_{ip})'$ be a p×1 observation vector of the i-th individual (i=1,…,n), where $x_1, \ldots, x_n$ are mutually independent, n is the sample size, and "′" denotes the transpose of a matrix or vector. Suppose that each individual belongs to one of k populations (or clusters) $\Pi_1, \ldots, \Pi_k$, and that the distribution of $\Pi_j$ (j=1,…,k) is the p-dimensional normal distribution with mean $\mu_j$ and common variance-covariance matrix $\Sigma$. It is well known that the probability density function of $z \sim N_p(\mu_j, \Sigma)$ is given by

$$\varphi(z; \mu_j, \Sigma) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2}(z - \mu_j)' \Sigma^{-1} (z - \mu_j) \right\}.$$

Since in general we do not know which population an individual belongs to, we treat the unknown partition of clusters as a k×1 random index vector $\delta$. We assume that $\delta$ is distributed according to the k-dimensional multinomial distribution with cell probabilities $\rho = (\rho_1, \ldots, \rho_k)'$, $\rho_1 + \cdots + \rho_k = 1$, and we let $\delta_1, \ldots, \delta_n$ be independent random vectors distributed as $\delta$, where $\delta_i = (\delta_{i1}, \ldots, \delta_{ik})'$ is the index vector denoting the population of the i-th individual. The conditional probability density function of $x_i$ given $\delta_i$ (i=1,…,n) is derived as

$$f(x_i \mid \delta_i) = \prod_{j=1}^{k} \varphi(x_i; \mu_j, \Sigma)^{\delta_{ij}},$$

so that the marginal density of $x_i$ becomes the mixture

$$f(x_i; \theta) = \sum_{j=1}^{k} \rho_j \varphi(x_i; \mu_j, \Sigma). \quad [4]$$

However, we cannot use the full likelihood for estimating $\rho$, $\Xi = (\mu_1, \ldots, \mu_k)$ and $\Sigma$, because $\delta_i$ is unobserved. Thus, we apply the EM algorithm for estimating the unknown parameters.
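As a concrete illustration, the normal density and the mixture density [4] can be written in a few lines of numpy. This is our own minimal sketch, not code from the paper; the names `normal_pdf` and `mixture_density` are ours, and a common covariance `Sigma` is assumed across clusters as in the model above.

```python
import numpy as np

def normal_pdf(z, mu, Sigma):
    """Multivariate normal density phi(z; mu, Sigma) for a p-vector z."""
    p = len(mu)
    diff = z - mu
    norm_const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

def mixture_density(z, rho, mus, Sigma):
    """Marginal mixture density f(z) = sum_j rho_j * phi(z; mu_j, Sigma) (eq. [4])."""
    return sum(r * normal_pdf(z, mu, Sigma) for r, mu in zip(rho, mus))
```

The mixing ratios `rho` must sum to one; each component shares the same covariance matrix, matching the model specification in the text.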
The EM algorithm is a widely used algorithm for obtaining ML estimates from incomplete data by regarding $\delta_i$ as missing values. The estimation steps of the algorithm are as follows:

The EM Algorithm for Obtaining ML Estimates in MGMM
Step 1. We begin by determining the initial clusters for the observations. In our algorithm, this is specified by the k-means method (MacQueen, 1967) using a determinant as the cluster criterion. For a detailed description of k-means using the determinant, refer to Yanagihara and Yoshimoto (2005a). Let $\delta_1, \ldots, \delta_n$ denote the given initial partition and let $U = (\delta_1, \ldots, \delta_n)'$ be the corresponding n×k indicator matrix. Then the initial mixing ratios, location matrix and covariance matrix are determined as

$$\hat\rho^{(0)} = \frac{1}{n} U' 1_n, \qquad \hat\Xi^{(0)} = X' U (U'U)^{-1}, \qquad \hat\Sigma^{(0)} = \frac{1}{n} X' \left\{ I_n - U(U'U)^{-1}U' \right\} X,$$

where $X$ is the n×p matrix given by $X = (x_1, \ldots, x_n)'$, $I_n$ is the n×n identity matrix and $1_n$ is an n×1 vector with all elements equal to 1. For simplicity, we write the $(p+1)(k+p/2)$×1 vector stacking the estimates of $\rho$, $\Xi$ and $\Sigma$ as $\theta = (\rho', \mathrm{vec}(\Xi)', \mathrm{vech}(\Sigma)')'$, where the vec(B) operator transforms a matrix into a vector by stacking its columns from first to last, and the vech(B) operator transforms the lower triangle of a symmetric matrix into a vector by stacking its columns from first to last (see Harville, 1997).
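The initialization formulas above can be sketched directly in numpy. This is our own illustration, not the authors' code: we start from an arbitrary hard partition given as integer labels (a real run would obtain the labels from determinant-based k-means), and the name `initial_estimates` is ours.

```python
import numpy as np

def initial_estimates(X, labels, k):
    """Initial rho, Xi (p x k means) and Sigma from a hard partition.

    X: n x p data matrix; labels: length-n array of cluster indices 0..k-1.
    Implements rho = U'1_n/n, Xi = X'U(U'U)^{-1},
    Sigma = X'(I_n - U(U'U)^{-1}U')X / n from Step 1.
    """
    n, p = X.shape
    U = np.zeros((n, k))
    U[np.arange(n), labels] = 1.0           # n x k indicator (partition) matrix
    rho = U.sum(axis=0) / n                 # initial mixing ratios
    Xi = X.T @ U @ np.linalg.inv(U.T @ U)   # cluster means as columns of Xi
    P = U @ np.linalg.inv(U.T @ U) @ U.T    # projection onto the partition
    Sigma = X.T @ (np.eye(n) - P) @ X / n   # pooled within-cluster covariance
    return rho, Xi, Sigma
```

Note that `Sigma` is the pooled within-cluster covariance, consistent with the common-covariance assumption of the model.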
Step 2 (Expectation Step; E-Step). In the m-th repetition we have $\hat\rho^{(m)}$, $\hat\Xi^{(m)} = (\hat\mu_1^{(m)}, \ldots, \hat\mu_k^{(m)})$ and $\hat\Sigma^{(m)}$, which are the current estimates of $\rho$, $\Xi$ and $\Sigma$, respectively. Let $w_{ij}^{(m)}$ denote the estimated posterior probability in the m-th repetition, given by

$$w_{ij}^{(m)} = \frac{\hat\rho_j^{(m)} \varphi(x_i; \hat\mu_j^{(m)}, \hat\Sigma^{(m)})}{\sum_{l=1}^{k} \hat\rho_l^{(m)} \varphi(x_i; \hat\mu_l^{(m)}, \hat\Sigma^{(m)})}.$$

Then we calculate the following conditional expectation of the complete-data log-likelihood:

$$Q(\theta \mid \hat\theta^{(m)}) = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij}^{(m)} \left\{ \log \rho_j + \log \varphi(x_i; \mu_j, \Sigma) \right\}. \quad [9]$$

Step 3 (Maximization Step; M-Step). By maximizing the conditional expectation [9], we obtain the estimates of $\rho$, $\Xi$ and $\Sigma$ in the (m+1)-th repetition. Since $\rho$ has the restriction $\rho_1 + \cdots + \rho_k = 1$, these estimates are obtained by maximizing the following Lagrange function:

$$L(\theta, \lambda) = Q(\theta \mid \hat\theta^{(m)}) + \lambda (\rho_1 + \cdots + \rho_k - 1). \quad [10]$$

For [10] to attain an optimal solution, the following equations must be satisfied:

$$\frac{\partial L}{\partial \rho_j} = 0, \qquad \frac{\partial L}{\partial \mu_j} = 0_p, \qquad \frac{\partial L}{\partial \Sigma} = O_{p \times p} \quad (j = 1, \ldots, k),$$

where $0_p$ and $O_{p \times p}$ are the p×1 vector and the p×p matrix with all elements equal to 0, respectively. From these equations, the Lagrange multiplier becomes $\lambda = -n$. Therefore, the estimates of $\rho$, $\Xi$ and $\Sigma$ in the (m+1)-th repetition are given by

$$\hat\rho_j^{(m+1)} = \frac{1}{n} \sum_{i=1}^{n} w_{ij}^{(m)}, \qquad \hat\mu_j^{(m+1)} = \frac{\sum_{i=1}^{n} w_{ij}^{(m)} x_i}{\sum_{i=1}^{n} w_{ij}^{(m)}}, \qquad \hat\Sigma^{(m+1)} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij}^{(m)} (x_i - \hat\mu_j^{(m+1)})(x_i - \hat\mu_j^{(m+1)})'.$$

Step 4. We repeat Steps 2 and 3 until

$$\left| \ell(\hat\theta^{(m+1)}) - \ell(\hat\theta^{(m)}) \right| < a,$$

where $\ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$ is the marginal log-likelihood based on [4] and a is any given tolerance for convergence; the limit is taken as the optimal solution for $\theta$ maximizing the marginal log-likelihood.
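The E-step, M-step, and convergence check above can be sketched as follows. This is a minimal numpy sketch of our own (not the authors' implementation) for the common-covariance MGMM; the name `em_mgmm` and the argument layout are ours.

```python
import numpy as np

def em_mgmm(X, rho, Xi, Sigma, tol=1e-8, max_iter=500):
    """EM iterations for the MGMM with a common covariance Sigma.

    X: n x p data; rho: length-k mixing ratios; Xi: p x k matrix of means.
    Returns the final estimates and the n x k posterior probabilities w.
    """
    n, p = X.shape
    k = len(rho)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior probabilities w_ij given the current estimates
        inv = np.linalg.inv(Sigma)
        const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
        dens = np.empty((n, k))
        for j in range(k):
            diff = X - Xi[:, j]
            dens[:, j] = const * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))
        num = dens * rho
        f = num.sum(axis=1)               # marginal mixture density f(x_i), eq. [4]
        w = num / f[:, None]
        # M-step: closed-form updates (Lagrange multiplier lambda = -n)
        nj = w.sum(axis=0)
        rho = nj / n
        Xi = (X.T @ w) / nj
        Sigma = np.zeros((p, p))
        for j in range(k):
            diff = X - Xi[:, j]
            Sigma += (w[:, j, None] * diff).T @ diff
        Sigma /= n
        # Step 4: stop when the marginal log-likelihood stabilizes
        ll = np.log(f).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return rho, Xi, Sigma, w
```

Each iteration increases the marginal log-likelihood, which is the standard monotonicity property of EM; the loop terminates when successive log-likelihood values differ by less than the tolerance.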
Expressing the optimal $\theta$ by $\hat\theta = (\hat\rho', \mathrm{vec}(\hat\Xi)', \mathrm{vech}(\hat\Sigma)')'$, we calculate the following estimated posterior probability:

$$\hat w_{ij} = \frac{\hat\rho_j \varphi(x_i; \hat\mu_j, \hat\Sigma)}{\sum_{l=1}^{k} \hat\rho_l \varphi(x_i; \hat\mu_l, \hat\Sigma)}.$$

We assign the i-th individual to the population with the highest $\hat w_{ij}$ (j=1,…,k).

Next, we consider how to choose the number of clusters k. Following the Kullback-Leibler (KL) discrepancy approach, let $U = (u_1, \ldots, u_n)'$ be a future data set distributed identically to $X$ but independent of it. The risk of the fitted model with k clusters is defined by

$$R_k = E^*_X \left[ E^*_U \left[ -2 \sum_{i=1}^{n} \log f(u_i; \hat\theta) \right] \right], \quad [17]$$

where $E^*_X$ and $E^*_U$ are the expectations with respect to X and U, respectively, and f is the marginal density given by [4].
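The posterior-based classification rule above is straightforward to sketch in numpy; this is our own illustration (the name `posteriors_and_labels` is ours), again assuming the common-covariance model.

```python
import numpy as np

def posteriors_and_labels(X, rho, Xi, Sigma):
    """Posterior probabilities w_ij and hard cluster labels (argmax over j).

    X: n x p data; rho: length-k mixing ratios; Xi: p x k means; Sigma: p x p.
    """
    n, p = X.shape
    k = len(rho)
    inv = np.linalg.inv(Sigma)
    const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
    dens = np.empty((n, k))
    for j in range(k):
        d = X - Xi[:, j]
        dens[:, j] = const * np.exp(-0.5 * np.einsum('ij,jk,ik->i', d, inv, d))
    num = dens * rho                      # rho_j * phi(x_i; mu_j, Sigma)
    w = num / num.sum(axis=1, keepdims=True)
    return w, w.argmax(axis=1)            # assign each i to the largest w_ij
```

The returned `w` corresponds to the estimated posterior probabilities reported in Table 2 of the paper, and `argmax` implements the assignment rule.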
By obtaining an unbiased estimator of $R_k$, we can correctly evaluate the discrepancy between the data and the model. The simplest estimator of $R_k$ is the sample KL discrepancy function, $-2 \sum_{i=1}^{n} \log f(x_i; \hat\theta)$. Note that the sample KL discrepancy function generally underestimates $R_k$, so an information criterion is formed by adding a bias-correction term $\hat B_k$, where $\hat B_k$ is a consistent estimator of the bias $B_k = R_k - E^*_X[-2 \sum_{i=1}^{n} \log f(x_i; \hat\theta)]$. Akaike (1973, 1974) evaluated $B_k$ by "2 times the number of independent parameters" and proposed Akaike's information criterion (AIC) by adding the evaluated $B_k$ to the sample KL discrepancy function, i.e.,

$$\mathrm{AIC}(k) = -2 \sum_{i=1}^{n} \log f(x_i; \hat\theta) + (p+1)(2k+p) - 2.$$

In this evaluation, if f is not equal to the true density $\phi$ in [16], AIC has a constant bias, because Akaike derived AIC only under the assumption that $\phi$ and f are equal. Takeuchi (1976) re-evaluated the bias-correction term of AIC, $(p+1)(2k+p)-2$, without assuming that $\phi$ and f coincide, and proposed Takeuchi's information criterion (TIC) based on his re-evaluation.
In contrast to AIC and TIC, Stone (1974) proposed the CV criterion in the following manner. Let $\hat\rho_{(-i)}$, $\hat\Xi_{(-i)}$ and $\hat\Sigma_{(-i)}$ be the MLEs of $\rho$, $\Xi$ and $\Sigma$ evaluated from the sample set consisting of X without its i-th row vector $x_i$, and let $\hat\theta_{(-i)} = (\hat\rho_{(-i)}', \mathrm{vec}(\hat\Xi_{(-i)})', \mathrm{vech}(\hat\Sigma_{(-i)})')'$.
The CV criterion is given by

$$\mathrm{CV}(k) = -2 \sum_{i=1}^{n} \log f(x_i; \hat\theta_{(-i)}). \quad [21]$$

Stone (1977) pointed out that the CV criterion is an asymptotically unbiased estimator of the risk [17]. From Stone's (1977) results, we can see that TIC and the CV criterion are asymptotically equivalent; therefore, the order of the bias of CV is equal to that of TIC. Through an investigation of the asymptotic expansions of the biases for the risk [17], Yanagihara (2006) demonstrated that the CV criterion in normal regression models has smaller bias than TIC. The reason is that, to obtain TIC, we must estimate higher-order cumulants, whose ordinary estimators tend to underestimate severely even for a moderate sample size; as a result, TIC tends to have a large bias. By contrast, the CV criterion can be obtained without estimating higher-order cumulants. Thus, CV is more efficient than TIC and AIC when the sample size is not large, and we therefore use CV to select the best number of clusters in this paper. We used the following procedure to determine the best number of clusters:

The Algorithm to Determine the Number of Clusters
Step 1. We determine the maximum number of clusters K.
Step 2. We calculate CV(k) for each number of clusters k = 1,…,K.
Step 3. We take the k providing the smallest value of CV as the optimal number of clusters $k_{\mathrm{opt}}$, i.e., $k_{\mathrm{opt}} = \arg\min_{k=1,\ldots,K} \mathrm{CV}(k)$.
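The selection procedure above can be sketched end to end as follows. This is a deliberately compressed one-dimensional illustration of our own, not the authors' code: `fit_mixture` is a compact EM fit with a common variance, `cv_criterion` implements the leave-one-out criterion [21], and `best_k` performs the argmin search. A real analysis would use the full multivariate EM of the previous section.

```python
import numpy as np

def fit_mixture(X, k, iters=200):
    """Compact 1-D EM fit of a k-component mixture with common variance."""
    n = len(X)
    order = np.argsort(X)
    # crude initialization: split the sorted data into k blocks
    mus = np.array([X[order[n * j // k:n * (j + 1) // k]].mean() for j in range(k)])
    rho = np.full(k, 1.0 / k)
    var = X.var() + 1e-6
    for _ in range(iters):
        dens = np.exp(-0.5 * (X[:, None] - mus) ** 2 / var) / np.sqrt(2 * np.pi * var)
        num = dens * rho
        w = num / num.sum(axis=1, keepdims=True)          # E-step posteriors
        nj = w.sum(axis=0)
        rho, mus = nj / n, (w * X[:, None]).sum(axis=0) / nj   # M-step updates
        var = (w * (X[:, None] - mus) ** 2).sum() / n
    return rho, mus, var

def cv_criterion(X, k):
    """CV(k) = -2 * sum_i log f(x_i; theta_hat_(-i))  (eq. [21])."""
    total = 0.0
    for i in range(len(X)):
        rho, mus, var = fit_mixture(np.delete(X, i), k)   # refit without x_i
        f = (rho * np.exp(-0.5 * (X[i] - mus) ** 2 / var) / np.sqrt(2 * np.pi * var)).sum()
        total += -2.0 * np.log(f)
    return total

def best_k(X, K):
    """Step 3: search k = 1,...,K minimizing CV(k)."""
    cvs = {k: cv_criterion(X, k) for k in range(1, K + 1)}
    return min(cvs, key=cvs.get)
```

Note that the mixture is refit n times per candidate k, once for each held-out observation, which is feasible here because the sample size is small.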

Application of MGMM to the Classification of Tree Growth Patterns
Let $y_{il}$ be the observation from the i-th individual tree at the l-th time $t_{il}$ (i=1,…,n; l=1,…,$p_i$), and let $y_i = (y_{i1}, \ldots, y_{ip_i})'$. Since $y_i$ is unbalanced and high dimensional data, the generalized non-linear mixed effect model proposed by Vonesh and Carter (1992) might be preferable for the growth analysis of $y_i$. Applications of the non-linear mixed effect model to forest growth analysis can be found in Fang and Bailey (2001), Hall and Bailey (2001), Garber and Maguire (2003), Leites and Robinson (2004), Yanagihara et al. (2004), and Yanagihara and Yoshimoto (2005b). However, when applied to unbalanced and high dimensional data, the non-linear mixed effect model is computationally and practically difficult to extend to the MGMM. To overcome this difficulty, we use the coefficient vector $\hat\beta_i$ estimated for each tree as the response variables in the MGMM, as in Yanagihara and Yoshimoto (2005a) and Yoshimoto et al. (2005). The growth patterns can then be classified through the estimated parameters $\hat\beta_i$ alone, which control the shapes of the growth curves for $y_i$. In our analysis, we use the Richards growth function (Richards, 1958) as the growth curve model.
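The reduction step can be sketched as follows. The Richards function has several equivalent parameterizations; the sketch below uses the common three-parameter form y(t) = A{1 − exp(−kt)}^m, which may differ from the exact form used in the paper, and fits it by a crude grid search rather than a proper nonlinear least-squares routine. All function names here are ours.

```python
import numpy as np

def richards(t, A, k, m):
    """Richards growth curve y(t) = A * (1 - exp(-k t))**m (one common form)."""
    return A * (1.0 - np.exp(-k * t)) ** m

def fit_richards(t, y, A_grid, k_grid, m_grid):
    """Least-squares fit over a coarse parameter grid; returns (A, k, m)."""
    best, best_sse = None, np.inf
    for A in A_grid:
        for kk in k_grid:
            for m in m_grid:
                sse = ((y - richards(t, A, kk, m)) ** 2).sum()
                if sse < best_sse:
                    best, best_sse = (A, kk, m), sse
    return np.array(best)

# Each tree's unbalanced series (t_i, y_i) is reduced to one coefficient
# vector beta_i = (A, k, m); the beta_i are then clustered by the MGMM,
# which equates the dimension of all trees regardless of p_i.
```

Because every tree is summarized by the same three coefficients, the unbalanced, high dimensional series become balanced three-dimensional response vectors for the MGMM.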

Numerical Example
We used growth data obtained from a survey conducted at Hoshino Village of Fukuoka Prefecture in Kyushu, Japan. The data consist of growth measurements from thirty sampled trees. The study plot, which contained 136 trees in total, is shown in Figure 1. Figure 2 shows the observed growth data of DBH, height, and volume, and Figure 3 shows scatter plots of the estimated coefficients of the Richards growth function. In these figures, the plotted number denotes the ID number of each tree. The measurements appear to reveal three clusters in the DBH data, no clusters in the height data, and two clusters in the volume data. Table 1 gives the values of AIC and CV for each candidate model when the maximum number of clusters K is 4. In the table, the smallest value of each information criterion is marked in bold. The table shows that CV and AIC choose 3 and 4, respectively, as the optimal number of clusters in all growth data. Based on these results, AIC is likely to overestimate the number of clusters in the MGMM when the sample size is not large; therefore, CV is recommended for determining the number of clusters in that situation. Figure 4 shows the optimal cluster partition chosen by the CV criterion; the optimal number of clusters in each growth data set was 3. In the figures, different symbols denote the data belonging to clusters 1, 2, and 3, and the curved lines are contour lines of probabilities based on the fitted marginal density function f in equation [4]; the darker the color of a contour line, the lower the probability. Table 2 shows the estimated posterior probabilities for all growth data. The highest probability for each tree is marked in bold, and the tree is assigned to the corresponding cluster. From the figures and the table, we can see that the cluster partition differs slightly depending on the growth data. Comparing the clusters for DBH with the clusters for volume, tree numbers 52, 83, 107 and 127 did not match.
As for the comparison of DBH with height, tree numbers 7, 12, 15, 31, 37 and 127 did not match. If the differences in growth pattern were caused solely by differences in the trees themselves, we would expect all cluster partitions to be the same (or very similar). Since they are not, we infer that the growth differences are not attributable to the trees themselves alone.
Because almost all the highest estimated posterior probabilities of the growth data were very close to 1, we can conclude that the resultant clusters are clearly divided.
The highest posterior probability of tree number 83 in the volume data was somewhat lower than the others, essentially because this observation lay midway between the centers of clusters 2 and 3. Moreover, from Figure 4, three peaks were clearly observable in the fitted density function of the DBH data; as a result, the DBH data were divided into three clusters. Although less distinct than for the DBH data, three peaks were also observed in the fitted density function of the volume data. However, there were only two peaks in the fitted density function of the height data, even though the analysis concluded three clusters, suggesting that the distribution of $x_i$ may not be normal. In Figure 1, different symbols denote the sampled trees belonging to clusters 1, 2, and 3, respectively. From the figure, trees belonging to cluster 3 were observed around the central part of the study plot. This could imply that the difference in growth pattern is caused by geographical and topographical factors.

Discussion and Conclusion
In this paper, we utilized GMM to identify growth patterns in an even-aged forest stand. Assuming the Richards growth function for the growth data, the estimated coefficients of the function were used as the response variables of MGMM. We chose the number of clusters by minimizing the CV criterion.
Applying the proposed method to growth data from a sugi plantation forest, we identified three growth patterns in the study plot.
The k-means method is a well-known clustering method widely used in several research fields. It appears to have an advantage over the MGMM-based clustering method proposed here because the cluster partition is more easily obtained. Using the k-means method, Yanagihara and Yoshimoto (2005a) analyzed the growth pattern of the same stem volume data used in this paper, and the cluster partition they obtained was very similar to our result. However, the k-means method has a disadvantage: when the number of clusters is searched for, the information criterion must be defined under the assumption that the cluster partition is explicitly assigned. In other words, the cluster partition to be searched for is not treated as a random variable. As a result, the problem of choosing the number of clusters is replaced with the problem of choosing the number of groups in multivariate analysis of variance (MANOVA) models, and AIC, TIC, or CV for MANOVA models is used to select the number of clusters when the k-means method is used for the cluster analysis. However, such an information criterion does not work well for this purpose. One reason is that the bias-correcting term of the information criterion underestimates the actual bias when the MANOVA model is applied to data already partitioned by the k-means method. Yanagihara and Yoshimoto (2005a) imposed a heavier bias-correcting term by assuming heteroscedasticity in the MANOVA models.
Although such a criterion worked well in their paper, it has no theoretical guarantee. The clustering method through the MGMM does not have this disadvantage. Moreover, it provides the estimated posterior probabilities, as in Table 2, and the contour lines, as in Figure 4; these probabilities serve as measures of the likelihood with which each individual is classified into the corresponding cluster. Ultimately, the appropriateness of any analysis method must be evaluated according to the purpose of the analysis.