Journal of the Meteorological Society of Japan. Ser. II
Online ISSN : 2186-9057
Print ISSN : 0026-1165
ISSN-L : 0026-1165
Tropical Cyclone Intensity Forecasting with Three Multiple Linear Regression Models and Random Forest Classification
Udai SHIMADA

2024 Volume 102 Issue 5 Pages 555-573

Abstract

The Statistical Hurricane Intensity Prediction Scheme (SHIPS) is a multiple linear regression model for predicting tropical cyclone (TC) intensity. It has been widely used in operational centers because of its forecast stability, high accuracy, easy interpretation, and low computational cost. The Japan Meteorological Agency (JMA) version of SHIPS is called the Typhoon Intensity Forecasting scheme based on SHIPS (TIFS) and predicts both maximum wind speed and central pressure. Although the addition of new predictors to SHIPS and TIFS has improved their accuracy, predicting TC intensity with a single regression model has limitations. In this study, a new TIFS-based forecasting scheme is developed using data from 2000 to 2021. In this scheme, three TIFS regression models corresponding to the intensifying, steady-state, and weakening stages of TCs are introduced, and the weighted mean of the three TIFS forecasts, with weights obtained from random forest (RF) decision trees, is used as the final intensity forecast. Compared to the conventional TIFS model, the new scheme (TIFS-RF) has better accuracy, with improvement rates of up to 12 % at forecast times from 1 to 4 days. The improvement is particularly significant for steady-state TCs, tropical depressions, and TCs undergoing extratropical transition within five days. The accuracy of TIFS-RF forecasts is generally better than that of conventional TIFS forecasts for rapidly intensifying TCs, but much worse for rapidly weakening TCs. This study also confirms that a consensus forecast of the TIFS-RF and Hurricane Weather Research and Forecasting models can overcome the weaknesses of each model used alone.

1. Introduction

Improving the accuracy of intensity forecasts is important for appropriate preparation against tropical cyclone (TC) disasters. Since the early 2010s, great progress has been made in improving intensity forecasts (e.g., Zhang et al. 2023), although advances in this area have lagged behind those in track forecasts (e.g., DeMaria et al. 2014; Yamaguchi et al. 2017). Many challenges nevertheless remain in TC intensity forecasting, including reducing 24-h forecast errors and predicting rapid intensification (RI) and difficult individual cases (e.g., Courtney et al. 2019; Wang et al. 2023). To make intensity forecasts, operational centers use dynamical models such as the Hurricane Weather Research and Forecasting (HWRF) model (e.g., Tallapragada et al. 2014, 2016; Biswas et al. 2018), statistical-dynamical guidance models such as the Statistical Hurricane Intensity Prediction Scheme (SHIPS) model (DeMaria and Kaplan 1994, 1999; DeMaria et al. 2005), and consensus models (e.g., Goerss and Sampson 2014; Knaff et al. 2020, 2023; DeMaria et al. 2021; Ko et al. 2023; Sampson et al. 2023). A consensus approach using multiple models, a recent trend for reducing intensity forecast errors (e.g., Wang et al. 2023), generally mitigates the biases of the individual models and thus can produce more accurate intensity forecasts than single models.

SHIPS is a statistical-dynamical model with strong predictive skill that has long been used in operational centers in the U.S. Specifically, SHIPS is a multiple linear regression model that statistically predicts TC intensity changes from an initial time to a given forecast time based on forecasted environmental conditions along a TC track. This model has multiple advantages, including i) small variation between consecutive forecasts (i.e., forecast stability); ii) high accuracy, comparable to that of numerical TC models; iii) easy interpretation of forecasts, which specify the quantitative contributions of different predictors; and iv) low computational cost compared to that of a numerical model. These advantages make SHIPS very convenient for operational use. The Japan Meteorological Agency (JMA) has developed a similar model, called the Typhoon Intensity Forecasting scheme based on SHIPS (TIFS), for forecasting the maximum wind speed (Vmax) and central pressure (Pmin) of TCs in the western North Pacific basin at 6-h intervals from the forecast time of 6 h (FT6) to FT120 (Yamaguchi et al. 2018). TIFS has been used operationally since 2017 (Ono et al. 2019; hereafter, the operational version is referred to as TIFS-OP). Because of these advantages, statistical–dynamical models will continue to be used operationally together with numerical models.

Because it relies on a statistical approach, however, TIFS is not good at predicting extreme intensity changes such as RI (Ono et al. 2019). Furthermore, TIFS tends to overforecast intensity (i.e., show a positive bias for Vmax and a negative bias for Pmin) for steady-state TCs, which here include tropical depressions (TDs) (Ono et al. 2019). For less frequent events, such as RI or rapid weakening, other methods such as the RI index (Kaplan et al. 2010, 2015) should be used to compensate for the weakness of TIFS caused by the small sample size of such events. For the majority of events (i.e., normal intensification, steady-state, and normal weakening cases), however, further improvement within the framework of linear regression may be possible, and improving the accuracy of these forecasts would significantly reduce the overall error of intensity forecasts. The motivation of this study, therefore, was to develop a new statistical–dynamical model with better accuracy while retaining the advantages of linear regression models.

Linear regression models have already been improved by adding new predictors (DeMaria et al. 2005; Jones et al. 2006; DeMaria 2010; Shimada et al. 2018, hereafter S18). These new predictors were selected based on the current understanding of the relationship between TC structure and intensity changes (e.g., Xu and Wang 2010; Chen et al. 2011; Carrasco et al. 2014; Shimada et al. 2017). Jones et al. (2006) demonstrated that the use of predictors derived from microwave satellite data improves the accuracy of SHIPS. Furthermore, S18 showed that the use of TC structure-related predictors such as rainfall symmetricity and the Rossby number improves the forecast accuracy of TIFS, mainly for forecast periods of up to 3 days. This improvement is consistent with the fact that TC structure is correlated with intensity changes only at lead times of up to about 3 days (Hakim 2013; Brown and Hakim 2013; S18). Improving TC intensity forecast accuracy up to 5 days ahead would require improvements in both the TC intensity forecasting method and track forecasts.

Machine learning and deep learning techniques, which have been increasingly used in recent years, may provide more accurate TC intensity forecasts. Some promising results have already been demonstrated (e.g., Cloud et al. 2019; Chan et al. 2021; Chen et al. 2023; Ko et al. 2023). However, it is generally hard to interpret forecasts based on machine learning or deep learning algorithms in a similar way to forecasts made with conventional guidance models, such as multiple linear regression models, which provide quantitative information on the contribution of each predictor to intensity changes.

This study developed a new TC intensity forecasting scheme that constructs three TIFS models for the intensifying, steady-state, and weakening stages of TCs while retaining the advantages of the conventional TIFS model, and that applies a random forest (RF) algorithm (Breiman 2001), a well-known machine learning algorithm, to obtain weighting coefficients for averaging the forecasts from the three TIFS models. The RF classification algorithm constructs many decision trees, each of which uses randomly selected predictors, and then determines the answer by a majority vote of the trees. RF classification is less prone to overfitting and combines many weak learners (Breiman 2001), and the fraction of trees voting for each class can be used directly as a weight; it is therefore well suited to computing the weighted mean of the three TIFS forecasts. This study aimed to reduce the errors of the conventional TIFS model with the new scheme (hereafter TIFS-RF).

The remainder of this paper is organized into four sections. Section 2 describes the data used, TIFS-RF, and the testing method. Section 3 presents the characteristics and skill of TIFS-RF and compares them with those of the conventional TIFS models. Section 4 discusses further improvement of intensity forecast accuracy through a consensus approach. Section 5 summarizes this study.

2. Data and model description

2.1 Data used

As training data for the RF and regression models, the JMA best track data, Japanese 55-year Reanalysis (JRA-55) data (Kobayashi et al. 2015), successive geostationary satellite (Himawari) data, sea surface temperature (SST) data from the Centennial Observation-Based Estimates of SST (COBE-SST; Ishii et al. 2005), ocean heat content (OHC) data from the Meteorological Research Institute multivariate ocean variational estimation (MOVE) system (Usui et al. 2006), and Global Satellite Mapping of Precipitation (GSMaP) data [the Japan Aerospace Exploration Agency (JAXA) 2023] for the period from 2000 to 2012 were used; these are the same training data as those used for the TIFS model in S18 (hereafter TIFS-S18). Two datasets were prepared for forecast experiments (Table 1): dataset 1 consists of the same kinds of data as the training data, and dataset 2, which includes the JMA real-time TC analysis data and the JMA Global Spectral Model (GSM; see Japan Meteorological Agency 2023a) output data, is nearly identical to the real-time forecasting dataset used at the JMA. The forecast experiments were conducted for the period from 2013 to 2021, because the real-time GSM output data for this period are archived in their current form. Forecast experiments using dataset 1 were conducted to ascertain the basic skill of the new model, and those using dataset 2 were conducted to determine its real-time forecasting skill. Because GSM forecasts extended only to FT84 until 2015, except for those initialized at 1200 UTC, the number of forecast samples used here decreases greatly after FT84. In the JMA best track data, Vmax values are available only when TCs are of tropical storm strength or above, whereas Pmin values are also available when TCs are tropical depressions or extratropical cyclones. Consequently, the sample sizes for Vmax and Pmin in the datasets differ.

In addition, operational HWRF forecasts of both Pmin and Vmax for the period from 2013 to 2021, obtained from the HWRF website [the Environmental Modeling Center (EMC) 2024], are used in this paper to evaluate whether a consensus approach improves forecast accuracy (see Section 4). The HWRF forecasts used include only events of tropical storm strength or above at the initial time. Because HWRF Vmax should be regarded as the 1-min sustained wind speed (e.g., Zhang et al. 2021), it was converted to the 10-min sustained wind speed by using Dvorak conversion tables for 1-min and 10-min Vmax values, as in Mei and Xie (2016). Specifically, HWRF Vmax was converted to the “current intensity (CI)” number by using Dvorak’s (1984) table, and the CI number was then converted to the 10-min Vmax value by using the table in Koba et al. (1991).
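The two-step table conversion can be sketched as follows. This is a minimal illustration that assumes piecewise-linear interpolation between table entries (the text does not say how intermediate values are handled); the function name is hypothetical, and the numbers in the usage lines are made-up placeholders, not the values of Dvorak (1984) or Koba et al. (1991).

```python
import numpy as np


def convert_1min_to_10min(vmax_1min, ci_table, wind_1min_table, wind_10min_table):
    """Convert 1-min sustained Vmax (kt) to 10-min Vmax (kt) through the CI number.

    The three tables are parallel sequences ordered by increasing CI:
    wind_1min_table[i] is the 1-min wind for ci_table[i] (Dvorak-type table)
    and wind_10min_table[i] is the 10-min wind for the same CI (Koba-type table).
    Linear interpolation between entries is an assumption of this sketch;
    np.interp clamps values outside the table range to the end values.
    """
    # Step 1: invert the 1-min wind table to get a (fractional) CI number.
    ci = np.interp(vmax_1min, wind_1min_table, ci_table)
    # Step 2: look up the 10-min wind for that CI number.
    return np.interp(ci, ci_table, wind_10min_table)


# Placeholder tables for illustration only (NOT the published values).
CI = [2.0, 3.0, 4.0]
WIND_1MIN = [30.0, 45.0, 65.0]
WIND_10MIN = [30.0, 40.0, 55.0]

print(convert_1min_to_10min(55.0, CI, WIND_1MIN, WIND_10MIN))  # 47.5
```

Step 1 inverts the 1-min column, so that column must be strictly increasing for the inversion to be unique; if a real table repeats a wind value for neighboring CI numbers, a tie-breaking rule has to be chosen first.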

2.2 Model description

Figure 1 shows a schematic flow chart of the model. TIFS-RF consists of two prediction processes. One is multiple linear regression: three TIFS models are prepared, corresponding to the intensifying (IN-TIFS), steady-state (SS-TIFS), and weakening (WK-TIFS) stages of TCs. The dependent variable (predictand) of TIFS is the intensity change from an initial time to a given forecast time. The other process is classification by the RF algorithm. In this study, the RF model consists of 100 decision trees and determines the weights used when averaging the forecasts from the three TIFS models. For example, if 75 decision trees predict intensification and 25 predict a steady state, then the weight of the IN-TIFS prediction is 75 % and that of the SS-TIFS prediction is 25 %. The contributions of the individual predictors to the overall intensity change are obtained in the same way, by weighting the contributions of each predictor in the three TIFS models with the RF prediction. Additionally, the spread of the votes of the 100 decision trees among the three classes is used as uncertainty information on intensity changes. A preliminary examination confirmed that using more than 100 trees did not significantly improve the accuracy of the RF model (see the Appendix).
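The vote-based weighting described above can be sketched as follows. This is a minimal illustration, assuming NumPy; the function names and the TIFS forecast values in the usage lines are hypothetical, and the votes reproduce the 75/25 example in the text.

```python
import numpy as np

CLASSES = ("IN", "SS", "WK")  # order matches the IN-, SS-, WK-TIFS models


def rf_weights(tree_votes):
    """Fraction of decision trees voting for each class, in CLASSES order."""
    votes = np.asarray(tree_votes)
    counts = np.array([np.sum(votes == c) for c in CLASSES], dtype=float)
    return counts / counts.sum()


def tifs_rf_change(tree_votes, model_changes, model_contributions=None):
    """Vote-weighted mean of the three TIFS forecasts at one forecast time.

    model_changes: predicted intensity changes (IN, SS, WK) from FT0 to this FT.
    model_contributions: optional (3, n_predictors) array of per-predictor
    contributions in each TIFS model, weighted with the same RF weights.
    """
    w = rf_weights(tree_votes)
    change = float(w @ np.asarray(model_changes, dtype=float))
    contrib = None
    if model_contributions is not None:
        contrib = w @ np.asarray(model_contributions, dtype=float)
    return change, contrib


# The example from the text: 75 trees vote IN and 25 vote SS.
votes = ["IN"] * 75 + ["SS"] * 25
# Hypothetical Pmin changes (hPa) from IN-, SS-, and WK-TIFS at one FT.
change, _ = tifs_rf_change(votes, [-20.0, -2.0, 5.0])
# change = 0.75 * (-20) + 0.25 * (-2) = -15.5; the WK forecast gets weight 0.
print(rf_weights(votes), change)
```

Because the same weights are applied to the forecasts and to the per-predictor contributions, the weighted contributions still add up to the weighted forecast whenever each model's contributions plus its constant add up to its own forecast.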

Fig. 1

Schematic flow chart of the TIFS-RF model. Three TIFS models corresponding to the intensifying, steady-state, and weakening stages of TCs are introduced and a weighted consensus of the three TIFS forecasts is applied based on random forest (RF) decision trees.

Both the RF and TIFS models use the same 29 predictors as TIFS-S18 (summarized in Table 2). Using the same predictors ensures that differences in accuracy are not caused by differences in the predictors. In contrast, TIFS-OP uses 26 predictors (see Table 1 of S18). An RF classification model consisting of 100 decision trees was constructed with five predictors in each decision tree, because the standard procedure is to use the square root of the total number of predictors (29 in this study; √29 ≈ 5.4, rounded down to 5) as the number of predictors in each tree (Breiman and Cutler 2005).
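The square-root rule and the random predictor subsets can be sketched as follows, assuming NumPy; all names are hypothetical. In Breiman's original algorithm, and in common implementations such as the `max_features` parameter of scikit-learn's `RandomForestClassifier`, a fresh subset is drawn at every split rather than once per tree. The sketch shows only the draw itself, one per tree, following the wording of the text.

```python
import math

import numpy as np

N_PREDICTORS = 29  # predictors shared with TIFS-S18 (Table 2)
N_TREES = 100


def n_candidate_predictors(n_predictors):
    """Square-root rule: floor(sqrt(n)), at least 1."""
    return max(1, math.isqrt(n_predictors))


def draw_candidate_predictors(rng, n_predictors, m):
    """Random subset of m distinct predictor indices, sorted for readability."""
    return np.sort(rng.choice(n_predictors, size=m, replace=False))


m = n_candidate_predictors(N_PREDICTORS)  # isqrt(29) == 5
rng = np.random.default_rng(seed=0)
# One draw per tree, as worded in the text; Breiman's RF repeats the draw
# at every split (cf. RandomForestClassifier(n_estimators=100, max_features=5)).
subsets = [draw_candidate_predictors(rng, N_PREDICTORS, m) for _ in range(N_TREES)]
print(m, subsets[0])
```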

The criteria for IN, SS, and WK classes (Table 3) were determined such that the number of samples would be equal in the three classifications insofar as possible to ensure a sufficient sample size of training data for each of the three TIFS models. Because intensity changes are in units of 5 kt, however, the sample size was much larger for the SS class than for the IN and WK classes up to around FT12 (not shown). The distributions of Pmin and Vmax training samples, in which the y-axis corresponds to the dependent variable of the TIFS models, are shown in Fig. 2.
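The idea of balancing the three classes, and the reason ties in coarsely quantized intensity changes inflate the SS class, can be illustrated with quantile-based cut points. This is a sketch under assumptions: the actual criteria are those in Table 3 (which change only between certain FTs), whereas the hypothetical helpers below simply take the 1/3 and 2/3 quantiles of made-up Pmin changes at one FT.

```python
import numpy as np


def tertile_thresholds(train_changes):
    """Cut points at the 1/3 and 2/3 quantiles of training intensity changes."""
    lo, hi = np.quantile(np.asarray(train_changes, dtype=float), [1 / 3, 2 / 3])
    return float(lo), float(hi)


def classify_pmin_change(dp, lo, hi):
    """IN if the Pmin change is below lo, WK if above hi, SS otherwise."""
    dp = np.asarray(dp, dtype=float)
    return np.where(dp < lo, "IN", np.where(dp > hi, "WK", "SS"))


# Hypothetical Pmin changes (hPa) at one FT, with many repeated values.
train = [-20, -10, -5, 0, 0, 0, 0, 5, 10]
lo, hi = tertile_thresholds(train)
labels = classify_pmin_change(train, lo, hi)
# All the tied zero changes fall into SS, so SS ends up the largest class.
print(lo, hi, list(labels))
```

With many identical values in the middle of the distribution, no threshold can split them, which mirrors the larger SS sample size at short lead times noted above.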

Fig. 2

Distributions of (a) Pmin and (b) Vmax training samples from 2000 to 2012 used for RF classification and the three TIFS models. The color scale shows the number of samples, and the areas surrounded by the red, white, and black lines indicate samples used for the IN, SS, and WK classes, respectively. The two black contours indicate the 5th and 95th percentiles of the training samples at each FT.

In this study, the model training process and forecast experiments (i.e., testing process) are the same as in S18. Specifically, the models for both the RF classification and the linear regression were constructed using training data from 2000 to 2012, the same computational period as that used to obtain the current operational TIFS coefficients. The forecast experiment was conducted using data from 2013 to 2021 in each of the two datasets.

The RF code used in this study estimates the classification error rate by using samples that are randomly extracted from the RF training data, and not used in the actual training, as validation data (Fig. 3). Because the extracted samples came from the same TC cases as the actual training samples, they were not completely independent. Therefore, this error rate should be regarded only as an estimate of the best possible classification accuracy of the RF model. Figure 3 shows that the classification error rate of the RF model is relatively high up to FT30, probably because intensity changes vary little within 30 h, which makes the three classes difficult to distinguish. The error rate of less than 15 % after FT30 indicates that good classification accuracy is possible later in the forecast period.
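The majority vote and the error rate defined in the caption of Fig. 3 can be sketched as follows. The helper names and votes are hypothetical, and the tie-breaking rule is a choice of this sketch; the text does not state whether the RF code uses Breiman's out-of-bag samples or a separate random hold-out.

```python
from collections import Counter

import numpy as np


def majority_vote(tree_votes):
    """Majority class per sample from an (n_trees, n_samples) nested list of labels.

    Ties are broken by the first label encountered (a choice of this sketch).
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*tree_votes)]


def classification_error_rate(y_true, y_pred):
    """Number of misclassified samples divided by the total number of samples."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))


# Three hypothetical trees voting on two validation samples.
votes = [["IN", "SS"], ["IN", "WK"], ["SS", "WK"]]
pred = majority_vote(votes)  # ["IN", "WK"]
print(pred, classification_error_rate(["IN", "SS"], pred))  # error rate 0.5
```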

Fig. 3

Classification error rate of the RF classification model. The error rate was examined using randomly extracted samples from the RF training data. The classification error rate is defined as the ratio of the number of misclassified samples to the total number of samples.

Figure 4a shows the regression coefficients of the three TIFS models and of TIFS-S18 for Pmin prediction. In general, the signs of the coefficients in these four models are similar, including those of Pmin at FT = 0 h (MSLP), the square of OHC (OHC2), the generalized vertical shear parameter (SHGC), the square of the 850–200-hPa vertical shear magnitude (SHDC) (SHSH), SHDC times the sine of latitude (SHLT), and the tendency of tangential wind averaged from 0 to 500 km from the TC center at 850 hPa (TWAT). As expected, however, some of the coefficients differ greatly among the three TIFS models. Because the coefficients of nonlinear predictors such as SHDC, the potential intensification (POT), and OHC are difficult to interpret, only the coefficients of easily interpretable predictors in IN-TIFS and WK-TIFS are compared here. (i) The smaller the absolute value of MSLP minus 970 (OSLP), the more Pmin decreases in IN-TIFS and increases in WK-TIFS, because both decreases and increases in Pmin are largest when Pmin is around 970 hPa. (ii) Zonal storm motion (ZNAL) contributes to lower Pmin during westward translation in IN-TIFS, whereas eastward translation contributes to lower Pmin in WK-TIFS, likely because WK-TIFS includes extratropical transitioning TCs that maintain a low Pmin. (iii) The larger the 700–500-hPa relative humidity (RHMD, i.e., the wetter the environment), the more a Pmin decrease is suppressed in IN-TIFS and the more a Pmin increase is suppressed in WK-TIFS. (iv) The larger the positive difference in equivalent potential temperature between a lifted surface parcel and its environment (EPOS, i.e., convective instability), the more a Pmin decrease is suppressed from FT72 to FT120 in IN-TIFS and the more a Pmin increase is suppressed in WK-TIFS. (v) In IN-TIFS, 850-hPa absolute vorticity (Z850) makes almost no contribution to Pmin, but in WK-TIFS, a larger Z850 contributes to a lower Pmin. (vi) In IN-TIFS, 200-hPa divergence (D200) also makes almost no contribution to Pmin, whereas in WK-TIFS, a larger D200 contributes to a Pmin increase. (vii) A larger axisymmetry of the rainfall structure within a 300-km radius times OHC (AXIS) lowers Pmin up to ∼FT48 and raises it after FT66 in IN-TIFS, whereas it raises Pmin after ∼FT60 in WK-TIFS. These features of AXIS likely reflect the fact that weak TCs with a symmetric rainfall structure tend to intensify (Shimada et al. 2017), whereas intense WK TCs weaken within the forecast period.
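A single TIFS-type regression at one forecast time can be sketched as an ordinary least-squares fit of the intensity change on standardized predictors; fitting it separately on IN, SS, and WK training subsets would give the three models compared above. This is a sketch under assumptions: z-score normalization is assumed because the exact normalization behind Fig. 4a is not stated here, predictor selection is not covered, and all names are hypothetical. The derived predictor OSLP = |MSLP − 970| from item (i) is included for illustration.

```python
import numpy as np


def fit_regression_at_ft(X, dy):
    """Least-squares fit dy ~ c0 + sum_k b_k * z_k for one forecast time.

    X: (n_samples, n_predictors) raw predictor values; dy: intensity change
    from FT0 to this FT. Predictors are standardized (z-scores), so the
    b_k are comparable across predictors; this normalization is an assumption.
    """
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    A = np.column_stack([np.ones(len(Z)), Z])  # intercept column + predictors
    coef, *_ = np.linalg.lstsq(A, np.asarray(dy, dtype=float), rcond=None)
    return coef[0], coef[1:], mean, std


def predict_change(c0, b, mean, std, X):
    """Predicted intensity change and each predictor's contribution b_k * z_k."""
    Z = (np.asarray(X, dtype=float) - mean) / std
    contributions = Z * b
    return c0 + contributions.sum(axis=1), contributions


# Example derived predictor from the text: OSLP = |MSLP - 970|.
mslp = np.array([940.0, 970.0, 1000.0])
oslp = np.abs(mslp - 970.0)
```

The constant term `c0` returned here corresponds to the y-intercept discussed with Fig. 4b; since each class is trained on a different subset of intensity changes, the intercepts naturally differ among the three models.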

Fig. 4

(a) Normalized Pmin predictor coefficients of the TIFS-S18, IN-TIFS, SS-TIFS, and WK-TIFS models from FT6 to FT120. All predictor coefficients from FT6 to FT120 are plotted in each predictor box (colors). A negative value indicates a decrease in Pmin from FT0 to the given FT when the corresponding predictor value is above average. (b) Constant Pmin values in the IN-TIFS, SS-TIFS, and WK-TIFS models from FT6 to FT120.

Another feature of the coefficients is that the constant term (y-intercept) of the multiple linear regression equation differs among the three TIFS models (Fig. 4b). The magnitude of the constant is much larger in IN-TIFS and WK-TIFS than in SS-TIFS, TIFS-OP, or TIFS-S18. Because the Pmin-change criteria defining the IN and WK samples change between FT36 and FT42 and between FT78 and FT84 (Fig. 2, Table 3), the constant values do not change smoothly between those FTs. As a result, the forecasts also change irregularly between those FTs. However, as shown in the case studies in the next section, such irregularities rarely remain when the weighted mean of the three TIFS model forecasts is used.

2.3 Testing method

The improvement in the forecast accuracy of TIFS-RF was evaluated by comparison against two existing models. Comparison with TIFS-OP was used to evaluate how much improvement can be expected when TIFS-RF is operationalized, and TIFS-RF was compared with TIFS-S18 to assess its forecast accuracy with the same predictors. Two kinds of independent data were used for the evaluation: datasets 1 and 2 (Table 1). The performance of TIFS-RF was evaluated with dataset 1 under the assumption that the input data were true. Although dataset 2 included errors in GSM outputs, it allowed the actual operational accuracy of TIFS-RF to be evaluated. Statistical significance was tested using the same methods as described in Section 2c of S18: i) a paired Student’s t-test with a two-sided test and ii) to account for correlations between forecast errors for the same TC, the effective sample size Ne, defined as

$$N_e = N\,\frac{1-\rho_1}{1+\rho_1},$$

where N is the actual sample size and ρ1 is the lag-1 autoregression coefficient.
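The effective-sample-size correction can be sketched as follows, assuming NumPy and a series of paired differences in absolute error ordered in time. Clipping a negative ρ1 to zero (so that Ne never exceeds N) is an assumption of this sketch, and the function names are hypothetical. A two-sided p-value would then come from Student's t distribution with Ne − 1 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), dof)`.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation coefficient of a series."""
    x = np.asarray(x, dtype=float)
    a = x - x.mean()
    return float(np.sum(a[:-1] * a[1:]) / np.sum(a * a))


def paired_t_effective(diff):
    """Paired t statistic using the effective sample size Ne = N(1 - r1)/(1 + r1).

    diff: paired differences (e.g., |error of model A| - |error of model B|),
    ordered in time so that serial correlation is meaningful.
    Returns (t, degrees of freedom, Ne, r1).
    """
    d = np.asarray(diff, dtype=float)
    n = d.size
    r1 = max(0.0, lag1_autocorrelation(d))  # clipping negative r1 is an assumption
    ne = n * (1.0 - r1) / (1.0 + r1)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(ne))
    return float(t), ne - 1.0, ne, r1


t, dof, ne, r1 = paired_t_effective([1.0, 2.0, 3.0, 4.0, 5.0])
print(t, dof, ne, r1)  # r1 = 0.4 and Ne = 15/7 for this toy series
```

Positive serial correlation shrinks Ne below N, which enlarges the standard error of the mean difference and makes the test more conservative.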

Testing was conducted in the same way as in Fig. 1c of S18. Namely, the Pmin testing includes Pmin forecasts made from the time a developing TC was a tropical depression until after the weakening TC became a tropical depression or an extratropical cyclone, as long as JMA best track Pmin data were available. The Vmax testing is similar, but is limited to times when a TC was of tropical storm strength or above, because best track Vmax is available only then. As a result, the sample size for the overall testing was much larger for Pmin than for Vmax. Forecasts made at or after the time a TC made landfall were excluded from this testing.

3. Testing results

This section first presents the results of the tests using datasets 1 and 2, followed by case studies that examine some specific features of TIFS-RF forecasts. Hereafter, cases are referred to as IN, SS, or WK TCs based on the classification shown in Table 3. Pmin changes below the 5th percentile and Vmax changes above the 95th percentile of the training samples (Fig. 2) are called “rapid intensification (RI).” Similarly, Pmin changes above the 95th percentile and Vmax changes below the 5th percentile (Fig. 2) are called “rapid weakening.”
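The percentile-based definitions can be sketched as follows. This is a minimal illustration, assuming NumPy's default linear percentile interpolation; the function name and the short label "RW" for rapid weakening are hypothetical. Note the opposite signs for Pmin and Vmax: intensification lowers Pmin but raises Vmax.

```python
import numpy as np


def rapid_change_labels(changes, train_changes, variable):
    """Label each intensity change at one FT as 'RI', 'RW' (rapid weakening), or ''.

    Thresholds are the 5th and 95th percentiles of the training changes at the
    same FT. For Pmin, RI is a change below the 5th percentile; for Vmax, RI is
    a change above the 95th percentile, and vice versa for rapid weakening.
    """
    p5, p95 = np.percentile(np.asarray(train_changes, dtype=float), [5, 95])
    c = np.asarray(changes, dtype=float)
    if variable == "pmin":
        ri, rw = c < p5, c > p95
    elif variable == "vmax":
        ri, rw = c > p95, c < p5
    else:
        raise ValueError("variable must be 'pmin' or 'vmax'")
    return np.where(ri, "RI", np.where(rw, "RW", ""))


train = np.arange(-50, 51)  # hypothetical training changes; p5 = -45, p95 = 45
print(rapid_change_labels([-46, 0, 46], train, "pmin"))
print(rapid_change_labels([-46, 0, 46], train, "vmax"))
```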

3.1 Statistical evaluation using dataset 1

First, to exclude the influence of errors in the input data (i.e., GSM outputs and real-time TC analyses), the three guidance models (TIFS-RF, TIFS-OP, and TIFS-S18) were evaluated by using dataset 1, which consists of the same kinds of data as the training data but for different years (2013–2021). Figure 5 shows the mean absolute errors (MAEs) of the three models and the improvement rates of the MAEs of TIFS-RF against those of TIFS-OP and TIFS-S18. Relative to TIFS-OP, the MAEs of TIFS-RF decrease by up to 1.7 hPa; after FT60, the Pmin MAEs are less than 13 hPa, and the Vmax MAEs level off at ∼10–12 kt (Fig. 5a). TIFS-RF Pmin forecasts are significantly improved, by more than 10 % from FT30 to FT96 against TIFS-OP and by more than 5 % from FT24 to FT90 against TIFS-S18 (Fig. 5b). The improvement rates for Vmax are statistically significant from FT12 to FT84 against TIFS-OP and from FT12 to FT78 (except FT30) against TIFS-S18, although they are smaller than those for Pmin.
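The two verification metrics used in Fig. 5 can be sketched as follows. The improvement rate is assumed to be the usual percentage reduction of MAE relative to the reference model, since the formula is not spelled out in the text; function names and numbers are hypothetical.

```python
import numpy as np


def mean_absolute_error(forecast, observed):
    """Mean of |forecast - observed| over all samples."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(f - o)))


def improvement_rate(mae_new, mae_ref):
    """Percent reduction in MAE relative to a reference model (positive = better)."""
    return 100.0 * (mae_ref - mae_new) / mae_ref


# Hypothetical numbers: an MAE of 9 hPa against a reference MAE of 10 hPa.
print(improvement_rate(9.0, 10.0))  # 10.0
```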

Fig. 5

(a) Mean absolute errors (MAEs) of TIFS-RF, TIFS-S18, and TIFS-OP forecasts from FT0 to FT120. (b) Improvement rates of TIFS-RF Pmin and Vmax forecasts against TIFS-OP (thick lines) and TIFS-S18 (thin lines) forecasts from FT0 to FT120. The filled circles indicate statistically significant differences at the 95 % level. The bar chart shows the number of samples (right axis, blue for Pmin, brown for Vmax). This testing was conducted for forecasts using the same kind of data as the training data but from 2013 to 2021.

Figure 6 shows the distributions of TIFS-RF MAEs and of the MAE differences between TIFS-RF and TIFS-S18 as functions of best-track Pmin and Vmax changes from FT0 to each FT. Because TIFS-S18 uses the same predictors as TIFS-RF, its MAEs serve as baseline values for evaluating TIFS-RF. For reference, MAEs of TIFS-RF are compared with those of TIFS-OP in Fig. S1. For Pmin, MAEs for SS and IN cases, including RI cases, are generally reduced. MAEs of Vmax for SS cases and slightly weakening cases are also reduced. In contrast, MAEs of both Pmin and Vmax are greatly increased for rapidly weakening cases, although this large increase has little effect on the overall MAE because rapidly weakening cases are few (e.g., Fig. 2).

Fig. 6

Two-dimensional distributions of the MAEs (color scale) of TIFS-RF forecasts for (a) Pmin and (b) Vmax, and differences in MAEs (color scale) between TIFS-RF and TIFS-S18 forecasts for (c) Pmin and (d) Vmax. The y-axis shows best-track intensity changes from FT0 to each FT (x axis). The two black contours indicate the 5th and 95th percentiles of the training samples at each FT shown in Fig. 2. This testing was conducted for forecasts using the same kind of data as the training data but from 2013 to 2021.

Figure 7 shows the distributions of TIFS-RF biases and of the bias differences between TIFS-RF and TIFS-S18 as functions of best-track Pmin and Vmax changes from FT0 to each FT. Compared with TIFS-S18 (not shown), TIFS-RF has a smaller Pmin overforecast bias for SS and slightly intensifying cases and a smaller Pmin underforecast bias for RI cases. The Vmax overforecast bias for SS cases is also reduced. In contrast, the overforecast bias of both Pmin and Vmax is greatly increased for rapidly weakening cases. This weakness of TIFS-RF indicates that, unlike the introduction of IN-TIFS, the introduction of WK-TIFS does not reduce the magnitude of the overforecast bias for rapidly weakening cases. Because a single TIFS model originally predicted rapid weakening much better than RI (Fig. S1), we interpret the introduction of the three TIFS models as having made the prediction errors of RI and rapid weakening comparable (Figs. 6a, b).
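The binned error distributions of Figs. 6 and 7 can be sketched for a single FT as follows: forecast errors are grouped by the observed (best-track) intensity change, and the mean bias and MAE are computed in each bin. This is a minimal illustration with hypothetical names, bin edges, and values; bias is taken as forecast minus observed, so for Pmin a negative bias means intensity is overforecast, whereas for Vmax a positive bias does. Samples outside the bin edges are ignored.

```python
import numpy as np


def binned_bias_and_mae(fc_change, obs_change, bin_edges):
    """Mean bias and MAE of forecast changes, binned by observed change at one FT."""
    fc = np.asarray(fc_change, dtype=float)
    obs = np.asarray(obs_change, dtype=float)
    err = fc - obs
    idx = np.digitize(obs, bin_edges) - 1  # bin b covers [edges[b], edges[b+1])
    n_bins = len(bin_edges) - 1
    bias = np.full(n_bins, np.nan)
    mae = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            bias[b] = err[sel].mean()
            mae[b] = np.abs(err[sel]).mean()
    return bias, mae


# Hypothetical Pmin changes (hPa): observed and forecast for four samples.
edges = [-30, -10, 10, 30]
bias, mae = binned_bias_and_mae([-10, 5, -5, 20], [-20, 0, 0, 20], edges)
print(bias, mae)
```

Repeating this for every FT and stacking the results gives a two-dimensional field like those in the figures; empty bins are left as NaN.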

Fig. 7

Two-dimensional distributions of the mean biases (color scale) of TIFS-RF forecasts for (a) Pmin and (b) Vmax, and differences in mean absolute biases (color scale) between TIFS-RF and TIFS-S18 forecasts for (c) Pmin and (d) Vmax. The y-axis shows best-track intensity changes from FT0 to each FT (x axis). The two black contours indicate the 5th and 95th percentiles of the training samples at each FT shown in Fig. 2. This testing was conducted for forecasts using the same kind of data as the training data but from 2013 to 2021.

An important consideration for the operational use of TIFS-RF is that the annual mean improvement rates of the MAEs of TIFS-RF Pmin and Vmax forecasts relative to TIFS-OP forecasts vary greatly from year to year (Fig. 8). The overall improvement rate is not statistically significant after FT96 (Fig. 5b) because some years show large positive improvement rates while others show large negative ones.

Fig. 8

Improvement rates of MAEs of (a) Pmin and (b) Vmax in TIFS-RF forecasts against TIFS-OP forecasts by year from FT0 to FT120. This testing was conducted for forecasts using the same kind of data as the training data but from 2013 to 2021.

TIFS-RF forecasts also show large improvements for TD cases and extratropical transitioning TCs (Fig. 9). Here, TD cases include only TDs at FT0 that were upgraded to tropical storms within 120 h of FT0, and extratropical transitioning TCs include those that completed extratropical transition (ET) within 120 h of FT0. In general, MAEs of TD cases are greater than those of all cases (Figs. 5a, 9a). Relative to TIFS-S18, TIFS-RF Pmin and Vmax forecasts are significantly improved from FT12 to FT90 and from FT6 to FT84 except FT18–30, respectively (Fig. 9b). In particular, the Pmin prediction is improved by more than 15 % from FT18 to FT96 relative to TIFS-OP. The better accuracy for TD cases is due to the introduction of the SS model, which corrects the tendency of TIFS-OP to intensify any weak TD (causing overforecasts) and reduces the bias for SS TCs and slightly intensifying TCs with Pmin changes greater than −40 hPa (Fig. S2). TIFS forecasts for extratropical transitioning TCs have smaller MAEs than those for all cases (Figs. 5a, 9c). TIFS-RF further decreases the MAEs for those TCs, especially those of Pmin after FT66 (Fig. 9c). The high improvement rate of TIFS-RF Pmin forecasts relative to TIFS-OP and TIFS-S18 for extratropical transitioning TCs (Fig. 9d) is due to a substantial decrease in MAEs for steady-state extratropical transitioning TCs (Fig. S3). Compared with TIFS-S18, TIFS-RF was able to suppress the negative Pmin bias for those TCs (Fig. S4).

Fig. 9

(a) MAEs in TIFS-RF, TIFS-S18, and TIFS-OP forecasts for tropical depression (TD) cases from FT0 to FT120. (b) Improvement rates of Pmin and Vmax in TIFS-RF forecasts for TD cases against those in TIFS-OP (thick line) and TIFS-S18 (thin line) forecasts from FT0 to FT120. The filled circles indicate statistically significant differences at the 95 % level. The bars show the number of samples (right axis, blue for Pmin, brown for Vmax). (c, d) As in (a, b), but for extratropical transitioning (ET) TCs. This testing was conducted for forecasts using the same kind of data as the training data but from 2013 to 2021.

3.2 Statistical evaluation using dataset 2

Next, the accuracy of the three guidance models is evaluated using dataset 2 (i.e., real-time TC analyses and GSM outputs; see Table 1). Figure 10 shows the MAEs of the three models, the improvement rates of the MAEs of TIFS-RF against those of TIFS-OP and TIFS-S18, and the distributions of TIFS-RF MAEs as functions of best-track Pmin and Vmax changes from FT0 to each FT. Using GSM forecasts as input data results in much larger MAEs than using dataset 1 (Fig. 5). Comparison of the MAE distributions with those obtained with dataset 1 (Fig. 6) shows that SS cases and IN cases with Pmin changes greater than −40 hPa have larger errors (Fig. 10c) associated with overforecasting (not shown). This result can be interpreted as an effect of using GSM outputs. Even so, when TIFS-RF uses GSM outputs, its MAE relative to TIFS-OP is reduced by as much as ∼2 hPa (to ∼15 hPa) for Pmin and by ∼1 kt (to ∼13 kt) for Vmax (Fig. 10a). The improvement rates against TIFS-OP and TIFS-S18 (Fig. 10b) are almost the same as those in Fig. 5b. Therefore, the advantage of TIFS-RF is expected to be nearly the same in operational use, when GSM outputs are used, as in the experiment with “true” input data.

Fig. 10

(a, b) As in Figs. 5a and 5b, but for forecasts using the real-time dataset. (c, d) As in Figs. 6a and 6b, but for forecasts using the real-time dataset.

Figure 11 shows improvement rates of TIFS-RF relative to TIFS-OP by year. As with the results obtained with dataset 1 (Fig. 8), the improvement rate varies greatly from year to year. Although the GSM specifications have changed over the years to improve the accuracy of TC track and environmental field forecasts (e.g., Yonehara 2021), the year-to-year differences in the TIFS-RF improvement rate appear to be unrelated to GSM upgrades.

Fig. 11

As in Fig. 8, but for forecasts using the real-time dataset from 2013 to 2021.

3.3 Case studies

In this subsection, three characteristics that were statistically found in Section 3.1 are further demonstrated through case studies: (1) the improvement in intensity forecasts for SS and IN cases; (2) the deterioration in rapid weakening predictions; and (3) the improved prediction of extratropical transitioning TCs.

The first example is a forecast for SS Typhoon Namtheun (2021) (Fig. 12) initialized at 1200 UTC 12 October 2021. In this case, the SS-TIFS model contributes greatly to the improvement of the Pmin prediction. For Typhoon Namtheun, the GSM predicted gradual weakening, whereas TIFS-OP predicted intensification in the first half of the forecast period and TIFS-S18 predicted slight intensification. In fact, however, Namtheun’s Pmin did not fall greatly and remained steady up to FT96. TIFS-RF was able to predict these steady-state Pmin changes. More specifically, IN-TIFS predicted a Pmin change of −15 hPa at FT84, whereas SS-TIFS and WK-TIFS both predicted that Pmin would remain almost the same (Fig. 12b). More than 60 % of the RF decision trees predicted SS at most FTs. As a result, the weighted mean of the three TIFS forecasts was similar to the SS-TIFS forecast and consistent with the best-track Pmin. Thus, owing to the RF prediction and SS-TIFS, TIFS-RF did not overforecast, unlike TIFS-S18, which used the same predictors as TIFS-RF. The performance of TIFS-RF for Vmax prediction was similar (Figs. 12c, d). Additionally, the variability of the decision trees in this case suggested relatively large uncertainty in the middle of the forecast period. Such uncertainty information may help forecasters decide whether an alternative intensity scenario should be considered.

Fig. 12

(a) Pmin forecasts of TIFS-RF, TIFS-S18, TIFS-OP, and GSM for steady-state Typhoon Namtheun (2021) initialized at 1200 UTC 12 October 2021. (b) RF classification (bar charts) and Pmin changes of IN-TIFS, SS-TIFS, WK-TIFS, and TIFS-RF forecasts from FT0 to FT120 (curves) for Typhoon Namtheun. The bar chart shows the number of RF decision trees in the IN, SS, and WK classes (right axis). (c, d) As in (a, b), but for Vmax forecasts.

Figure 13 shows the predictions for rapidly intensifying Typhoon Songda (2016) initialized at 0000 UTC 8 October 2016. The magnitude of intensification predicted by TIFS-RF, which came mostly from IN-TIFS (Figs. 13b, d), was larger than that predicted by TIFS-OP and TIFS-S18 (Figs. 13a, c); as a result, the RI prediction was improved. Additionally, whereas TIFS-OP and TIFS-S18 failed to predict the rapid weakening just after the large intensification, TIFS-RF was able to predict this weakening because the proportion of RF decision trees that predicted SS increased at FT120.

Fig. 13

As in Fig. 12, but for rapidly intensifying Typhoon Songda (2016) initialized at 0000 UTC 8 October 2016.

Second, Fig. 14 shows the predictions for rapidly weakening Typhoon Halong (2019) initialized at 1200 UTC 5 November 2019. More than 90 % of the decision trees predicted WK from FT48 onward for Pmin and from FT60 onward for Vmax, indicating low uncertainty in the classification. However, WK-TIFS, and thus TIFS-RF, predicted less weakening than TIFS-OP and TIFS-S18 did (Figs. 14b, d). This tendency contributed to the overall deterioration in rapid weakening predictions.

Fig. 14

As in Fig. 12, but for rapidly weakening Typhoon Halong (2019) initialized at 1200 UTC 5 November 2019.

The third feature is illustrated by the predictions for extratropical transitioning Typhoon Namtheun (2021) initialized at 0000 UTC 14 October 2021 (Fig. 15). Typhoon Namtheun became an extratropical cyclone at FT72 and then slightly redeveloped after FT96 (Fig. 15a). Both TIFS-OP and TIFS-S18 had large MAEs in their Pmin forecasts because of overforecasting. In contrast, TIFS-RF only slightly overforecasted Pmin because at least 70 % of the RF decision trees predicted SS after FT42 (Fig. 15b); as a result, its MAEs were much smaller than those of TIFS-OP and TIFS-S18. After ET, GSM forecasts would be used operationally, as the GSM predicted Typhoon Namtheun's Pmin very well. However, because the timing of ET is uncertain, an intensity guidance model should be accurate both before and after ET.

Fig. 15

As in Fig. 12, but for Pmin forecasts of extratropical transitioning Typhoon Namtheun (2021) initialized at 0000 UTC 14 October 2021. Typhoon Namtheun (2021) became an extratropical cyclone, shown by ET in (a), at FT72.

In these case studies, the forecasts of each TIFS model fluctuated irregularly between consecutive FTs at which the classification criteria for the multiple linear regression models change (e.g., Fig. 2, Table 3). For example, irregular changes are seen in IN-TIFS forecasts between FT78 and FT84 (Fig. 12b), in SS-TIFS forecasts between FT78 and FT84 (Fig. 13b), and in WK-TIFS forecasts between FT42 and FT48 (Fig. 13d). In each case, the fluctuating model corresponds to the minority RF class and therefore receives a small weight. As a result, these large fluctuations generally disappear in the weighted mean of the three TIFS forecasts.
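The damping of these fluctuations can be made explicit. Assuming, as a simplification not stated in this section, that the final forecast is a vote-fraction-weighted mean with weights summing to one, that the weights are nearly unchanged between two consecutive FTs $t_1$ and $t_2$, and that only model $k$ jumps (by $\Delta_k$) while the other models are essentially unchanged:

```latex
\bar{F}(t) = \sum_{c \in \{\mathrm{IN},\,\mathrm{SS},\,\mathrm{WK}\}} w_c(t)\, F_c(t),
\qquad \sum_{c} w_c(t) = 1,
\\[4pt]
\bar{F}(t_2) - \bar{F}(t_1) \approx \sum_{c} w_c \left[ F_c(t_2) - F_c(t_1) \right] \approx w_k\, \Delta_k .
```

For illustration, a hypothetical 10 hPa jump in a minority-class model with $w_k = 0.1$ shifts the combined forecast by only about 1 hPa.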

4. Discussion

While TIFS-RF slightly reduced the MAEs of RI predictions relative to TIFS-S18, it increased the MAEs of rapid weakening predictions (Fig. 6). It would be difficult for any single guidance model to predict all kinds of TC intensity changes perfectly. A consensus of multiple predictions that compensate for each other's weaknesses may lead to further improvement. Therefore, a consensus (i.e., the arithmetic mean) of TIFS-RF and HWRF forecasts is investigated as a possible effective use of TIFS-RF. Here, TIFS-RF forecasts using GSM outputs are used.
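The equal-weight consensus and its verification can be sketched as follows. The helper names and the Pmin values are hypothetical; only the arithmetic-mean consensus and the MAE metric come from the text.

```python
def mae(forecasts, observed):
    """Mean absolute error between paired forecasts and observations."""
    if len(forecasts) != len(observed) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(f - o) for f, o in zip(forecasts, observed)) / len(observed)


def consensus(a, b):
    """Equal-weight (arithmetic mean) consensus of two forecast series."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


# Hypothetical Pmin forecasts (hPa) verified at three times.
best_track = [960.0, 950.0, 955.0]
tifs_rf = [965.0, 958.0, 950.0]
hwrf = [952.0, 945.0, 962.0]

mae_rf = mae(tifs_rf, best_track)                     # 6.0
mae_hwrf = mae(hwrf, best_track)                      # about 6.67
mae_cons = mae(consensus(tifs_rf, hwrf), best_track)  # about 1.33
```

Because |(e1 + e2)/2| ≤ (|e1| + |e2|)/2 for any two errors, the consensus MAE never exceeds the average of the two members' MAEs. The gain is largest when the members' errors have opposite signs, as in this toy example; when both err in the same direction, the consensus can still be worse than the better member alone.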

Figures 16a and 16b show the MAEs of the consensus forecast for Pmin and Vmax as well as those of the TIFS-RF and HWRF forecasts. In general, the TIFS-RF forecast is more accurate than the operational HWRF forecast for both Pmin and Vmax. Furthermore, the consensus forecast of HWRF and TIFS-RF for both Pmin and Vmax is generally more accurate than either TIFS-RF or HWRF alone. Note that the accuracy of operational HWRF has improved over the years (e.g., Zhang et al. 2023). For example, after 2017, operational HWRF Pmin forecasts are generally more accurate than TIFS-RF after FT30 (not shown). Even so, the consensus forecast generally maintains its superiority after 2017 (Fig. S5).

Fig. 16

(a) MAEs of HWRF, TIFS-RF, and consensus forecasts for Pmin (curves) and the number of samples (right axis) from FT0 to FT120. (b) As in (a), but for Vmax forecasts. (c, d) As in Figs. 6c and 6d, but for differences in MAEs between the consensus and HWRF forecasts.

Figures 16c and 16d show distributions of MAE differences between the consensus and HWRF forecasts for Pmin and Vmax, respectively. The consensus reduces the MAEs of the HWRF forecasts in most cases, except for the rapidly weakening cases and some of the RI cases. Although the consensus MAEs for the rapidly weakening cases are larger than those of the HWRF forecasts, the overall accuracy of the consensus is still improved because errors are reduced in the overwhelming majority of consensus forecast samples (not shown). Developing a model specialized for RI and rapid weakening prediction remains challenging and will be addressed in future work.

5. Summary

In operational centers, a single multiple linear regression model has been used to predict TC intensity as a guidance model. Its advantages include forecast stability, high accuracy, easy interpretation, and low computational cost. The Typhoon Intensity Forecasting scheme based on SHIPS (TIFS) is a TC intensity prediction model of the JMA that uses a multiple linear regression technique. Although the addition of new predictors has improved its accuracy, the use of a single multiple linear regression model to predict TC intensity has limitations. In this study, a new TIFS-based forecasting scheme (TIFS-RF) was developed, in which three TIFS models corresponding to the intensifying, steady-state, and weakening stages of TCs are introduced and in which the weighted mean of the three model forecasts based on RF decision trees is computed as a final intensity forecast.

Compared to conventional single TIFS models, TIFS-RF provided much better forecasts, with improvement rates of up to 12 % at forecast times from 1 to 4 days. The improvement was particularly significant for SS TCs, TD cases, and TCs undergoing extratropical transition within five days. The conventional single TIFS models tended to overforecast in SS cases; TIFS-RF reduced this overforecast bias by using the weighted mean of the three TIFS model forecasts based on RF decision trees. TIFS-RF slightly improved RI prediction, whereas its accuracy for rapidly weakening cases was lower than that of single TIFS models. Even when nearly all RF decision trees predicted intensification or weakening, the corresponding intensifying or weakening TIFS model failed to predict the actual large intensity changes, which suggests a limitation of the linear regression model. However, the overall accuracy of TIFS-RF was much better than that of single TIFS models because SS samples far outnumber rapid weakening samples. One way to compensate for the weaknesses of TIFS-RF is to use the consensus of the TIFS-RF and HWRF model forecasts; the overall accuracy of this consensus was found to be better than that of either TIFS-RF or HWRF alone.

Data Availability Statement

The JMA best track data are available on their website (https://www.jma.go.jp/jma/jma-eng/jma-center/rsmc-hp-pub-eg/trackarchives.html). The JRA-55 reanalysis and COBE-SST data are available at https://jra.kishou.go.jp/JRA-55/index_en.html. GSMaP data are provided on the website at https://sharaku.eorc.jaxa.jp/GSMaP/registration.html. GSM output data are available at https://database.rish.kyoto-u.ac.jp/arch/jmadata/data/gpv/original/. HWRF data are available at https://www.emc.ncep.noaa.gov/gc_wmb/vxt/HWRF_legacy/index.php. The JMA successive geostationary satellite (Himawari) data are available at http://www.cr.chiba-u.jp/japanese/database.html. MGDSST data are available at https://www.data.jma.go.jp/gmd/goos/data/database.html. Other data and the complete datasets used for TIFS-RF training and forecasts including OHC data and the JMA’s real-time TC analysis data are available upon request to the corresponding author in the framework of a collaboration with the Meteorological Research Institute of the JMA. The code of random forests used in this study is distributed at https://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm.

Supplements

Figure S1 shows two-dimensional distributions of the MAEs of TIFS-OP forecasts for (a) Pmin and (b) Vmax, and differences in MAEs between TIFS-RF and TIFS-OP forecasts for (c) Pmin and (d) Vmax. Figure S2 shows two-dimensional distributions of (a) MAEs and (b) mean biases of TIFS-RF forecasts for Pmin of TD cases and differences in (c) MAEs and (d) mean absolute biases between TIFS-RF and TIFS-S18 forecasts for Pmin of TD cases. Figure S3 shows two-dimensional distributions of MAEs of TIFS-RF for (a) Pmin and (b) Vmax forecasts for ET TCs and differences in MAEs between TIFS-RF and TIFS-S18 forecasts for (c) Pmin and (d) Vmax of ET TCs. Figure S4 shows two-dimensional distributions of mean biases of TIFS-RF forecasts for (a) Pmin and (b) Vmax for ET TCs, mean biases of TIFS-S18 forecasts for (c) Pmin and (d) Vmax for ET TCs, and differences in mean absolute biases between TIFS-RF and TIFS-S18 forecasts for (e) Pmin and (f) Vmax for ET TCs. Figure S5 shows differences in MAEs between the consensus and HWRF forecasts for (a) Pmin and (b) Vmax for each year from 2013 to 2021.

Acknowledgments

We are deeply grateful to our colleagues at the JMA, Mr. A. Shimokobe and Dr. M. Yamaguchi, for many fruitful comments that greatly contributed to the improvement of TIFS-RF. Gratitude is also extended to Dr. K. Musgrave and Dr. M. DeMaria, who collaborated in developing TIFS-OP and TIFS-S18. The valuable suggestions from two anonymous reviewers are appreciated. This work was supported by MEXT KAKENHI Grants 21H01164 and 23K13172. The views in this paper are those of the author and should not be regarded as official views of the JMA.

Appendix: Number of RF decision trees

To determine the number of RF decision trees, a preliminary experiment was conducted in the same way as in Fig. 3, but with different numbers of decision trees. The classification error rates of the RF classification model show that using more than 100 trees does not significantly improve its accuracy (Fig. A1).
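A sensitivity test of this kind can be sketched as follows. The study used Breiman's random forest code (see the Data Availability Statement); this sketch swaps in scikit-learn's `RandomForestClassifier` and uses synthetic three-class data as a hypothetical stand-in for the IN/SS/WK labels. Measuring the error on a held-out split is an assumption, since the exact procedure of Fig. 3 is not restated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def error_rate_by_tree_count(tree_counts, seed=0):
    """Held-out classification error rate of a random forest for each tree count."""
    # Synthetic three-class data standing in for IN/SS/WK labels (hypothetical).
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                               n_classes=3, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed, stratify=y)
    errors = {}
    for n in tree_counts:
        clf = RandomForestClassifier(n_estimators=n, random_state=seed, n_jobs=-1)
        clf.fit(X_tr, y_tr)
        errors[n] = 1.0 - clf.score(X_te, y_te)  # score() returns accuracy
    return errors


errors = error_rate_by_tree_count([10, 50, 100, 200])
for n, err in errors.items():
    print(f"{n:4d} trees: error rate {err:.3f}")
```

A reasonable choice is the smallest tree count beyond which the error rate stops decreasing noticeably. Setting `oob_score=True` and reading `oob_score_` gives an out-of-bag alternative that avoids holding out a test split.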

Fig. A1

As in Fig. 3, but for the classification error rates of the RF classification model with different numbers (colors) of decision trees for (a) Pmin and (b) Vmax.

References
 
© 2024 The Author(s) CC-BY 4.0 (Before 2018: Copyright © Meteorological Society of Japan)

©The Author(s) 2024. This is an open access article published by the Meteorological Society of Japan under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
https://creativecommons.org/licenses/by/4.0