An Algorithm for Land Surface Temperature Retrieval Using Three Thermal Infrared Bands of Himawari-8

This paper presents a method for estimating the land surface temperature (LST) from Himawari-8 data. The Advanced Himawari Imager onboard Himawari-8 has three thermal infrared bands in the spectral range of 10 – 12.5 μm. We developed a nonlinear three-band algorithm (NTB) that makes the best use of these bands to estimate the LST. The formula of the algorithm includes 10 coefficients. The optimum values of these coefficients were derived using a statistical regression method from the simulated data, as obtained by a radiative transfer model. The simulated data sets correspond to a variety of values of LST, as well as surface emissivity, type and season of temperature and water vapor profiles. Viewing zenith angles (VZAs) from 0° to 60° were considered. For the coefficients obtained in this way, we verified the root-mean-square error (RMSE) in terms of the VZA, LST and precipitable water dependence. We showed that the NTB can accurately estimate the LST with an RMSE less than 0.9 K compared with the nonlinear split-window algorithm developed by Sobrino and Romaguera (2004). Moreover, we evaluated the sensitivities of the LST algorithms to the uncertainties in input data by using the dataset independent of the dataset used to obtain coefficients. Consequently, we showed that the NTB has the highest robustness against the uncertainties in input data. Finally, the stepwise LST retrieval method was constructed. This method includes a simple cloud mask procedure and the land surface emissivity estimation. The LST product was evaluated using in-situ data over the Tibetan Plateau, and the validity was confirmed.


Introduction
The land surface temperature (LST) is a key parameter of land-atmosphere interaction on various scales.Changes in surface cover due to deforestation or urbanization can cause changes in the LST and heat flux, thus leading to environmental issues, such as desertification and urban heat islands.For this reason, it is important to understand LST changes.Methods for observing the LST can be roughly divided into in-situ observations and satellite/aircraft observations.Given that satellite/aircraft observations can provide LST data over a wide area in homogeneous quality, this approach is generally used to study the distribution of vegetation and urban climates (Sobrino and Raissouni 2000;Oku et al. 2007;Weng 2009).Various algorithms have been proposed to estimate LST from various sensors, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) (Wan and Dozier 1996;Wan 2014), the Advanced Very-High-Resolution Radiometer (AVHRR) (Sobrino and Raissouni 2000;Sobrino et al. 1994;Coll and Caselles 1997;François and Ottlé 1996), the Landsat-8 Thermal Infrared Sensor (TIRS) (Rozenstein et al. 2014), the Spinning Enhanced Visible and Infrared Imager (SEVIRI) (Sun and Pinker 2007;Atitar and Sobrino 2009), the Geostationary Operational Environmental Satellite (GOES) imager (Sun and Pinker 2003;Xu et al. 2014), the FengYun meteorological satellite instruments (Tang et al. 2008(Tang et al. , 2015)), and the Japanese Advanced Meteorological Imager (JAMI) (Takeuchi et al. 2012).
Himawari-8, the successor to Multifunction Transport Satellite-2 (MTSAT-2), was launched in October 2014 and began observations in July 2015.This satellite is a geostationary satellite located above the equator at longitude 140.7°E and is flying at an altitude of 35786 km.The Advanced Himawari Imager (AHI) onboard Himawari-8 contains 3 visible bands (centered at 0.47, 0.51 and 0.64 μm) and 13 infrared bands (centered at 0.86, 1.61, 2.25, 3.9, 6.2, 6.9, 7.3, 8.6, 9.6, 10.4, 11.2, 12.4 and 13.3 μm).The AHI has two more visible bands and nine more infrared bands than MTSAT-2.The spatial resolution of the visible band (0.64 μm) is 0.5 km, and that of the other visible bands (0.47 and 0.51 μm) and one near-infrared band (0.86 μm) is 1.0 km.The spatial resolution of the infrared bands is 2.0 km.The observation cycle of the AHI is 2.5 minutes in the area of Japan and 10 minutes for the full disk.The spatial and temporal resolutions of AHI are both improved compared to the MTSAT-2.Currently, the AHI is the most advanced sensor on-board geostationary satellites.It is expected that the construction of the estimation method of LST using Himawari-8 enables us to observe small-scale features in high-frequency compared to previous satellite data.
In this paper, we present a new LST retrieval formula that makes maximum use of information from the new bands of the AHI.Section 2 describes a basic theory for the estimation method of LST and several candidate formulas tested in this study.Section 3 describes the simulation setting, in which we use a radiative transfer model to verify the LST retrieval formulas; the verification method is also discussed in detail.Section 4 describes the results and discussions for the LST algorithm errors and the operational retrieval method of LST.

Basic theory
Assuming a cloud-free atmosphere under local thermodynamic equilibrium, the radiance I i measured in the thermal infrared (TIR) band i at the viewing zenith angle (VZA) θ at the top of the atmosphere can be represented by the following simplified radiative transfer equation (Li et al. 2013a): where B i is Plank's function weighted by the spectral response function of band i and T i is the brightness temperature of band i at the top of the atmosphere.ε iθ and τ iθ are the land surface emissivity (LSE) and the atmospheric transmittance, respectively, at VZA θ.B i (T sfc ) is the radiance from the black body of surface temperature T sfc , thus, ε iθ B i (T sfc ) τ iθ is the surface emis- sion attenuated by the atmosphere.R atmiθ ¯ is the upward atmospheric thermal radiance at VZA θ and (1 -ε i (θ, ϕ))R ati¯τ iθ is the downward atmospheric thermal ra- diance reflected by the land surface and attenuated by the atmosphere.Thus, once we know the effects of the atmosphere (such as the transmittance and the thermal radiance) and the LSE, the LST is estimated from the observed radiance of band i through (1).
However, it is difficult to estimate the effect of the atmosphere by using the radiative transfer equation for only a single band.To overcome this problem, many studies have used a linear split-window algorithm (Wan and Dozier 1996;Sobrino et al. 1996;Oku and Ishikawa 2004), which considers the effect of the atmosphere by utilizing the differential absorption in the spectral range 10 -12.5 μm.In this range, the variation in the LSE is generally small, and water vapor is the main atmospheric component with a strong variable effect on thermal radiance.The linear split-window algorithms use the difference between two brightness temperatures measured for wavelengths in the range 10 -12.5 μm.A typical example is shown in the following: where the coefficients a k (k = 0, 1, 2) depend on the spectral response functions g i , g j , the LSEs ε i and ε j of the TIR bands i and j, the total column water vapor WV in the atmosphere, and the VZA θ.Thus, the a k can be expressed as follows: However, the use of the linear split-window algorithm, sometimes results in a large error when the atmospheric conditions are wet and/or hot (Li et al. 2013a).To maintain the accuracy, nonlinear split-window algorithms (NSWs) have been developed in which a quadratic term of the difference between two brightness temperatures is added: where the coefficients a k (k = 0, 1, 2) also depend on the g i , g j , LSEs and WV.The NSW developed by Sobrino and Romaguera (2004) has been used to estimate the LST from geostationary satellite data (Atitar and Sobrino 2009;Takeuchi et al. 2012), such as that obtained by the Meteosat Second Generation (MSG) satellite and the Multifunction Transport Satellite (MTSAT).This NSW is expressed as follows: where ε is the "mean" emissivity of the TIR bands i and j, ε = (ε i + ε j )/2, ∆ε is the difference in emissivity in the TIR bands i and j, ∆ε = ε i -ε j and PW is the total precipitable water (PW) (g cm -2 ).By contrast, Sun and Pinker (2003) developed a three-band algorithm (TB) that considers the effect of the atmosphere by using a combination of any three thermal infrared bands.Their TB has the following form: Sun and Pinker (2003) proposed the use of a 3.9 μm mid-infrared (MIR) band as T k , and they demonstrated that the TB is more accurate than the generalized (linear) split-window algorithm (Wan and Dozier 1996).However, the TB with one MIR band and two TIR bands is only valid at night because the radiance measured in the MIR band includes the solar radiation reflected by the land in daytime.
The AHI has three TIR bands (bands 13, 14 and 15) in the range of 10 -12.5 μm (Fig. 1).Therefore, either the NSW or the TB can be used to retrieve the LST.
To achieve further improvement of the accuracy, we here propose a new nonlinear three-band algorithm (NTB), which has the following form: The three bands (i, j, k) are taken from three AHI TIR bands in the 10 -12.5 μm range, thus making the maximum use of AHI window TIR bands.The use of MIR is out of scope because it cannot be used in daytime.AHI has other TIR bands, such as band 11 (centered at 8.6 μm) and band 12 (centered at 9.6 μm).However, the uncertainty for the estimation accuracy of the LSE in the range of band 11 is considerably higher than that of the 10 -12.5 μm range in barren or sparsely vegetated areas (Sobrino et al. 2008;Peres and DaCamara 2005).The range of band 12 has the O 3 absorption.These uncertainties and properties affect the final LST retrieval accuracy.Hence, bands 11 and 12 were not used in this study.

Simulation setting
The coefficients of the NSW, TB, and NTB must be determined empirically by using a set of LST-(T i , T j , T k ) relation generated by the radiation transfer model under various atmospheric and land surface conditions and for different VZAs.In this study, the Rstar6b radiative transfer model (Nakajima andTanaka 1986, 1988) is employed to compute the radiance at the top of the atmosphere for the given values of the surface and atmospheric parameters.The ancestor program, Rster5, contains LOWTRAN5 as the gaseous absorption model.Rstar6b contains the nonlinear fitting k-distribution method (Nakajima et al. 2000;Sekiguchi and Nakajima 2008) as the gaseous absorption model.
For the variety of atmospheric conditions between the land surface and satellite borne sensor, we used two types of cloudless atmosphere radiosonde database: the Thermodynamic Initial Guess Retrieval (TIGR) database and the six model atmospheres (tropical, mid-latitude summer, mid-latitude winter, highlatitude summer, high-latitude winter, and US standard).TIGR is used as the calibration data to determine coefficients, and the six model atmospheres are used as validation data.The TIGR database contains 2311 atmospheric profiles and associated surface data (pressure, temperature, water vapor and ozone profiles) selected by statistical methods from 80000 radio sonde reports (http://ara.lmd.polytechnique.fr/).To extract the data in clear sky condition from the TIGR database, the cloud determination test (Galve et al. 2007) is first applied using the relative humidity (RH) in each sounding as criteria.When one layer has an RH larger than 90 % or when continuous two layers have an RH larger than 85 %, the radiosonde data is excluded as cloudy data.The data is excluded as foggy data when the layers within 2 km from the surface have an RH larger than 80 %.Finally, 215 radiosonde data in the Himawari-8 main observation area was selected from the extracted data, as shown in Fig. 2a. Figure 2b shows the air temperature (T 0 ) of the lowest layer and the total PW distribution of selected radiosonde data.The T 0 ranges from 244 K to 312 K and the PW varies from 0.06 g cm -2 to 7.60 g cm -2 .For each of the 215 radiosonde data, the LST-(T i , T j , T k ) relation is computed for various values of LST, LSE and VZA.For the LST, six values from T 0 -5 K to T 0 + 20 K with 5 K intervals are used.We assumed that the surface was Lambertian.For the LSE, the emissivity spectra of 87 different natural or manmade materials are considered: 40 spectra for soils; 23 spectra for rocks; 6 spectra for manmade materials; 12 spectra for vegetation; and 6 spectra for water, snow and ice.These materials were selected from the MODIS University of California, Santa Barbara (UCSB) Emissivity Library (http://www.icess.ucsb.edu/modis/EMIS/html/em.html; Wan et al. 1994) and the ASTER spectral library (http://speclib.jpl.nasa.gov/; Baldridge et al. 2009).Soil materials include inceptisols, entisols, mollisols and aridisols.Rock materials include igneous rocks and sedimentary rocks.Manmade materials include concretes, road asphalts and tar.In terms of the emissivity of vegetation, we  used the TIR volumetric bidirectional reflectance distribution function (BRDF) model (Snyder and Wan 1998) to calculate the BRDF and integrated it over the hemisphere; this process was conducted to obtain the directional hemispherical reflectance and emissivity from the emissivity of each leaf sample measured by laboratories.The TIR volumetric BRDF model form is expressed as follows: The kernels are given by the coefficients are given by where θ i and θ r are the incident zenith angle and re- flected zenith angle, respectively; ρ and ρ 0 are the leaf reflectance and ground reflectance; τ is the leaf trans- mission; bF is the optical depth; ξ is the scattering angle between incidence and reflection. is given by: arccos cos cos sin sin cos , where φ is the relative azimuth angle between the incident and reflected directions.τ is taken to be zero for the TIR region and bF is taken to be infinite (Snyder and Wan 1998).Therefore, this model requires not a ground reflectance ρ 0 but a leaf reflectance ρ to deter- mine the volumetric BRDFs f vol .For the green state, we considered three types of broad leaves, two types of needle leaves and a green grass; for the senescent state, we considered two types of broad leaves, two types of needle leaves, a bark and an old grass.Consequently, the emissivity of each TIR band was obtained by approximating the integral of the spectral response function over the emissivity spectrum.For each of the AHI TIR bands, the filtered emissivity values of 87 surface types are shown in Fig. 3.We considered the LSE value at nadir for all the observation angles, but it should be considered the angular dependence on the emissivity of each surface type (Snyder et al. 1998;Sobrino et al. 2005).However, this approximation can express the general properties or relationships of three TIR bands (Atitar and Sobrino 2009;Sobrino and Romaguera 2004).By using the above considerations, we prepared the input data to Rstar6b.These data included the spectral response functions for the 3 AHI TIR bands, the 7 VZA values (0, 10, 20, 30, 40, 50 and 60°), 6 model atmospheres, 215 radiosonde profiles, 6 LST values for each profile, and 87 band emissivity values.We calculated the emissivities of five ice and/or snow conditions, but these were used only when the LST was estimated to be less than 273 K.These data are entered into Rbstar6b and 109467 LST-(T i , T j , T k ) relations are obtained for a fixed VZA value.These 215 TIGR profiles are used as calibration data to obtain the coefficients of each LST algorithm.By contrast, the 2987 data with 6 model atmospheres are used for validation.For each input, Rstar6b computes the radiances of three different bands.The radiances were converted into the brightness temperatures by using the conversion coefficients of the AHI sensor.

Calculation of coefficients and the estimation
error for each algorithm The coefficients of the NSW, TB, and NTB were obtained by three regressions of ( 5), ( 6) and ( 7) by using the Levenberg-Marquardt method.For the NSW, different choices of two bands from the available three window thermal bands were tried.Tables 1,  2, 3, 4 and 5 respectively show the coefficients of the NSW using bands 13 and 14 (NSW_10.4-11.2),NSW using bands 13 and 15 (NSW_10.4-12.4),NSW using bands 14 and 15 (NSW_11.2-12.4),TB and NTB for different VZAs.In previous studies such as Atitar and Sobrino (2009) and Jiang et al. (2015), the coefficients of a nonlinear split-window algorithm were adapted to the functions depending on the cosine of VZA.These functions can be derived from a coefficient data table, such as Table 1, 2 or 3, by polynomial curve fitting.It is very useful for the practical use of the LST observation.The NTB coefficients can similarly be expressed as the functions depending on the cosine of VZA, as shown in Fig. 4.However, for the coefficients a1, a3 and a5, the fitting error larger than the order of 10 -3 may cause an LST error larger than 1.0 K because these coefficients are multiplied by the brightness temperatures directly.Therefore, if we estimate the LST by using fitting results, we should use them carefully.
The precision of the fitting was estimated for all algorithms by comparing the input LST and estimated LST.The biases were smaller than the order of 10 -1 in the range of VZAs considered for all algorithms.The root-mean-square errors (RMSEs) of five algorithms for seven VZAs by using the calibration data are shown in Fig. 5.The NSW_10.4-12.4 has the best accuracy in the VZA ranging from 0° to 30° (green dashed line with dot).However, for the VZA greater than 40°, the NTB has the lowest RMSE (red solid line with dot).The RMSE is lower than 1.3 K even at the VZA 60°.The NTB estimation error changes by 0.55 K as the VZA changes from 0° to 60°, whereas it changes by 0.81 K for the NSW_10.4-12.4.Therefore, considering the angular variation of the estimation accuracy, the NTB gives a sound fitting over a wider range of VZA than the NSW_10.4-12.4.Furthermore, we compared the estimation error of the NSW_10.4-12.4 and the NTB for different LST classes and PW classes in Tables 6 and 7.Each table shows the  RMSEs of those two algorithms for different LST and PW classes by using calibration data.The estimation errors for all cases in calibration data are shown at the bottom of Table 7.These tables show that the errors of both algorithms are lower than 1.0 K in the wide range of LST and PW except for very hot or very wet environments.When the LST is higher than 310 K or the PW is larger than 3.0 g cm -2 , the errors of both algorithms are significantly increased and the difference in accuracy between two algorithms is also increased.This difference of accuracy can reach 0.74 K from 0.23 K. Therefore, although the error for all cases of the two algorithms makes no difference, the NTB has more accuracy than the NSW_10.4-12.4 in hot or wet environments.This very hot and warm condition corresponds to tropical environments.According to the distribution of calibration data in Fig. 2, the sparsity of data from humid tropics may have caused this error distribution.

Sensitivity analysis
The above results show that the preferable LST algorithms are the NTB and NSW_10.4-12.4.However, when this algorithm is used for LST retrieval, the uncertainty in the input data, such as the LSE values, may cause fluctuations in the estimate of the LST.Thus, we evaluated the robustness of the five algorithms against uncertainties in the input data.In this sensitivity analysis, separate validation data from model atmosphere are used to avoid overfitting.The uncertainties of the LSE for three bands were assumed to be ε λ ± 0.02.The noise equivalent differential tem- perature (NEDT) for three bands were assumed to be 0.1 K (Zou et al. 2016).For the NSWs, the uncertainty of the total PW was assumed to be PW ± 0.5 g cm -2 (Sobrino and Romaguera 2008;Akatsuka et al. 2013).For the validation data with uncertainty of the LSE, NEDT and PW, the RMSEs were computed, and the total estimation error σ total for five algorithms were obtained as follows: where σ RMSE is the estimation error of the algorithm itself; σ ε , σ PW and σ NEDT are the estimation error due to the uncertainties of the LSE, PW and NEDT, respectively.Specifically, for the NSWs, six kinds of errors are due to input data uncertainties; σ RMSE , σ PW , two σ ε and two σ NEDT of the selected TIR bands were consid- ered.For the TB and NTB, seven kinds of errors are due to input data uncertainties; σ RMSE , three σ ε and three σ NEDT of the three TIR bands were considered.
Table 8 shows the results of the sensitivity analysis of five algorithms for each VZA.First, in terms of the error of the algorithm itself, the NSW_10.4-11.2 has the lowest angular dependence on the estimation accuracy, whereas the NSW_11.2-12.4 has the largest one.The NSW_10.4-11.2 has the worst accuracy, and the NSW_10.4-12.4 has the best accuracy at a low VZA region.These characteristics have also been viewed in Fig. 5, but the error values are different from each other.However, unlike the fitting error in calibration data, the accuracy of the NSW_10.4-12.4 is higher than that of the NTB in the VZA ranging from 0° to 60°.This result may represent the relationship of accuracies between the two algorithms over the LST range from 270 K to 310 K in Table 6 because the validation data does not include extremely high and low temperature conditions.
For σ ε , Atitar and Sobrino (2009) reported that the LST estimation error caused by LSE uncertainty is larger than that caused by other input data, such as PW and NEDT.In this study, as expected, the LSE uncertainties cause the largest LST estimation error for all algorithms.For the NSWs, the NSW_10.4-11.2 has a much larger σ ε than the NSW_10.4-12.4.A larger difference of atmospheric absorption character- istics between two TIR bands, corresponds to a more robust correction of atmospheric attenuation against the LSE uncertainties.By contrast, the σ ε of the TB and NTB are lower than 2.0 K over most of the VZA range.It is found that the LST estimation accuracy can become robust against LSE uncertainties by using TIR three-band information.Consequently, the σ total of the NTB is the lowest in the VZA ranging from 0° to 60° among five algorithms.The σ total of the NTB is less than 2.5 K even at VZA 50°, which is different from other algorithms.When the VZA is less than 30°, the σ total difference between the NTB and NSW_10.4-12.4, which is the second-best algorithm, is approximately 1.0 K. From the above results, we have verified that the NTB enables highly accurate and stable LST estimation over a wide range of observations by Himawari-8.

Operational observation method
In practical LST retrieval, a stepwise retrieval method (Li et al. 2013a) that determines the LSE and LST separately is employed.Cloud masking is also needed because the TIR radiation in the range of 10 -12.5 μm radiated from a land surface cannot be transmitted to the clouds.This section introduces the cloud masking and LSE estimation, which are necessary procedures for actual observations using the NTB.

a. Operational coefficients
For operational purposes, two sets of NTB coefficients (one for T 10.4 < 280 and the other for 280 < T 10.4 ) are prepared.This is according to Sun and Pinker (2003), who successfully improved the accuracy of the TB by using two sets of coefficients based on 11 μm brightness temperature.It is expected that the estimation accuracy of the NTB could also be improved by using two sets of coefficients.We calculate two sets of coefficients by using the brightness temperature of band 13 and T 10.4 as a threshold value.The corresponding RMSE is then obtained.This procedure is repeated for seven different threshold values (270 K to 300 K at intervals of 5 K).The RMSEs of the NTB for each threshold value and each VZA are shown in Fig. 6.From these results, we found that the optimum threshold value is 280 K.We then reset two sets of coefficients by using that threshold value.In this case, the RMSE is from 0.41 K to 0.75 K in the VZA ranging from 0° to 65°.In comparison with the estimation accuracy before the improvement (Fig. 5), the RMSE is improved by more than 0.3 K in the VZA ranging from 0° to 60°.

b. Cloud masking
A Himawari-8 cloud mask product has already been developed by the Meteorological Satellite Center (MSC) of the Japan Meteorological Agency (JMA) (Imai and Yoshida 2016).This cloud mask algorithm is based on the cloud mask technique of the Nowcasting Satellite Application Facility of EUMETSAT's for MSG/SEVIRI and the NOAA National Environmental Satellite Data and Information Service for GOES-R/ ABI.Imai and Yoshida (2016) compared the cloud mask product applied to SEVIRI data with the MODIS cloud mask product (MOD35).The result shows that the hit ratios for all seasons are approximately 85 %.Given that this is not a free product, an independent cloud detection method was developed in this study.Further details about the cloud detection method, as well as the validity, are presented in the Appendix.

c. Land surface emissivity
The LSE for three bands are estimated by using the method proposed by Yamamoto and Ishikawa (2018).The estimation accuracy at the vegetation area is less than 0.01, and those at arid and semi-arid areas are less than 0.02.The estimation method is a semi-empirical method, which is a combination of the classification-based method (Snyder et al. 1998;Peres and DaCamara 2005) and the NDVI thresholds method (Sobrino et al. 2008).The land cover classification information is taken from the Global Land Cover by National Mapping Organizations version 3 (GLCNMO2013).The material emissivities of soil, vegetation and others are taken from the MODIS UCSB emissivity library and the ASTER spectral library.Furthermore, the flooding of paddy field and snow/ice coverage are detected by the normalized difference water index (NDWI), NDVI and normalized difference snow and ice index (NDSII).When a pixel is covered by snow/ice, the emissivity value of snow/ ice is assigned independent to the pixel class.For paddy fields, when a pixel is judged as the flooding and rice transplanting period, the high emissivity value that has both grass and water properties is assigned.Accordingly, this procedure requires the visible/nearinfrared bands, namely, band 3, 4 and 5 (centered at 0.64, 0.86 and 1.61).Figure 7 shows the flowchart for the operational observation method of LST which includes cloud masking and LSE estimation.On the whole, our method requires eight visible and infrared bands: bands 3, 4, 5, 7, 10, 13, 14 and 15.

Validation with in-situ data
The validation of satellite observed LST has inherent difficulty because the in-situ observation also suffers from horizontal homogeneity and/or uncertainty of emissivity.Therefore, the Himawari-8 LST product is validated with in-situ LSTs observed over the Tibetan Plateau, where the topography is flat, and the local landscape is homogeneous grassland.The observation site is located at 31°22′8.73″N, 91°53′55.26″E;the altitude is 4511 m.This site was referred to in many previous studies (e.g., Ma et al. 2006Ma et al. , 2007)).The VZA is 63.56°.However, given this large VZA and high altitude, the projection point of AHI is shifted to the northwestward direction.The shift is taken into consideration to match the observation site.The surface temperature is continuously measured by a radiation thermometer with a preset global emissivity value of 0.90, and a 10-minute average value is recorded.The values are compared for three seasons: winter (January 1 -31, 2016), spring (April 1 -30, 2016) and summer (July 1 -31, 2016).Figure 8 shows the scatter plots of the Himawari-8 LSTs and in-situ LSTs.The correlations of all seasons are greater than 0.96, and the RMSEs are less than 4 K.The RMSE at the VZA 63.56° is supposed to be approximately 3 K from the simulation results (see Table 8).In this comparison, the RMSEs are a little greater than the simulation.The possible sources of this are thought to be the cloud detection failure (e.g., partial sub-grid cloud) and the difference of the scales of AHI and field sensor.In a previous study, Oku and Ishikawa (2004) also validated the Geostationary Meteorological Satellite-5 LST product with in-situ LSTs over the Tibetan Plateau.They estimated the LST by using a linear split-window algorithm.Their results showed that the RMSE is 5.53 -11.99 K.In comparison with Oku and Ishikawa (2004), the estimation accuracy of the Himawari-8 LST has considerable validity.Some apparent underestimations and the general tendency of slight underestimation are probably caused by cloud removal failure or contamination by sub-pixel scale clouds; these phenomena explain why the biases in April and July are negative.Conversely, in the extremely low temperature region on January, the Himawari-8 LST is systematically higher than the ground observed LST.Several possible causes are postulated, including measurement error by radiation thermometer; however, the causes have not been located yet.It is also thought that another reason is the misdetection of cloud masking due to the temperature inversions for nighttime.

Conclusions
We proposed an NTB that can estimate the LST from observation data in three thermal infrared bands (in the range of 10 -12.5 μm).First, we computed the fitting coefficients and examined the dependence of the estimation accuracy on the VZA, LST and PW.The calibration data are generated by the Rstar6b code with TIGR data and various types of surface emissivity.The total number of the simulations was 106480.On the basis of a comparison of NTB and NSW_10.4-12.4,the result showed that the NTB can stably estimate the LST, particularly in hot and wet environments, even though the NSW_10.4-12.4 is slightly better in the smaller VZA.
Second, we evaluated the sensitivities of five algorithms to the uncertainties in LSE, PW and NEDT by using the validation data independent of calibration data.It was clarified that the NTB has the highest robustness against the uncertainties in the LSEs and NEDTs of the three TIR bands.The NSW_10.4-12.4 is the second-best algorithm.The total estimation error of NTB is approximately two-thirds of NSW_10.4-12.4 for a VZA smaller than 40°.In practical LST retrieval, the input data of an LST algorithm is fluctuated by climate, geographical, artificial and other factors.For example, changes in surface wetness, rapid urbanization and seasonal variation of vegetation cause the LSE estimation error (Li et al. 2013b).The wet and hot environments along the Pacific coast areas or regions in the maritime continent may cause large PW estimation errors (Zou et al. 2016;Sobrino and Romaguera 2008).Moreover, even if the NSW_10.4-12.4 is applied, all three thermal infrared bands are needed in taking the cloud mask procedure into consideration.Therefore, the NTB is more suitable for LST retrieval from Himawari-8 than the NSW, which was previously used for the MSG and MTSAT.The NTB is particularly helpful for studying land surface energy fluxes (Oku et al. 2007) and small-scale environments, such as urban climates (Weng 2009) in Asia and Oceania.Finally, The NTB is combined with a simple cloud mask procedure and the LSE estimation as a stepwise retrieval method.The LST product was evaluated using in-situ data over the Tibetan Plateau.The result shows that the estimation accuracy of the product has considerable validity even at the high altitude and large VZA.

Appendix
Clouds are characterized by higher reflectance and lower temperature than the land surface.Therefore, they can roughly be detected by setting appropriate threshold values of reflectance and brightness temperature in visible and infrared data.However, there are several cumbersome cases in land surfaces and cloud types.For example, snow/ice surfaces have high reflectance and low temperature; thus, they can be incorrectly detected as clouds.Thin cirrus clouds have low influence on the observation of reflectance and brightness temperature; thus, they may be judged as clear pixels.It is also difficult to detect low-level water clouds at night because the visible band is not available.In such cases, certain combinations of reflectances and/or brightness temperatures observed by different bands are effective in separating clouds and cloud-free conditions.
Figure A1 shows the flowchart of cloud mask procedure.First, the snow/ice pixels are detected using the NDSII that is calculated as NDSII = (R 0.64 -R 1.6 )/ (R 0.64 + R 1.6 ) (Xiao et al. 2001).R 0.64 and R 1.6 are the reflectance observed in the visible (band 3) and nearinfrared bands (band 5) of the AHI.The NDSII data is obtained as a composite of the past four days.If the NDSII value is greater than 0.4, the pixel is classified as a snow/ice surface.Thereafter, different cloud detection tests are used depending on the solar zenith angle (SZA) of each pixel.Three scenes are consid- ered: daytime (SZA < 85°), twilight (85° £ SZA < 93°) and nighttime (SZA ³ 93°).Table A1 shows the cloud detection tests at daytime, twilight and nighttime over the land surface.Table A2 shows the cloud detection tests at twilight and nighttime over the snow/ice surface.At daytime over the snow/ice surface, a pixel is classified as clear if all of the following conditions are satisfied: where ξ is the sunlight scattering angle at the surface and R is a reflectance observed in a visible or near-infrared band.The subscripts for the above equations are Tables A1 and A2, 0.86, 3.9, 7.3, 11.2 and 12.4 denote the center wavelengths of bands 3, 4, 7, 10, 14 and 15 of the AHI, respectively.T 11.2_clear and R 0.64_clear are the clear-sky brightness temperature and clearsky reflectance obtained as the maximum (minimum) observed value of the one month composites for each UTC.
Cloud detection in Tables A1 and A2 consist of several tests.For each test, a confidence value between zero and one is assigned.The value is calculated using two thresholds, namely, A and B, and increases toward clear sky value (= 1) as follows: if A < B, where x is an observation value of a test.The thresholds used in this study, A and B, are based on previous studies.The T 11.2 and R 0.64 tests are proposed by Choi and Ho (2009).The threshold values over the cold surfaces at twilight and nighttime are modified by Météo-France (2013) as in Tables A1 and A2.The T 11.2_clear and R 0.64_clear are obtained as one-month composite, but there shall be a case that no clear sky appears in a month.Therefore, these values are updated every year to improve accuracy.At this time, T 11.2_clear and R 0.64_clear are updated for only two years (i.e., approximate 60day composites) at most.In winter season, there is a high possibility that a pixel cannot acquire the R 0.64_clear value in snow free conditions because of snow coverage.In such cases, the constant values of R 0.64 , A = 0.14 and B = 0.22 of MOD35 are set as threshold over the vegetated area.Over arid and semiarid area, the R 0.86 test is applied instead of R 0.64 test.These tests are based on the MOD35 algorithm.The discrimination between the nonarid and arid/semiarid region is taken from the land cover classification information, the GLCNMO 2013 (Tateishi et al. 2011(Tateishi et al. , 2014;;Kobayashi et al. 2017).
After the cloud detection tests are applied to each pixel, the total evaluation of whether each pixel is clear or cloudy is decided using the confidence values of each test.Note that if each confidence value is considered equally, the detection effect for the particular clouds becomes weak because of the overlap in cloud types that the tests detect.To avoid the overlap and evaluate fairly, test results are divided into three groups according to the characteristics of clouds that each test can detect.Group 1 consists of the tests capable of detecting high-and mid-level clouds: T 11.2 and T 7.3 -T 11.2 tests.Group 2 consists of the tests capable of detecting low-level clouds: R 0.64 , R 0.86 and T 11.2 -T 3.9 tests.Group 3 consists of the tests capable of detecting thin cirrus: T 11.2 -T 12.4 and T 3.9 -T 12.4 tests.A representative value of each group, G i , is determined as the minimum confidence, min[F i ].Thereafter, the final cloud mask index, C, is the geometrical mean of representative values: In this study, C is greater than 0.95, it is considered a clear sky.After computing the C for all pixels, the spatial uniformity test is applied.This test removes the uncertainties in cloud detection tests by comparing the value with the eight surround pixels.The uncertainties are introduced over the pixels of mixed surface characteristics, such as coastal (mixture of land and water), semiarid (bare soil and vegetation) and suburban areas (manmade material and vegetation).The coastal pixels may have a large temperature gradient between land and water.The semiarid and suburban pixels may have variable proportions of vegetation.Such heterogeneities affect the confidence of cloud mask index.In such areas, there is the possibility that a pixel is perpetually determined as clear or cloudy.
To avoid such cases, when the target pixel is cloudy but the surrounding pixels (except for water pixels) are clear, it is modified as clear or vice versa.
The cloud mask product is compared with the JMA-MSC cloud mask product to evaluate the validity.
Three comparison periods are employed for winter (January 1 to 31, 2016), spring (April 1 to 30, 2016) and summer (July 1 to 31, 2016).The distribution area of the JMA-MSC cloud mask product is from 0.01°S to 52.01°N and from 113.99°E to 180.01°E with a mesh of 0.02° squares, and the temporal resolution is 1 hour.The temporal change of hit ratio, which is the percentage of the sum of the number of coincidence for clear sky and for cloudy to total pixels, is shown in Fig. A2.The values in April and July are higher than 85 %.The hit ratio in January is slightly lower at approximately 80 %.As mentioned above, the threshold value for T 11.2_clear and R 0.64_clear need further accumulation or updates.Therefore, it is to be expected that the accuracy of our cloud mask product improves as Himawari-8 data are accumulated.In conclusion, the product is sufficiently applicable to the LST retrieval even at the present status.

Fig. 2 .
Fig. 2. The selected 215 radiosonde profiles in TIGR database.(a) Spatial distribution and (b) air temperature (T 0 )of the first layer and the total PW distribution.

Fig. 4 .
Fig. 4. The relationship between the NTB coefficients and the secant of VZA.The left figure shows the fitting results for a0 -a3, and the right figure shows the fitting results for a4 -a9.

Fig. 7 .
Fig. 7.The flowchart for the operational observation method of LST.

Fig. 8 .
Fig. 8.The scatter diagrams of Himawari-8 LST and ground observed LST for one month in January, April and July 2016.

Fig. A2 .
Fig. A2.The temporal changes of the hit ratio of the cloud mask product for one month in January, April and July 2016.

Table 4 .
The coefficients of the TB.

Table 5 .
The coefficients of the NTB.

Table 7 .
RMSEs for different PW classes using calibration data.

Table 6 .
RMSEs for different LST classes using calibration data.

Table 8 .
Estimation error (K) of five LST algorithms due to input data uncertainties obtained from validation data.The uncertainties of the LSE, PW and NEDT are assumed ± 0.02, ± 0.5 g cm -2 and ± 0.1 K for each band.

Table A1 .
The cloud detection tests at daytime, twilight and nighttime over the land surface.
Saunders and Kriebel (1988) is small over optically thick clouds regarded as a blackbody.The value is large over optically thin clouds, such as cirrus clouds.Saunders and Kriebel (1988)set the threshold values of the T 11.2 -T 12.4 test.Their proposed tabulated thresholds depend on T 11.2 by considering that the T 11.2 -T 12.4 becomes large as the T 11.2 becomes large even under a clear sky.Following their case, we estimate the relation between the T 11.2 -T 12.4 and T 11.2 by using the Rster6b radiative transfer model and set the threshold values as the function of T 11.2 and VZA.
The threshold values over snow/ice surface are higher than over a non-snow surface because the LSE at 12.4 μm is much lower than at 11.2 μm over snow/ice surface.

Table A2 .
The cloud detection tests at twilight and nighttime over the snow/ice surface.