Journal of The Remote Sensing Society of Japan
Online ISSN : 1883-1184
Print ISSN : 0289-7911
ISSN-L : 0289-7911

This article has now been updated. Please use the final version.

Evaluation of the Amount and Quality of Training Data for Paddy Rice Crop Classification with Random Forest using Time-series ALOS-2 PALSAR-2 Data
Keishiro Nakamoto, Kei Oyoshi
JOURNAL FREE ACCESS Advance online publication

Article ID: 2023.004

Abstract

Machine learning is now widely used for highly accurate classification of cultivated land and land cover from satellite data. Accurate classification requires training data of sufficient amount and quality, but collecting such data is very costly. To clarify how the amount and quality of training data relate to classification accuracy, this case study examined paddy rice discrimination in California, US, using time-series ALOS-2 PALSAR-2 data and a random forest classifier.

The US Department of Agriculture (USDA) Cropland Data Layer (CDL) provided a large amount of training data on the land cover distribution in 2021. The effect of the amount of training data was evaluated by increasing the number of training samples from 100 to 100,000. The effect of quality was evaluated by randomly replacing a certain percentage of the paddy/non-paddy labels in the training data with incorrect labels. The case study first evaluated the relationship between the amount of training data and classification accuracy (ACC), and found that at least 1,000 training samples are necessary to achieve an ACC of 0.95 stably under the conditions of this study. It then evaluated the relationship between the quality of training data and ACC, and found that ACC can be maintained above 0.95 for an error ratio of up to 20% if there are more than 1,000 samples.
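
The label-degradation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `flip_labels` and `accuracy`, the 0/1 label encoding, the fixed seed, and the choice to flip an exact count of labels (rather than flipping each label with a given probability) are all assumptions made for illustration. ACC is taken here to be the usual fraction of correctly labeled samples.

```python
import random


def flip_labels(labels, error_ratio, seed=0):
    """Return a copy of binary labels (1 = paddy, 0 = non-paddy) in which
    round(error_ratio * len(labels)) randomly chosen entries are inverted.

    Assumption: an exact number of labels is flipped; the abstract does not
    say how the error percentage was applied.
    """
    if not 0.0 <= error_ratio <= 1.0:
        raise ValueError("error_ratio must be between 0 and 1")
    rng = random.Random(seed)
    n_flip = round(error_ratio * len(labels))
    # Sample distinct positions so that no label is flipped twice.
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        noisy[i] = 1 - noisy[i]
    return noisy


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example: 1,000 balanced training labels with a 20% error ratio.
clean = [1, 0] * 500
noisy = flip_labels(clean, error_ratio=0.20)
n_changed = sum(c != n for c, n in zip(clean, noisy))
```

In an experiment of this kind, the degraded labels would be used only for training, while ACC would be computed against unmodified reference labels on separate test samples.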

© 2023 The Remote Sensing Society of Japan