Crop phenology data extraction from research papers using a large language model

Toshichika IIZUMI; Yohei ONO; Takahiro TAKIMOTO; Chaogejilatu

doi:10.2480/agrmet.D-24-00042

Abstract

Field experiment data and site-level crop observations reported in the agronomic literature are of high quality and have been considered a potential source of information for developing a global gridded crop dataset. However, extracting data on a crop variable of interest from the text and tables of many papers is a time-consuming, painstaking task for dataset developers. Recent advances in large language models (LLMs) and the tools built on them are expected to offer a promising solution. This study presents a computational method for extracting data from research papers using an LLM-based online tool, ChatPDF. For demonstration purposes, the Python program we developed is applied to 164 papers to extract crop phenology data for maize, soybean, wheat and rice. The results show that the LLM-based data extraction method can dramatically reduce the burden of data extraction in human curation, but that it needs improvement before it can reliably replace manual data extraction. In particular, innovations are needed to increase the capture rate by avoiding data omissions and to reduce errors by correctly inferring longitudes, latitudes and harvesting years. LLM-based data extraction is currently in its infancy and deserves further research toward large-scale implementation.
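
The error types named in the abstract (longitudes, latitudes and harvesting years) lend themselves to simple plausibility checks before human review. The following Python sketch is illustrative only and is not the authors' program: the `PhenologyRecord` schema, the `record_problems` and `capture_rate` helpers, and the definition of capture rate as extracted records divided by manually curated records are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PhenologyRecord:
    """One extracted site-level record (hypothetical schema, not from the paper)."""
    crop: str
    latitude: Optional[float]
    longitude: Optional[float]
    harvest_year: Optional[int]


def record_problems(rec: PhenologyRecord, publication_year: int) -> List[str]:
    """Return the names of fields that fail a plausibility check.

    An empty list means nothing was flagged; it does not prove the values are correct.
    """
    problems = []
    if rec.latitude is None or not -90.0 <= rec.latitude <= 90.0:
        problems.append("latitude")
    if rec.longitude is None or not -180.0 <= rec.longitude <= 180.0:
        problems.append("longitude")
    # Assumption: a harvest cannot postdate the paper that reports it.
    if rec.harvest_year is None or rec.harvest_year > publication_year:
        problems.append("harvest_year")
    return problems


def capture_rate(extracted: int, manual: int) -> float:
    """Assumed definition: extracted record count divided by manual record count."""
    if manual <= 0:
        raise ValueError("manual record count must be positive")
    return extracted / manual
```

Checks of this kind can only flag implausible values, such as a latitude outside the valid range; a plausible but wrong coordinate still requires comparison against the source paper.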

This article is licensed under a Creative Commons Attribution 4.0 International license.
https://creativecommons.org/licenses/by/4.0/
