Journal of Agricultural Meteorology
Online ISSN : 1881-0136
Print ISSN : 0021-8588
ISSN-L : 0021-8588

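A final published version of this article is available. Please refer to the final published version.
When citing this article, please cite the final published version.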

Crop phenology data extraction from research papers using a large language model
Toshichika IIZUMI, Yohei ONO, Takahiro TAKIMOTO, Chaogejilatu
Journal, open access, HTML, advance online publication

Article ID: D-24-00042

Abstract

Field experiment data and crop observations at sites reported in the agronomic literature are of high quality and have been considered a potential source of information for developing a global gridded crop dataset. However, extracting data on a crop variable of interest from the text and tables of many papers is a time-consuming, painstaking task for dataset developers. Recent advances in large language models (LLMs) and the tools built on them are expected to offer a promising solution. This study presents a computational method for extracting data from research papers using an LLM-based online tool, ChatPDF. As a demonstration, the Python program we developed is applied to 164 papers to extract crop phenology data for maize, soybean, wheat and rice. The results show that the LLM-based data extraction method can dramatically reduce the burden of data extraction in human curation, but that it needs improvement before it can reliably replace manual data extraction. In particular, innovations are needed to increase the capture rate by avoiding data omissions and to reduce errors by correctly inferring longitudes, latitudes and harvesting years. LLM-based data extraction is currently in its infancy and deserves further research toward large-scale implementation.
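The two quality concerns named in the abstract, the capture rate and errors in longitudes, latitudes and harvesting years, can be illustrated with a minimal Python sketch. This is not the authors' program. The record fields, the year bounds, and the definition of capture rate used here (the share of manually curated paper and crop entries that the automated extraction also found) are all assumptions made for illustration, and the sample records are invented placeholders, not data from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhenologyRecord:
    """One extracted observation. Field names are hypothetical."""
    paper_id: str
    crop: str
    latitude: Optional[float]
    longitude: Optional[float]
    harvest_year: Optional[int]


def validation_errors(rec: PhenologyRecord,
                      min_year: int = 1900,
                      max_year: int = 2030) -> List[str]:
    """Return the names of fields that are missing or implausible.

    The year bounds are arbitrary assumptions, not values from the paper.
    """
    errors = []
    if rec.latitude is None or not -90.0 <= rec.latitude <= 90.0:
        errors.append("latitude")
    if rec.longitude is None or not -180.0 <= rec.longitude <= 180.0:
        errors.append("longitude")
    if rec.harvest_year is None or not min_year <= rec.harvest_year <= max_year:
        errors.append("harvest_year")
    return errors


def capture_rate(extracted: List[PhenologyRecord],
                 reference: List[PhenologyRecord]) -> float:
    """Share of reference (paper, crop) entries also present in the extraction.

    This definition is an assumption; the paper may define it differently.
    """
    reference_keys = {(r.paper_id, r.crop) for r in reference}
    if not reference_keys:
        return 0.0
    extracted_keys = {(r.paper_id, r.crop) for r in extracted}
    return len(reference_keys & extracted_keys) / len(reference_keys)


# Invented placeholder records: a manually curated reference set and
# a hypothetical automated extraction with one omission-prone result.
reference = [
    PhenologyRecord("p1", "maize", 36.0, 140.1, 2010),
    PhenologyRecord("p1", "rice", 36.0, 140.1, 2010),
    PhenologyRecord("p2", "wheat", 51.5, -0.1, 2005),
    PhenologyRecord("p3", "soybean", -23.5, -46.6, 2015),
]
extracted = [
    PhenologyRecord("p1", "maize", 36.0, 140.1, 2010),
    PhenologyRecord("p2", "wheat", 51.5, 540.0, None),  # bad longitude, no year
]

if __name__ == "__main__":
    print("capture rate:", capture_rate(extracted, reference))
    for rec in extracted:
        print(rec.paper_id, rec.crop, validation_errors(rec))
```

Range checks of this kind only flag impossible or missing values. A coordinate that is plausible but wrong would still need comparison against manually curated data.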

© Author(s).

This article is licensed under a Creative Commons [Attribution 4.0 International] license.
https://creativecommons.org/licenses/by/4.0/