Definition of Recipe Terms and Construction of a Tagged Corpus for Their Automatic Recognition

笹田 鉄郎; 森 信介; 山肩 洋子; 前田 浩邦; 河原 達也

doi:10.5715/jnlp.22.107

Abstract

In natural language processing (NLP), it is useful in practice to recognize important terms after word-level analysis (word segmentation, part-of-speech tagging, etc.). Terms are generally word sequences, and many applications classify them into several types. A well-known example is named entity recognition, which aims to extract information from newspaper articles and uses seven or eight types (named entity classes), such as person name, organization name, and amount of money. The definition of an important term depends heavily on the NLP task. We chose term extraction from recipes (cooking procedure texts) as our task. We discuss the process of defining the terms and their types, annotating a corpus, and constructing an automatic recognizer of recipe terms that is accurate enough for practical use. Such a recognizer could be applied to search functions more intelligent than simple keyword matching, and to symbol grounding research, in which video content is matched with language expressions. Against this background, we describe the definition of a tag set for recipe terms and the actual annotation work. We also present experimental results on the automatic recognition of recipe terms and give an indication of how much annotation is required to reach a given level of accuracy.

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
