2017, Volume 32, Issue 1, pp. E-G24_1-8
We propose a method for extracting semantic structure from procedural texts to support more intelligent search and analysis. Procedural texts describe a sequence of procedures for creating an object or bringing an object into a certain state, and they have many potential applications in artificial intelligence. Because procedural texts are relatively clear, without modality or dependence on viewpoint, their procedures can be described using flow graphs. We adopt recipe texts as examples of procedural text and directed acyclic graphs (DAGs) as the representation of semantic structure. The nodes of a flow graph are the important terms in a recipe text, and the edges are relationships between those terms, covering language phenomena such as dependency, predicate-argument structure, and coreference. We adopt DAGs because trees cannot sufficiently represent the procedures of recipes. We first apply word segmentation and automatic term recognition, and then convert the entire text into a flow graph. For word segmentation and automatic term recognition, we adopt existing methods. We then propose a method for estimating a flow graph from the term recognition results. Our method is based on the maximum spanning tree algorithm, which is popular in dependency parsing, and deals simultaneously with the language phenomena listed above. We experimentally evaluate our method on a flow graph corpus created from various recipe texts on the Internet.
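To illustrate why a tree may not suffice, the following is a minimal sketch of a recipe flow graph stored as a labeled edge list. It is not the estimation method described above. The terms, the edge labels, and the helper names (`predecessors`, `is_dag`, `nodes_with_multiple_targets`) are hypothetical, and the sketch assumes that edges point from a term toward the step that consumes it.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical toy flow graph: (source term, target term, label).
# Terms and labels are illustrative only, not taken from the corpus.
EDGES = [
    ("butter", "melt", "object"),
    ("melt", "grease", "object"),   # some melted butter greases the pan
    ("pan", "grease", "target"),
    ("melt", "mix", "object"),      # the rest goes into the batter
    ("flour", "mix", "object"),
    ("mix", "pour", "object"),
    ("grease", "pour", "target"),   # batter is poured into the greased pan
]


def predecessors(edges):
    """Map every node to the set of nodes that have an edge into it."""
    preds = {}
    for src, dst, _label in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return preds


def is_dag(edges):
    """Return True if the edge list contains no directed cycle."""
    try:
        # static_order() raises CycleError if a cycle is present.
        list(TopologicalSorter(predecessors(edges)).static_order())
        return True
    except CycleError:
        return False


def nodes_with_multiple_targets(edges):
    """Return nodes with more than one outgoing edge, sorted by name.

    In a tree rooted at the final step, every node feeds exactly one
    other node, so any node listed here needs a DAG.
    """
    targets = {}
    for src, dst, _label in edges:
        targets.setdefault(src, set()).add(dst)
    return sorted(node for node, dsts in targets.items() if len(dsts) > 1)
```

In this toy graph the result of "melt" feeds both "grease" and "mix", so the structure is acyclic but not a tree, which is the kind of case that motivates a DAG representation.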