2011, Vol. 131, No. 8, pp. 1502-1511
To make metadata-based document management systems easier to introduce, we propose an algorithm that takes sample documents and their manually specified metadata as training data and generates metadata-extraction rules. The algorithm first enumerates candidate keywords and layout characteristics specific to each kind of metadata, based on where the metadata occur in the training data. It then examines whether each candidate is specific to only one kind of metadata. In an experiment on Japanese business documents and weekly reports, the automatically generated rules extracted metadata as accurately as manually adjusted rules.
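The two steps described in the abstract (enumerating candidates, then keeping only those specific to one kind of metadata) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes documents are already tokenized, treats only the token immediately preceding a labeled value as a keyword candidate, and ignores layout characteristics entirely. The function names `enumerate_candidates`, `generate_rules`, and `extract` are hypothetical.

```python
from collections import defaultdict


def enumerate_candidates(training_docs):
    """Map each candidate keyword to the kinds of metadata it precedes.

    training_docs: list of (tokens, labels) pairs, where labels maps a
    metadata kind (e.g. "title") to its manually specified value.
    Assumption: the keyword is the token directly before the value.
    """
    candidates = defaultdict(set)
    for tokens, labels in training_docs:
        for kind, value in labels.items():
            for i, tok in enumerate(tokens):
                if tok == value and i > 0:
                    candidates[tokens[i - 1]].add(kind)
    return candidates


def generate_rules(training_docs):
    """Keep only candidates that are specific to exactly one metadata kind."""
    candidates = enumerate_candidates(training_docs)
    return {
        keyword: next(iter(kinds))
        for keyword, kinds in candidates.items()
        if len(kinds) == 1
    }


def extract(tokens, rules):
    """Apply the rules: the token after a known keyword is the metadata value."""
    found = {}
    for i, tok in enumerate(tokens[:-1]):
        kind = rules.get(tok)
        if kind is not None and kind not in found:
            found[kind] = tokens[i + 1]
    return found


if __name__ == "__main__":
    # Made-up training data, for illustration only.
    training = [
        (["Title:", "Weekly-Report", "Author:", "Sato"],
         {"title": "Weekly-Report", "author": "Sato"}),
        # "Name:" precedes both a title and an author, so it is ambiguous.
        (["Name:", "Trip-Report", "Name:", "Tanaka"],
         {"title": "Trip-Report", "author": "Tanaka"}),
    ]
    rules = generate_rules(training)
    print(rules)
    print(extract(["Title:", "Sales-Memo", "Author:", "Kato"], rules))
```

In this toy setting, a keyword such as `Name:` that precedes more than one kind of metadata is discarded by the specificity check, which mirrors the second step of the abstract in a very simplified form.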