A Technique for Automatically Generating Rules for Metadata Extraction from Business Documents

松本 俊子; 大峡 光晴; 小野山 隆; 秋吉 政徳

doi:10.1541/ieejeiss.131.1502

Abstract

To ease the introduction of metadata-based document management systems, we propose an algorithm that takes sample documents and their manually specified metadata as training data and generates metadata-extraction rules. The algorithm first enumerates candidate keywords and layout characteristics for each kind of metadata, based on where that metadata occurs in the training data. It then examines whether each candidate is specific to only one kind of metadata. In an experiment on Japanese business documents and weekly reports, the automatically generated rules extracted metadata as accurately as manually adjusted rules.
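The two steps named in the abstract (enumerate candidates from where the metadata occurs, then keep only candidates specific to one kind of metadata) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes "Keyword: value" lines, it covers keyword candidates only and ignores layout characteristics, and every name in it (`TRAINING_DOCS`, `enumerate_candidates`, `generate_rules`, `extract`) is hypothetical.

```python
from collections import defaultdict

# Hypothetical training data: (document lines, manually specified metadata).
TRAINING_DOCS = [
    (["Title: Weekly Report", "Author: Sato", "Date: 2010-04-01"],
     {"title": "Weekly Report", "author": "Sato", "date": "2010-04-01"}),
    (["Name: Budget Review", "Name: Suzuki", "Date: 2010-04-08"],
     {"title": "Budget Review", "author": "Suzuki", "date": "2010-04-08"}),
]


def enumerate_candidates(training_docs):
    """Map each text found just before a metadata value to the kinds it precedes."""
    candidates = defaultdict(set)
    for lines, metadata in training_docs:
        for kind, value in metadata.items():
            for line in lines:
                idx = line.find(value)
                if idx > 0:  # the value occurs with some text in front of it
                    keyword = line[:idx].strip().rstrip(":").strip()
                    if keyword:
                        candidates[keyword].add(kind)
    return candidates


def generate_rules(training_docs):
    """Keep only the candidate keywords tied to exactly one kind of metadata."""
    candidates = enumerate_candidates(training_docs)
    return {kw: next(iter(kinds))
            for kw, kinds in candidates.items() if len(kinds) == 1}


def extract(rules, lines):
    """Apply keyword rules: take the text after a matching keyword as the value."""
    result = {}
    for line in lines:
        for keyword, kind in rules.items():
            if line.startswith(keyword):
                value = line[len(keyword):].lstrip(": ").strip()
                if value:
                    result.setdefault(kind, value)  # first match wins
    return result


RULES = generate_rules(TRAINING_DOCS)
NEW_DOC = ["Title: Monthly Summary", "Author: Tanaka", "Date: 2010-05-01"]
EXTRACTED = extract(RULES, NEW_DOC)
```

In this toy data the keyword "Name" precedes both a title and an author, so the specificity check discards it, while "Title", "Author", and "Date" each survive as a rule for a single kind of metadata.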
