Page Segmentation Based on Effective Area of Background

Koichi Kise; Yuji Akiyoshi; Shinobu Takamatsu

doi:10.1541/ieejeiss1987.116.9_1035

Abstract

Page segmentation is the task of extracting the components of a document, such as columns, text lines, figures, and tables, from a page image. This paper presents a page segmentation method that analyzes the background (white areas) of a page image. For page images without skew, it is known that white areas can be represented as white rectangles, each of which is a maximal rectangle containing only white pixels. In general, however, the extracted white rectangles include gaps between characters and words, as well as gaps inside figures, and these cause erroneous segmentation. The white rectangles therefore need to be selected for correct segmentation. The characteristic feature of our method is that the selection is based on a simple measure called the effective area, which estimates the effectiveness of a white rectangle as a delimiter of components by taking into account its proximity to the surrounding black areas. Our method correctly segmented 92.7% of the components in 154 images of Japanese and English pages with various layouts and resolutions.
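
As a rough illustration of the white-rectangle representation mentioned in the abstract, the sketch below enumerates every maximal white rectangle in a small binary image by brute force. The function name `maximal_white_rectangles`, the 0 = white / 1 = black pixel encoding, and the exhaustive search are assumptions made for illustration only and are not the authors' implementation. The effective-area selection step is not sketched, because the abstract does not define the measure.

```python
def maximal_white_rectangles(img):
    """Return all maximal all-white rectangles in a binary image.

    img: list of equal-length rows, 0 = white pixel, 1 = black pixel
         (an assumed encoding). Each result is (top, left, bottom, right),
         with inclusive pixel coordinates.
    """
    h, w = len(img), len(img[0])

    # Summed-area table of black pixels, so that any rectangle can be
    # tested for "contains no black pixel" in constant time.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = (img[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])

    def is_white(t, l, b, r):
        # False if the rectangle leaves the image or contains a black pixel.
        if t < 0 or l < 0 or b >= h or r >= w:
            return False
        black = sat[b + 1][r + 1] - sat[t][r + 1] - sat[b + 1][l] + sat[t][l]
        return black == 0

    rects = []
    for t in range(h):
        for b in range(t, h):
            for l in range(w):
                for r in range(l, w):
                    if not is_white(t, l, b, r):
                        continue
                    # Maximal: growing by one row or column in any of the
                    # four directions hits a black pixel or the page edge.
                    if (is_white(t - 1, l, b, r) or is_white(t, l, b + 1, r)
                            or is_white(t, l - 1, b, r)
                            or is_white(t, l, b, r + 1)):
                        continue
                    rects.append((t, l, b, r))
    return rects


# A 3x3 page with a single black pixel in the center.
page = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
rects = maximal_white_rectangles(page)
```

For this toy page, the result consists of the four white strips around the central black pixel: the top row, the bottom row, the left column, and the right column. The exhaustive search is only practical for tiny inputs; it is meant to make the definition concrete, not to be efficient.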
