IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
Document Image Segmentation Based on the Effective Area of White Regions (白領域の有効面積に基づく文書画像領域分割)
黄瀬 浩一, 秋吉 裕二, 高松 忍
Journal, free access

1996, Volume 116, Issue 9, pp. 1035-1042

Abstract
Page segmentation is the task of extracting the components of a document, such as columns, text lines, figures, and tables, from a page image. This paper presents a page segmentation method that analyzes the background (white areas) of a page image. For page images without skew, it is known that white areas can be represented as white rectangles, each a maximal rectangle containing only white pixels. In general, however, the extracted white rectangles include gaps between characters and words, as well as gaps inside figures, all of which cause erroneous segmentation. White rectangles therefore need to be selected for correct segmentation. The distinguishing feature of our method is that this selection is based on a simple measure called effective area, which estimates how effective a white rectangle is as a delimiter of components by taking into account its proximity to the surrounding black areas. Our method correctly segmented 92.7% of the components in 154 images of Japanese and English pages with various layouts and resolutions.
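As an illustration of why white rectangles must be selected, the following minimal Python sketch enumerates maximal white rectangles on a tiny binary page by brute force. It is not the authors' implementation: the names `maximal_white_rectangles`, `area`, and `PAGE` are hypothetical, the brute-force search is practical only for toy inputs, and plain rectangle area is used as a stand-in for ranking because the abstract does not define the effective area measure.

```python
from typing import List, Tuple

# A rectangle is (top, left, bottom, right), all bounds inclusive.
Rect = Tuple[int, int, int, int]

# Toy page (hypothetical data): 1 = black pixel, 0 = white pixel.
# Columns 3-4 form a gutter between two text columns; column 1 holds
# small gaps of the kind found between characters.
PAGE = [
    [1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 1],
]


def area(rect: Rect) -> int:
    """Plain pixel area; a stand-in, NOT the paper's effective area."""
    top, left, bottom, right = rect
    return (bottom - top + 1) * (right - left + 1)


def maximal_white_rectangles(img: List[List[int]]) -> List[Rect]:
    """Return every all-white rectangle that cannot grow in any direction."""
    h, w = len(img), len(img[0])

    def all_white(t: int, l: int, b: int, r: int) -> bool:
        return all(img[y][x] == 0
                   for y in range(t, b + 1)
                   for x in range(l, r + 1))

    # Brute force: collect every in-bounds rectangle with no black pixel.
    white = {(t, l, b, r)
             for t in range(h) for b in range(t, h)
             for l in range(w) for r in range(l, w)
             if all_white(t, l, b, r)}

    def is_maximal(rect: Rect) -> bool:
        t, l, b, r = rect
        # Growing by one row or column must leave the white set
        # (out-of-bounds rectangles are never in the set).
        grown = [(t - 1, l, b, r), (t, l - 1, b, r),
                 (t, l, b + 1, r), (t, l, b, r + 1)]
        return not any(g in white for g in grown)

    return sorted(rect for rect in white if is_maximal(rect))


rects = maximal_white_rectangles(PAGE)
best = max(rects, key=area)  # the gutter wins under the plain-area stand-in
```

On this toy page the result contains both the gutter and the small inter-character gaps, which is the situation the abstract describes: without a selection step, the small gaps would also act as delimiters.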
© The Institute of Electrical Engineers of Japan