Proceedings of the Fuzzy System Symposium
28th Fuzzy System Symposium

A Study on Retrieval Support for OCR Documents based on Visualization
Kazuki Tamura, Tomohiro Yoshikawa, Takeshi Furuhashi, Makoto Suzuki
CONFERENCE PROCEEDINGS OPEN ACCESS

Pages 574-579

Abstract

The digitization of paper-based documents has advanced rapidly in recent years with the spread of scanners. Such documents are usually managed on a computer by tagging them or sorting them into folders. However, tagging or sorting a huge number of scanned documents one by one is costly in both time and effort. A document retrieval system that uses the text of the documents, obtained through OCR (Optical Character Recognition), would therefore be useful. The aim of this study is to extract the relationships among documents using pLSI, one of the most popular topic models. Topic models have conventionally been applied to documents that contain no OCR errors. In this paper, a preliminary experiment first shows that OCR errors degrade the performance of pLSI. Then the proposed method, which aggregates similar expressions of words using the Levenshtein distance, is applied, and an improvement in topic inference performance is shown.
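
The abstract does not spell out how similar word forms are aggregated, so the following is only a minimal sketch of one plausible approach, not the authors' procedure. It assumes that each OCR token is mapped to the most frequent token within a small Levenshtein distance before the text is passed to a topic model. The function names, the threshold `max_dist`, and the frequency-based choice of representative are all illustrative assumptions.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def aggregate_variants(tokens, max_dist=1):
    """Map each distinct token to a representative form (hypothetical helper).

    Assumption: distinct tokens are visited from most to least frequent,
    and each one is merged into the first existing representative within
    max_dist edits; otherwise it becomes a new representative.
    """
    counts = Counter(tokens)
    representatives = []
    mapping = {}
    # Sort by descending count, then alphabetically so the result is deterministic.
    for word, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if levenshtein(word, rep) <= max_dist:
                mapping[word] = rep
                break
        else:
            representatives.append(word)
            mapping[word] = word
    return mapping


# Made-up tokens imitating OCR confusions such as "l" -> "1" and "i" -> "l".
tokens = ["retrieval"] * 3 + ["retrieva1", "retneval"] + ["topic"] * 2 + ["toplc"]
mapping = aggregate_variants(tokens, max_dist=1)
normalized = [mapping[t] for t in tokens]
```

A larger `max_dist` merges more aggressively: in this toy input, "retneval" is two edits away from "retrieval" and is only merged when `max_dist` is at least 2. With short words, a loose threshold can also merge genuinely different terms, so the threshold would need to be tuned against the actual OCR output.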

© 2012 Japan Society for Fuzzy Theory and Intelligent Informatics