Computer Software
Print ISSN: 0289-6540
Parser Combinators for Parsing Semi-Structured Texts
Futoshi IWAMA, Taiga NAKAMURA, Hironori TAKEUCHI

2012 Volume 29 Issue 4 Pages 4_258-4_277

Abstract
This paper provides a novel framework for constructing a parser that processes and analyzes texts written in a “semi-formalized” language or in “partially-structured” natural language. In many projects, document artifacts mix text constructs that follow specific conventions with parts written as arbitrary free text. Formal parsers, which are typically defined and used to process descriptions with rigidly defined syntax such as program source code, are very precise and efficient at processing the former, while parsers developed for natural language processing (NLP) are good at interpreting the latter robustly. Combining parsers with these different characteristics makes it more flexible and practical to process diverse project documents. Unfortunately, conventional approaches to constructing a parser from multiple small parsers have been studied extensively only for formal language parsers, and they are not immediately applicable to NLP parsers because of differences in how the input text is extracted and evaluated. We propose a method to configure and generate a combined parser by extending parser combinators, the operators for composing multiple formal parsers, to support both NLP and formal parsers. The resulting text parser is based on Parsing Expression Grammars (PEGs) and benefits from the strengths of both parser types. We demonstrate an application of such combined parsers in practical situations and show that the proposed approach can efficiently construct a parser for analyzing project-specific documents.
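To make the idea of combining precise formal parsers with a robust free-text parser concrete, here is a minimal Python sketch of PEG-style parser combinators. It is an illustration under stated assumptions, not the authors' framework: the combinator names (`literal`, `regex`, `seq`, `choice`, `many`, `optional`, `fmap`, `free_text`) and the `REQ-nnn:` line convention are hypothetical, and `toy_classifier` is a keyword heuristic standing in for a real NLP component.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser maps (text, pos) to (value, new_pos) on success, or None on failure.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def literal(s: str) -> Parser:
    """Formal parser: match an exact string at the current position."""
    def parse(text: str, pos: int) -> Result:
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def regex(pattern: str) -> Parser:
    """Formal parser: match a regular expression anchored at the current position."""
    compiled = re.compile(pattern)
    def parse(text: str, pos: int) -> Result:
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """PEG sequence: every sub-parser must succeed, in order."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: the first alternative that succeeds wins."""
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse


def many(p: Parser) -> Parser:
    """PEG zero-or-more: greedy; stops on failure or a zero-length match."""
    def parse(text: str, pos: int) -> Result:
        values = []
        while True:
            r = p(text, pos)
            if r is None or r[1] == pos:
                return values, pos
            value, pos = r
            values.append(value)
    return parse


def optional(p: Parser) -> Parser:
    """PEG optional: succeed with None (consuming nothing) if p fails."""
    def parse(text: str, pos: int) -> Result:
        r = p(text, pos)
        return r if r is not None else (None, pos)
    return parse


def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by p."""
    def parse(text: str, pos: int) -> Result:
        r = p(text, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse


def free_text(classify: Callable[[str], str]) -> Parser:
    """Hypothetical NLP-style parser: take the rest of the line and let a
    robust classifier interpret it instead of matching a fixed grammar."""
    def parse(text: str, pos: int) -> Result:
        end = text.find("\n", pos)
        end = len(text) if end == -1 else end
        chunk = text[pos:end].strip()
        return (("free", classify(chunk)), end) if chunk else None
    return parse


def toy_classifier(sentence: str) -> str:
    # Toy keyword heuristic; a real system would call an NLP component here.
    return "requirement" if " shall " in f" {sentence} " else "note"


# Structured line: a formal ID prefix followed by free text.
structured = fmap(seq(regex(r"REQ-\d+"), literal(": "), free_text(toy_classifier)),
                  lambda v: ("structured", v[0], v[2][1]))
# Fallback: the whole line is interpreted as free text.
unstructured = fmap(free_text(toy_classifier), lambda v: ("unstructured", None, v[1]))
line = choice(structured, unstructured)
document = fmap(many(seq(line, optional(literal("\n")))),
                lambda rows: [row[0] for row in rows])

sample = ("REQ-001: The system shall log all errors.\n"
          "See the appendix for details.\n"
          "REQ-002: Users shall reset passwords by email.\n")
print(document(sample, 0))
```

The ordering inside `choice(structured, unstructured)` is the design point in this sketch: PEG ordered choice tries the precise formal alternative first and falls back to the robust free-text alternative only when the convention is not followed. Swapping the two alternatives would let the free-text parser swallow every line, including the structured ones.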
© 2012 Japan Society for Software Science and Technology