2012 Volume 29 Issue 1 Pages 1_147-1_158
Most conventional implementations of regular expressions are based on backtracking. Such implementations are slow in the worst case, so a better matching algorithm is desirable. However, it is nontrivial to devise an efficient matching algorithm that handles practical extensions, including submatch addressing. This paper studies regular expressions with lookaheads and negative lookaheads, abbreviated REwLA. First, we propose a transformation from a REwLA of size m to a deterministic finite automaton with O(2^{2m}) states. Next, we consider weighted regular expressions, which enable us to compute submatch addressing. We propose a transformation from a weighted REwLA of size m to a weighted nondeterministic finite automaton with O(2^{2m}) states.