Recently, I needed a finite automata library written in C (for those of you who don’t remember their formal language classes too well, finite automata are the theoretical underpinning of regular expressions) In a nutshell, a finite automaton represents the set of all strings matching a regular expression.
Such a library is a little different from regular expression matching, for which there are plenty of libraries, like GNU regex, because it supports operations that are impossible with regular expressions alone, such as intersection and the more exotic deciding of ambiguity.
Unfortunately, there isn’t a well-maintained open-source C library to do that. Lucky for me, there is a very well-written Java library, dk.brics.automaton by Anders Møller. Based on that, I just finished implementing libfa. It is built as a separate DSO, but it’s not distributed separately from Augeas yet; if you need an FA library and libfa
seems like it would be useful, drop me a line and I’ll split it out. It’s mostly a matter of wrestling with autotools.
If you’re curious about what you can do with finite automata that you can’t with regular expressions alone, deciding ambiguous concatenation is a good example: the concatenation of two regular expressions r1
and r2
is ambiguous if there is a string upv
that matches r1r2
such that both u
and up
match r1
(and therefore pv
and v
match r2
)
Ambiguity is important if you attach actions to r1
and r2
, for example, delete strings matching r1
from the output and convert everything matching r2
to uppercase: if you’re confronted with an ambiguous string upv
, you don’t know what to do with p
: should you delete it (splitting upv
into up
and v
) or should you upcase it (splitting upv
into u
and pv
). What’s worse, when you match r1r2
against upv
, you’ll never know that there is ambiguity, and how it gets split largely depends on minute details of the implementation of the regular expression matcher.
Watzmann.Blog by David Lutterkort is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Generated with Jekyll