Recently, I needed a finite automata library written in C (for those of you who don’t remember their formal language classes too well, finite automata are the theoretical underpinning of regular expressions) In a nutshell, a finite automaton represents the set of all strings matching a regular expression.
Such a library is a little different from regular expression matching, for which there are plenty of libraries, like GNU regex, because it supports operations that are impossible with regular expressions alone, such as intersection and the more exotic deciding of ambiguity.
Unfortunately, there isn’t a well-maintained open-source C library to do that. Lucky for me, there is a very well-written Java library, dk.brics.automaton by Anders Møller. Based on that, I just finished implementing libfa. It is built as a separate DSO, but it’s not distributed separately from Augeas yet; if you need an FA library and
libfa seems like it would be useful, drop me a line and I’ll split it out. It’s mostly a matter of wrestling with autotools.
If you’re curious about what you can do with finite automata that you can’t with regular expressions alone, deciding ambiguous concatenation is a good example: the concatenation of two regular expressions
r2 is ambiguous if there is a string
upv that matches
r1r2 such that both
r1 (and therefore
Ambiguity is important if you attach actions to
r2, for example, delete strings matching
r1 from the output and convert everything matching
r2 to uppercase: if you’re confronted with an ambiguous string
upv, you don’t know what to do with
p: should you delete it (splitting
v) or should you upcase it (splitting
pv). What’s worse, when you match
upv, you’ll never know that there is ambiguity, and how it gets split largely depends on minute details of the implementation of the regular expression matcher.