Finite Automata in C

07 March 2008

Recently, I needed a finite automata library written in C (for those of you who don’t remember their formal language classes too well, finite automata are the theoretical underpinning of regular expressions) In a nutshell, a finite automaton represents the set of all strings matching a regular expression.

Such a library is a little different from regular expression matching, for which there are plenty of libraries, like GNU regex, because it supports operations that are impossible with regular expressions alone, such as intersection and the more exotic deciding of ambiguity.

Unfortunately, there isn’t a well-maintained open-source C library to do that. Lucky for me, there is a very well-written Java library, dk.brics.automaton by Anders Møller. Based on that, I just finished implementing libfa. It is built as a separate DSO, but it’s not distributed separately from Augeas yet; if you need an FA library and libfa seems like it would be useful, drop me a line and I’ll split it out. It’s mostly a matter of wrestling with autotools.

If you’re curious about what you can do with finite automata that you can’t with regular expressions alone, deciding ambiguous concatenation is a good example: the concatenation of two regular expressions r1 and r2 is ambiguous if there is a string upv that matches r1r2 such that both u and up match r1 (and therefore pv and v match r2)

Ambiguity is important if you attach actions to r1 and r2, for example, delete strings matching r1 from the output and convert everything matching r2 to uppercase: if you’re confronted with an ambiguous string upv, you don’t know what to do with p: should you delete it (splitting upv into up and v) or should you upcase it (splitting upv into u and pv). What’s worse, when you match r1r2 against upv, you’ll never know that there is ambiguity, and how it gets split largely depends on minute details of the implementation of the regular expression matcher.

Creative Commons License Watzmann.Blog by David Lutterkort is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.

Generated with Jekyll