Post details: Finite Automata in C

03/07/08

Permalink 10:18:11 am, Categories: Programming, 331 words  

Finite Automata in C

Recently, I needed a finite automata library written in C (for those of you who don't remember their formal language classes too well, finite automata are the theoretical underpinning of regular expressions) In a nutshell, a finite automaton represents the set of all strings matching a regular expression.

Such a library is a little different from regular expression matching, for which there are plenty of libraries, like GNU regex, because it supports operations that are impossible with regular expressions alone, such as intersection and the more exotic deciding of ambiguity.

Unfortunately, there isn't a well-maintained open-source C library to do that. Lucky for me, there is a very well-written Java library, dk.brics.automaton by Anders Møller. Based on that, I just finished implementing libfa. It is built as a separate DSO, but it's not distributed separately from Augeas yet; if you need an FA library and libfa seems like it would be useful, drop me a line and I'll split it out. It's mostly a matter of wrestling with autotools.

If you're curious about what you can do with finite automata that you can't with regular expressions alone, deciding ambiguous concatenation is a good example: the concatenation of two regular expressions r1 and r2 is ambiguous if there is a string upv that matches r1r2 such that both u and up match r1 (and therefore pv and v match r2)

Ambiguity is important if you attach actions to r1 and r2, for example, delete strings matching r1 from the output and convert everything matching r2 to uppercase: if you're confronted with an ambiguous string upv, you don't know what to do with p: should you delete it (splitting upv into up and v) or should you upcase it (splitting upv into u and pv). What's worse, when you match r1r2 against upv, you'll never know that there is ambiguity, and how it gets split largely depends on minute details of the implementation of the regular expression matcher.

Comments:

Comment from: Vadim Nasardinov [Visitor]
Hard to tell from the above if Ragel will be of any use to you, but (a) it ships in Fedora and (b) it's really awesome: http://www.cs.queensu.ca/~thurston/ragel/
Permalink 03/08/08 @ 06:11
Comment from: lutter [Member]
Hi Vadim,

Ragel doesn't work for me since it compiles regular expressions to a host language, its functionality is pretty similar to Lex . In my case, the regular expresions are part of a user's input.
Permalink 03/10/08 @ 10:14
Comment from: Rich [Visitor]
I must say I'm a bit rusty on the different classes of parsers out there, but I did recently come across this one which might do what you want:

http://aurochs.fr/

It generates C, apparently.
Permalink 03/12/08 @ 04:33
Comment from: lutter [Member]
Hi Rich,

for what I am doing, a parser generator doesn't work. I am writing an interpreter for a domain-specific language, and the regular language computations are part of the typechecker
Permalink 03/12/08 @ 09:12

Comments are closed for this post.

Search

Syndicate this blog XML

What is RSS?

Misc

powered by
b2evolution