A while ago I had what would have been a hallway conversation with Mark, if we worked in the same office (or country, for that matter). Something he said set me thinking that it should be possible to get a better handle on the mess of file formats in `/etc`, and in a way that hides much of the pain those different formats inflict when config files need to be changed programmatically. It’s actually nice that config data is stored in text files for interactive use (yes, that means vi), but it’s a smoldering trainwreck when changes need to be scripted.
Editorial Note: we apologize for the length of this entry. If you don’t want to read all this slipslop, feel free to go straight to the Augeas website. Just tell lutter how much fun you had reading his blog.
It’s a commonplace that the colorful variety of files and file formats used to configure the average Linux (or Unixy) system keeps us from having any sort of API to modify config data, and that any attempt to change that is doomed. Almost exactly a year ago, I argued precisely that point (convincingly, I thought): that the best we can hope for is to have a few better tools for each service to modify its configuration. Maybe we can even build something on top of those tools, but that’s about as far as any such attempt could ever go in practice.
After that non-hallway conversation, it dawned on me that the various attempts to deal with this situation boiled down to three different approaches:
Bear it and smile; if you are unfortunate enough to have to make config changes, fiddle with `sed` or `awk` or the equivalent regexp functionality in your favorite scripting language long enough to make those changes, and keep your fingers crossed that nobody with an "unusual" file will ever run your script; "unusual" might be as simple as whitespace at the end of a line.
Propose a real API with a real data store. The individual approaches vary, but it usually boils down to exposing a tree through the API, and storing config data in LDAP/a relational DB/XML/anywhere else anybody has ever stored data. Once that's implemented, all we need is for every program that reads config data from one of those files in `/etc` to use that really good API.
Use templating; expose some form of API that in the backend just fills values into some sort of template and writes that into the right place in `/etc`.
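To make the fragility of the first approach concrete, here is a minimal sketch in Python. The `key=value` format, the `timeout` option and the `set_option_naive` helper are all made up for illustration; the point is only that a regexp written against the "usual" shape of a line silently does nothing when the line has trailing whitespace.

```python
import re


def set_option_naive(text: str, key: str, value: str) -> str:
    """Replace 'key=<old>' with 'key=<value>', assuming an exact line shape.

    Hypothetical helper: it expects nothing but the key, '=', and a value
    without spaces on the line.
    """
    pattern = re.compile(rf"^{re.escape(key)}=\S+$", re.MULTILINE)
    return pattern.sub(f"{key}={value}", text)


usual = "timeout=30\n"
unusual = "timeout=30   \n"  # same setting, but with trailing whitespace

# The "usual" file is updated as intended.
print(repr(set_option_naive(usual, "timeout", "60")))

# The "unusual" file is returned unchanged, and nothing reports the miss.
print(repr(set_option_naive(unusual, "timeout", "60")))
```

Loosening the pattern fixes this one case, but every script that edits the file has to rediscover the same corner cases on its own.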
All these have been tried, and they all have serious limitations:
The `sed` approach is the most widely used, and its problems are pretty well understood: it works reasonably well for simple file formats, but changing `dhcpd.conf` that way is not for the faint of heart. The bigger problem is that it's just no way to build an API, and the same "solution" for editing config file X gets reinvented for a variety of reasons: that excellent script to change X is written in Python, and Perl is needed; or the code is impossible to find, buried deep in the guts of something else; or, most likely, config editing is a pain, nothing can be done about it, so suck it up and get away from it, quick.
The unified API approach generally starts with a lot of good thought and a good list of goals, usually way beyond just editing files. In practice these efforts go nowhere: if they don't collapse under the weight of all the really good things that you can do with an API on top of editing, reality comes to kill them. Upstream generally doesn't jump up and down at the opportunity to change their code, for a very valid reason: the API is completely unproven, and there's no guarantee that it will ever be widely accepted. There are very strong negative network effects in place that kill any config API that requires upstream changes to be useful.
Templating works in some situations, but has the huge drawback that the templating mechanism is the *only* one that will ever be allowed to touch the "real" config files. Besides, coming up with templating schemes that work well for a wide variety of uses and a reasonable set of config files is hard.
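The templating drawback is easy to show in a few lines. This is only a sketch using Python's standard `string.Template`; the file contents, the `render` helper and the option names are invented for the example.

```python
from string import Template

# Hypothetical template for a small key=value config file.
TEMPLATE = Template(
    "# managed by template; do not edit\n"
    "server=$server\n"
    "port=$port\n"
)


def render(values: dict) -> str:
    """Fill the template; the result replaces the whole config file."""
    return TEMPLATE.substitute(values)


# First rollout of the file.
on_disk = render({"server": "a.example.com", "port": "8080"})

# Somebody edits the "real" file by hand (or with another tool).
on_disk += "# raised for the load test\nmax_clients=500\n"

# The next templated change regenerates the file from scratch,
# so the hand edit is gone.
on_disk = render({"server": "a.example.com", "port": "9090"})
print(on_disk)
```

Nothing in the template knows about `max_clients`, so the only way to keep that setting is to route every change through the templating mechanism.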
With all this in mind, my list of requirements for Augeas roughly looked like this:
Make programmatic edits of config data easy and reliable, and build a simple API around that. Edits should lead to intuitively "minimal" diffs; in particular preserve comments and formatting details.
Do as little beyond that as possible. In systems management, premature modeling is the root of all evil.
Make it reasonably easy to describe config file formats and how data should be exposed through the API. Ideally, these descriptions can be improved incrementally as their use turns up inevitable flaws in the descriptions.
Augeas must be useful without any changes to upstream code.
No additional data store. The only data that Augeas can use is what's in the config files, together with the description of file formats.
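To illustrate what an intuitively "minimal" diff means in the first requirement, here is a rough Python sketch for a made-up `key = value` format. It is not how Augeas is implemented, and `ENTRY` and `set_value` are hypothetical names; it only shows an edit that touches the value and leaves comments, indentation, spacing around `=` and trailing whitespace alone.

```python
import difflib
import re

# One config entry: indent, key, separator, value, trailing whitespace.
# Comment lines start with '#' and therefore never match.
ENTRY = re.compile(
    r"^(?P<indent>[ \t]*)(?P<key>[A-Za-z_]\w*)"
    r"(?P<sep>[ \t]*=[ \t]*)(?P<value>.*?)(?P<trail>[ \t]*)$"
)


def set_value(text: str, key: str, value: str) -> str:
    """Change the value of `key`, keeping every other byte of the file."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # keep the original line ending
        m = ENTRY.match(body)
        if m and m.group("key") == key:
            body = f"{m['indent']}{m['key']}{m['sep']}{value}{m['trail']}"
        out.append(body + eol)
    return "".join(out)


original = "# connection settings\n  timeout = 30   \nretries=3\n"
edited = set_value(original, "timeout", "60")

# Show the resulting diff; only the 'timeout' line should differ.
diff = difflib.unified_diff(
    original.splitlines(keepends=True),
    edited.splitlines(keepends=True),
    "before", "after",
)
print("".join(diff))
```

A hand-written function like this covers exactly one format, which is why the third requirement asks for descriptions of file formats rather than one-off code per file.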
After banging my head against the above for a while, and learning most, if not all, of the ways in which not to achieve it, I came across some work by the programming languages group at Penn, in particular Harmony and Boomerang. That work sent me down the right path. Because of the nice theoretical foundation laid by Harmony and Boomerang, Augeas checks descriptions statically (i.e., before they are ever used to modify a single file) to guard against a whole host of possible problems. Some of these problems are quite subtle, and are much easier for a computer to detect than for a human.