Watzmann.Blog

Entirely too many words about idempotency in systems management

21 May 2018

Idempotency is one of the fundamental ideas when managing systems: it’s both convenient and natural to demand that any management action has the same result whether it’s performed once or multiple times. For example, if the management action is ‘make sure that httpd is running’, we only want to say that once, and if, for some reason, that action gets performed multiple times, the result should stay the same. In this post, I’ll use ‘classical’ management actions on long-lived, stateful servers as examples, but the arguments apply in the same way to management actions that manipulate cloud services or kubernetes clusters, or really any other system that you’d want to manage.

It has always bothered me that it’s not obvious that stringing such idempotent actions together will always be idempotent, too.

Formally, an action is a function f that turns some system state x into an updated state f(x). Idempotent then means that f(f(x)) = f(x), which I’ll also write as f f = f, dropping the argument x to f. For two actions f and g to be idempotent when we string them together then means that f g f g = f g. Clearly, if f and g commute, for example because f is ‘httpd must be running’ and g is ‘crond must be running’, the result of combining them is f g f g = f f g g = f g because both f and g are idempotent.

But what if they do not commute? What if f is ‘make sure httpd is running’ and g is ‘make sure httpd.conf has this specific content’? How can we convince ourselves that combining these two actions is still idempotent? When we look at real-life management actions, they are actually more than just idempotent: they are constant functions. No matter what state the system is in, we want the result of f to be that httpd is running. That means that f is not just idempotent, i.e. that f f = f, but that for any other management action g, we have f g = f. And if f and g are constant functions, f g = f and therefore f g f g = f f = f = f g, which makes the combination f g idempotent, too. Being constant is a much stronger property than mere idempotency, though.
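To make the difference tangible, here is a minimal Python sketch. The three-element state space and the two lookup tables are made up purely for illustration and do not model any real management action; the helpers compose, is_idempotent and constant are names invented for this sketch. It shows two idempotent actions whose combination is not idempotent, and two constant actions whose combination is.

```python
def compose(f, g):
    """Return the action 'first g, then f', i.e. x -> f(g(x))."""
    return lambda x: f(g(x))


def is_idempotent(action, states):
    """Check action(action(x)) == action(x) for every state x."""
    return all(action(action(x)) == action(x) for x in states)


def constant(target):
    """An action that produces the same target state from any input."""
    return lambda _state: target


# Hypothetical toy state space; the numbers carry no meaning.
STATES = [0, 1, 2]

# Two idempotent actions that are neither constant nor commuting.
f = {0: 0, 1: 0, 2: 2}.__getitem__
g = {0: 0, 1: 1, 2: 1}.__getitem__

print(is_idempotent(f, STATES), is_idempotent(g, STATES))        # True True
print(is_idempotent(compose(g, f), STATES))                      # False
print(is_idempotent(compose(constant(0), constant(2)), STATES))  # True
```

Starting from state 2, applying g after f once yields 1, and applying the pair again yields 0, so the combination keeps changing the system on the second run.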

In practice, there are of course other considerations. For example, the action ‘make sure httpd is running’ will generally fail if httpd is not even installed, but that does not really affect the argument above; we’d just have to get more technical and talk of management actions as partial functions, where concatenating them only makes sense where they are both defined. Similarly, we can work around order-dependency by getting a little more formal about what we consider the system state x, and by requiring that management actions be constant on the ‘interesting’ part of the state and the identity everywhere else.
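One way to picture ‘constant on the interesting part, identity everywhere else’ is to model the system state as a dictionary. This is only a sketch under that assumption: the keys, the values and the ensure helper are hypothetical and not taken from any real tool.

```python
def ensure(**desired):
    """Action that sets the given keys and leaves all other keys alone."""
    def action(state):
        return {**state, **desired}
    return action


def compose(f, g):
    """Return the action 'first g, then f'."""
    return lambda state: f(g(state))


httpd_running = ensure(httpd="running")
conf_in_place = ensure(httpd_conf="v2")  # "v2" stands in for the file content

start = {"httpd": "stopped", "httpd_conf": "v1", "crond": "running"}
both = compose(httpd_running, conf_in_place)

once = both(start)
twice = both(once)
print(once == twice)  # True
print(once)
```

Because the two actions touch different keys, they also commute, and keys neither of them mentions (crond here) pass through unchanged.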

It therefore seems misleading to harp so much on idempotency when we talk about systems management. What we really want are constant functions, not just idempotent ones, a fact that the notion of ‘desired-state management’ nicely alludes to, but doesn’t make quite clear enough.


Using Puppet's policy-based autosigning

13 June 2014

Handling SSL certificates is not a lot of fun, and while Puppet’s use of client certificates protects the server and all its deep, dark secrets very well from rogue clients, it also leads to a lot of frustration. In many cases, users would configure their autosign.conf to allow any (or almost any) client’s certificate to be signed automatically, which isn’t exactly great for security. Since Puppet 3.4.0, it is possible to use policy-based autosigning to have much more control over autosigning, and to do that in a much more secure manner than the old autosigning based solely on clients’ hostnames.

One of the uses for this is automatically providing certificates to instances in EC2. Chris Barker wrote a nice module, based on a gist by Jeremy Bouse, that uses policy-based autosigning to provide EC2 instances with certificates based on their instance_id.

I recently got curious, and wanted to use that same mechanism but with preshared keys. Here’s a quick step-by-step guide of what I had to do:

The autosign script

When you set autosign in puppet.conf to point at a script, Puppet will call that script every time a client requests a certificate, with the client’s certname as the sole command line argument of the script and the CSR on stdin. If the script exits successfully, Puppet will sign the certificate, and refuse to sign it otherwise.

On the master, we’ll maintain a directory /etc/puppet/autosign/psk; each file in that directory must be named after the certname of a client and contain that client’s preshared key.

Here is the autosign-psk script; the OID’s for Puppet-specific certificate extensions can be found here:

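#! /bin/bash

PSK_DIR=/etc/puppet/autosign/psk

# Puppet passes the certname as the only argument and the CSR on stdin
csr=$(< /dev/stdin)
certname=$1

# Get the certificate extension with OID $1 from the csr
function extension {
  echo "$csr" | openssl req -noout -text | grep -F -A1 "$1" | tail -n 1 \
      | sed -e 's/^ *//;s/ *$//'
}

# 1.3.6.1.4.1.34380.1.1.4 is the OID of pp_preshared_key
psk=$(extension '1.3.6.1.4.1.34380.1.1.4')

echo "autosign $certname with PSK $psk"

# An empty PSK would match every line of the file, so refuse it outright
if [ -z "$psk" ]; then
    echo "CSR for $certname does not contain a PSK"
    exit 1
fi

psk_file="$PSK_DIR/$certname"
if [ -f "$psk_file" ]; then
    # Compare the PSK as a fixed string against a whole line of the file
    if grep -qxF -- "$psk" "$psk_file"; then
        exit 0
    else
        echo "File for '$certname' does not contain '$psk'"
        exit 1
    fi
else
    echo "Could not find PSK file for $certname"
    exit 1
fi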
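The decision the script has to make can also be sketched on its own, without the CSR parsing. The following Python function is an illustration only, not part of the setup described here; the function name and the example certname are made up. It assumes the same directory layout, and it spells out two edge cases: a request without any PSK, and a certname that tries to escape the PSK directory.

```python
import hmac
import tempfile
from pathlib import Path


def psk_matches(psk_dir, certname, offered_psk):
    """Return True only if the PSK file for certname holds the offered key."""
    if not offered_psk:
        return False  # a CSR without a PSK must never be signed
    if Path(certname).name != certname:
        return False  # refuse certnames containing path components
    psk_file = Path(psk_dir) / certname
    if not psk_file.is_file():
        return False
    expected = psk_file.read_text().strip()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected.encode(), offered_psk.encode())


with tempfile.TemporaryDirectory() as psk_dir:
    key = "0123456789abcdef0123456789abcdef"
    Path(psk_dir, "agent1.example.com").write_text(key)
    print(psk_matches(psk_dir, "agent1.example.com", key))  # True
    print(psk_matches(psk_dir, "agent1.example.com", ""))   # False
```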

Puppet master setup

On the Puppet master, we put the above script into /usr/local/bin/autosign-psk, make it world-executable, and point autosign at it:

cp somewhere/autosign-psk /usr/local/bin
chmod a+x /usr/local/bin/autosign-psk
mkdir -p /etc/puppet/autosign/psk
puppet config set --section master autosign /usr/local/bin/autosign-psk

A PSK for the client with certname $certname can easily be generated with

tr -cd 'a-f0-9' < /dev/urandom | head -c 32 >/etc/puppet/autosign/psk/$certname
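If you would rather generate the keys from a provisioning script, a rough Python equivalent could look like the sketch below. The function names and the example certname are invented for this illustration; the only assumption carried over from above is that the file is named after the certname and holds 32 hex characters.

```python
import secrets
import tempfile
from pathlib import Path


def generate_psk():
    """Return 32 lowercase hex characters (128 bits of randomness)."""
    return secrets.token_hex(16)


def write_psk(psk_dir, certname):
    """Write a fresh PSK to psk_dir/certname and return it."""
    psk = generate_psk()
    Path(psk_dir, certname).write_text(psk)
    return psk


# Demonstration against a throwaway directory instead of /etc/puppet
with tempfile.TemporaryDirectory() as psk_dir:
    psk = write_psk(psk_dir, "agent1.example.com")
    print(len(psk))
```

The returned key is what would go into the agent’s csr_attributes.yaml.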

Puppet agent setup

On the agent, we create the file /etc/puppet/csr_attributes.yaml with the PSK in it:

---
extension_requests:
  pp_preshared_key: @the_psk@

With all that in place, we can now run the Puppet agent and have it get its certificate automatically; that process is as secure as we keep the preshared key.


Don't hate the HATEOS

20 December 2012

DHH has a post on some of the hoopla around hypermedia API’s over at SvN, complete with a cool picture of the WS-*. While I agree with most of his points, he’s missing the larger point of API discoverability.

The reason discoverability is front and center in RESTful API’s isn’t some naive belief that the semantics of the API will just magically be discovered by the client; instead, it’s a strategy to keep logic that belongs on the server out of clients. When a client is told that it has to discover the URL for posting a comment to an article, it is also told to be prepared for that operation not being available. There are lots of reasons why that operation may not be possible for the client; none of them need to interest the client. All it cares about is whether that operation is advertised in the article or not.

DHH also puts up a nice strawman, and then ceremoniously burns it to the ground:

The idea that you can write one client to access multiple different APIs in any meaningful way disregards the idea that different apps do different things.

Again, that misses the point, especially of discoverability. Not every API has exactly one deployment. Many clients need to work with multiple different deployments of the same API; the Deltacloud API is a good example of how discoverability lays down clear guidelines for clients on what they can assume, and what they have to be prepared for being different with each different endpoint they want to talk to. You can look at that as making the contract between server and client explicit in the API. Discoverability makes conditional promises to the client: if you see X, you may safely do Y.

We are all in agreement though that overall we want to tread very lightly when it comes to standardizing API mechanisms - I think there are some areas around RESTful API’s where some carefully crafted standards might help, but staying out of range of the WS-* is much more important.


CIMI v1.0 released

29 August 2012

This morning, the DMTF officially announced the availability of CIMI v1.0. After two years of hard work, heated discussions, and many a vote on proposed changes, CIMI is the best shot the fragmented, confusing, and in places legally encumbered, landscape of IaaS API’s has at a universally supported API. Not just because of the impressive number of industry players that are part of the working group but also because it has been designed from the ground up as a modular RESTful API, taking the breadth of existing IaaS API’s into account.

While the name suggests that CIMI is 75% CIM, the two have actually no relation to each other, except that they are both DMTF standards. CIMI covers most of the familiar concepts from IaaS management: instances (called machines), block storage (volumes), images, and networks. The standard itself is big, though most of the features in it are optional, and I don’t expect that any one provider will support everything mentioned in the standard. To get started, I highly recommend reading the primer first, as a gentle introduction to how CIMI views the world and how it models common IaaS concepts. The standard itself then serves as a convenient reference to fill in the details.

One of the goals of CIMI is that providers with widely varying feature sets can implement it. It therefore puts a lot of emphasis on making what exactly a provider supports discoverable, using the well-known mechanisms that a RESTful style makes possible, and that we’ve also used in the Deltacloud API to expose as much of each backend’s features as possible. This emphasis on discoverability is one of the things that sets CIMI apart from the popular vendor-specific API’s, where the API has to be implemented in its entirety, or not at all.

We’ve been involved in the working group for the last two years, bringing our experience in designing Deltacloud to the table. We’ve also been busy adding various pieces to Deltacloud, and that implementation experience has been invaluable in the CIMI discussion. We’ll continue to improve our CIMI support, and build out what we have; in particular, we are working on

the CIMI frontend for Deltacloud; when you run deltacloudd -f cimi, you get a server that speaks CIMI, with the entrypoint at /cimi/cloudEntryPoint. You can try out the latest code at https://dev.deltacloud.org/cimi/cloudEntryPoint
the CIMI client app (in clients/cimi/ in our git repo); the app both makes it easier to experiment with the CIMI API and serves as an example of CIMI client code.
a CIMI test suite; as part of our test suites, we are adding tests that can be run against any CIMI implementation and will eventually be a useful tool to informally qualify such implementations

As with all open source projects, we always have way more on the todo list than we actually have time to do. If you are interested in contributing to Deltacloud’s CIMI effort, have a look at our Contribute page, stop by the mailing list, or drop into our IRC channel #deltacloud on freenode.


Evolution for REST API's

03 August 2012

Like everything, REST API’s change over time. An important question is how these changes should be incorporated into your API, and how your clients should behave to survive that evolution.

The first reflex of anybody who’s thought about API’s and their evolution is to stick a version number on the API, and use that to signal to clients what capabilities this incarnation of the API has, and maybe even let clients use that to negotiate how they talk to the server. Mark has a very good post explaining why, for the Web, that is not just undesirable, but often not feasible.

If versioning is out, what else can be done to safely evolve REST API’s? Before we dive into specific examples, it’s useful to recall what our overriding goal is. Since it is much easier to update a server than all the clients that might talk to it, the fundamental aim of careful evolution of REST API’s is:

Old clients must work against new servers

To make this maxim practical, clients need to follow the simple rule:

Ignore any unexpected data in the interaction with the server

In particular, clients can never assume that they have a complete picture of what they will find in a response from the server.

Let’s look at a little toy API to make these ideas more tangible, and to explore how this API can change while adhering to these rules. The API is for a simplistic blogging application that allows posting articles and retrieving them. For the sake of simplicity, I will omit all HTTP request and response headers.

A simple REST API

In sticking with good REST practice, the API has a single entrypoint at /api. Issuing a GET /api will result in the response

<api>
  <link rel="articles" href="/api/articles"/>
</api>

The articles collection can be retrieved with a GET /api/articles:

<articles>
  <article href="/api/articles/1">
    <title>Evolution for REST API's</title>
    <content>
      Like everything, ....
    </content>
  </article>
  <article href="/api/articles/2">
    ...
  </article>
  <actions>
    <link rel="create" href="/api/articles"/>
  </actions>
</articles>

Each article consists of a title and some content; the href on each article gives clients the URL from which they can retrieve that article, and serves as a unique identifier for the article.

The actions element in the articles collection tells the client that it can create new articles by issuing POST requests to /api/articles:

<article>
  <title>How to version REST API's</title>
  <content>...</content>
</article>

It’s worth pointing out a subtlety in including a link for the create action: one reason for including that link is to tell clients the URL to which they can POST to create new articles, and keep them from making assumptions about the URL space of the server. The more important reason, though, following the HATEOAS constraint for REST API’s, is that we use the presence of this link to communicate to the client that it may post new articles: clients should not even assume that they are allowed to create new articles.
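On the client side, honoring that constraint means looking for the link before doing anything else. Here is a minimal sketch using Python’s standard xml.etree.ElementTree; the function name action_href and the trimmed-down sample documents are made up for this illustration.

```python
import xml.etree.ElementTree as ET

ARTICLES = """
<articles>
  <article href="/api/articles/1">
    <title>Evolution for REST API's</title>
    <content>Like everything, ....</content>
  </article>
  <actions>
    <link rel="create" href="/api/articles"/>
  </actions>
</articles>
"""

READ_ONLY = "<articles><article href='/api/articles/1'/></articles>"


def action_href(collection_xml, rel):
    """Return the URL advertised for the action rel, or None if absent."""
    root = ET.fromstring(collection_xml)
    for link in root.findall("./actions/link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


print(action_href(ARTICLES, "create"))   # /api/articles
print(action_href(READ_ONLY, "create"))  # None
```

A client would only offer ‘new article’ to its user when the function returns a URL, and would POST to exactly that URL rather than to one it constructed itself.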

Adding information from the server

Readers might want to know when a particular article has been made available. We therefore add a published element to the representation of articles that a GET on the articles collection or on an individual article’s URI returns:

<article href="/api/articles/2">
  <title>How to version REST API's</title>
  <content>...</content>
  <published>2012-08-03T13:00</published>
</article>

This does not break old clients, because we told them to ignore things they do not know about. A client that only knows about the previous version of our API will still work fine, it just won’t do anything with the published element.
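In code, ‘ignore what you do not know’ mostly means picking out the elements you understand instead of walking the whole document and rejecting surprises. A small Python sketch of such an old client follows; the function name parse_article is an assumption of this example.

```python
import xml.etree.ElementTree as ET

OLD_RESPONSE = """
<article href="/api/articles/2">
  <title>How to version REST API's</title>
  <content>...</content>
</article>
"""

NEW_RESPONSE = """
<article href="/api/articles/2">
  <title>How to version REST API's</title>
  <content>...</content>
  <published>2012-08-03T13:00</published>
</article>
"""


def parse_article(article_xml):
    """Extract only the fields this (old) client knows about."""
    node = ET.fromstring(article_xml)
    return {
        "href": node.get("href"),
        "title": node.findtext("title"),
        "content": node.findtext("content"),
    }


# The old client sees the same article with or without <published>
print(parse_article(OLD_RESPONSE) == parse_article(NEW_RESPONSE))  # True
```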

Allowing more data when creating an article

Some articles might be related to other resources on the web, and we’d want to let authors call them out explicitly in their articles. We therefore change the API to accept articles with some additional data on POST /api/articles:

<article>
  <title>Great REST resources</title>
  <content>...</content>
  <related>
    <link rel="background" href="http://en.wikipedia.org/wiki/Representational_state_transfer"/>
    <link rel="background" href="http://en.wikipedia.org/wiki/HATEOAS"/>
  </related>
</article>

As long as our new API allows posting of articles without any related links, old clients will continue to work.
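On the server, that boils down to treating the related element as optional. A sketch of how the POST body might be parsed, again with ElementTree; the function name and the shape of the returned dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def parse_new_article(body):
    """Parse a POSTed article; related links are optional."""
    node = ET.fromstring(body)
    title = node.findtext("title")
    content = node.findtext("content")
    if title is None or content is None:
        raise ValueError("an article needs a title and content")
    related = [
        (link.get("rel"), link.get("href"))
        for link in node.findall("./related/link")
    ]
    return {"title": title, "content": content, "related": related}


old_style = "<article><title>Hello</title><content>...</content></article>"
new_style = """
<article>
  <title>Great REST resources</title>
  <content>...</content>
  <related>
    <link rel="background" href="http://en.wikipedia.org/wiki/HATEOAS"/>
  </related>
</article>
"""

print(parse_new_article(old_style)["related"])  # []
print(parse_new_article(new_style)["related"])
```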

Blogging API’s everywhere

If our blogging software is so successful that clients must be prepared to deal with both servers that support adding related resources and ones that do not, we need a way to indicate that support to those clients that know about related resources. While there are many ways to do that, one that we’ve found works well for Deltacloud is annotating the collections in the toplevel API entrypoint. When a client does a GET /api from a server that supports related resources, we’d send them the following XML back:

<api>
  <link rel="articles" href="/api/articles">
    <feature name="related_resources"/>
  </link>
</api>
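A client that knows about related resources can then check for the feature before sending them. A sketch, with the collection name passed in as a parameter; the function name supports and the two sample entrypoint documents are assumptions of this example.

```python
import xml.etree.ElementTree as ET

WITH_FEATURE = """
<api>
  <link rel="articles" href="/api/articles">
    <feature name="related_resources"/>
  </link>
</api>
"""

WITHOUT_FEATURE = """
<api>
  <link rel="articles" href="/api/articles"/>
</api>
"""


def supports(api_xml, collection, feature):
    """True if the entrypoint advertises feature on the given collection."""
    root = ET.fromstring(api_xml)
    for link in root.findall("link"):
        if link.get("rel") == collection:
            names = [f.get("name") for f in link.findall("feature")]
            return feature in names
    return False


print(supports(WITH_FEATURE, "articles", "related_resources"))     # True
print(supports(WITHOUT_FEATURE, "articles", "related_resources"))  # False
```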

Updating articles

Authors want to revise their articles from time to time; we’d make that possible by allowing them to PUT the updated version of an article to its URL. This won’t introduce any problems for old clients, but new clients will need to know whether the particular instance of the API they are talking to supports updating articles. We’d solve that by adding actions to the article itself, so that a GET of an article or the articles collection will return

<article href="/api/articles/42">
  <title>...</title>
  ...
  <actions>
    <link rel="update" href="/api/articles/42"/>
  </actions>
</article>

Not only does the update link tell clients that they are talking to a version of the blogging API that supports updates, it also lets us hide complicated business logic that decides whether an article can be updated or not by simply showing or suppressing the update link.
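Seen from the server, showing or suppressing the link is a one-line decision at rendering time. In the following sketch, the may_update flag stands in for whatever business logic applies; it and the function name render_article are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_article(href, title, may_update):
    """Render an article, advertising 'update' only when it is allowed."""
    article = ET.Element("article", href=href)
    ET.SubElement(article, "title").text = title
    if may_update:
        actions = ET.SubElement(article, "actions")
        ET.SubElement(actions, "link", rel="update", href=href)
    return ET.tostring(article, encoding="unicode")


print(render_article("/api/articles/42", "Draft", may_update=True))
print(render_article("/api/articles/42", "Locked", may_update=False))
```

The client never learns why the link is missing, which is exactly the point: the rules can change on the server without touching any client.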

Merging blogs

Because of its spectacular content, our blog has been so successful that we want to turn it from a personal blog into a group blog, supporting multiple authors. That of course calls for adding the name of each author (or their avatar or whatnot) to each post — in other words, we want to make passing in an author mandatory when creating or updating an article. Rather than break old clients by silently slipping in the author requirement, we add a new action to the articles collection:

<articles>
  <article>...</article>
  <actions>
    <link rel="create_with_author" href="/api/articles_with_author"/>
    ...
  </actions>
</articles>

Old clients will ignore that new action; the remaining question is whether we can still allow old clients to post new articles. If we can, for example, by defining a default author out-of-band with this API, we’d still show the old create action in the articles collection. If not, we’d take the ability to post away from old clients by not displaying the create action anymore. But we haven’t broken them, since they can still continue to retrieve posts; we’ve merely degraded them to read-only clients.

While this seems like an extreme change, consider that we’ve changed our application so much that existing clients can simply not provide the data we deem necessary for a successful post. It’s much more realistic that we’d find a way to let old clients still post articles using the old create link.
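The client-side logic for this scenario can be sketched as follows. The knows_author flag distinguishes a new client from an old one; it, the function name and the sample document are assumptions of this example.

```python
import xml.etree.ElementTree as ET

AUTHOR_ONLY = """
<articles>
  <actions>
    <link rel="create_with_author" href="/api/articles_with_author"/>
  </actions>
</articles>
"""


def pick_create_action(collection_xml, knows_author):
    """Return (rel, href) for posting, or None for a read-only client."""
    root = ET.fromstring(collection_xml)
    links = {
        link.get("rel"): link.get("href")
        for link in root.findall("./actions/link")
    }
    if knows_author and "create_with_author" in links:
        return "create_with_author", links["create_with_author"]
    if "create" in links:
        return "create", links["create"]
    return None  # nothing this client understands: read-only


print(pick_create_action(AUTHOR_ONLY, knows_author=True))
print(pick_create_action(AUTHOR_ONLY, knows_author=False))  # None
```

An old client facing a server that only advertises create_with_author ends up with None and simply stops offering to post, without ever sending a request that would fail.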

Some consequences for XML

There are two representations that are popular with REST API’s: JSON and XML. The latter poses an additional challenge for the evolution of REST API’s because the use of XML in REST API’s differs subtly from that in many other places. Since clients can never be sure that they know about everything that might be in a server’s response, it is not possible to write down a schema (or RelaxNG grammar) that the client could use to validate server responses, since responses from an updated server would violate that schema, as the simple example of adding a published date to articles above shows.

It’s of course possible to write down RelaxNG grammars for a specific version of the API, but they are tied to that specific version, and must therefore be ignored by clients who want to happily evolve with the server.

Questions?

I’ve tried to cover all the different scenarios that one encounters when evolving a RESTful API. I’ve left out HTTP-specific issues like status codes (must never change) and headers (adding new optional headers is ok), which is how the OpenStack folks have decided to handle them in their API Change Guidelines.

I’d be very curious to hear about changes that cannot be addressed by one of the mechanisms described above.


Deltacloud 1.0

15 June 2012

The upcoming release of Deltacloud 1.0 is a huge milestone for the project: even though no sausages were hurt in its making, it is still chock-full of the broadest blend of the finest IaaS API ingredients. The changes and improvements are too numerous to list in detail, but it is worth highlighting some of them. TL;DR: the release candidate is available now.

EC2 frontend

With this release, Deltacloud moves another step towards being a universal cloud IaaS API proxy: when we started adding support for DMTF CIMI as an alternative to the ‘classic’ Deltacloud API, it became apparent that adding additional frontends could be done with very little effort. The new EC2 frontend proves that this is even possible for API’s that are not RESTful. With that, Deltacloud allows clients that only know the EC2 API to talk to various backends, including OpenStack, vSphere, and oVirt.

The EC2 frontend supports the most commonly needed operations, in particular those necessary for finding an image, launching an instance off of it and managing that instance’s lifecycle. In addition, managing SSH key pairs is also supported. We hope to grow the coverage of the API in future releases to the point where the EC2 frontend is good enough to support the majority of uses.

The debate around the ‘right’ cloud IaaS API is heated and continues, especially around standards, and we still see the right answer to this debate in a properly standardized, broadly supported, and openly governed API such as DMTF’s CIMI. Yet it is undeniable that EC2 is the front runner in this space, and that large investments into EC2’s API exist; it is Deltacloud’s mission to alleviate the resulting lock-in, and the addition of the EC2 frontend allows users to experiment with different backend technologies while migrating off the EC2 API at their own pace.

One issue that the EC2 frontend brings to the forefront is just how unsuitable that API is for fronting different backend implementations: IaaS API’s that are designed for this purpose provide extensive capabilities for client discovery of various features. EC2 on the other hand provides no way for providers to advertise their deviation from EC2’s feature set, and no possibilities for clients to discover them.

CIMI frontend

We continue our quest to support the fledgling CIMI standard as broadly and as faithfully as possible. With this release, we introduce support for the CIMI networking API; for now only for the Mock driver, but we are looking to expand backend support for networking as clouds add the needed features for them.

Besides the core CIMI API, which is purely a RESTful XML and JSON API, work also continues on the simple human-consumable HTML interface for it; we’ve learned from designing the Deltacloud API and helping others using that API, that a web application that stays close to the API, but is easy to use for humans is an invaluable tool. With this release, that application can now talk to OpenStack, RHEV-M/oVirt, and EC2 via Deltacloud’s CIMI proxy.

Operational and code enhancements

With three frontends, it’s become even more urgent that they can all be run from the same server instance, to reduce the number of daemons that need to be babysat. Thanks to an extensive revamp of the guts of Deltacloud to turn it into a modular Sinatra app, it is now possible to expose all three frontends (or only one or two) from the same server.

We now also base our RESTful routes and controllers on sinatra-rabbit — only fitting since sinatra-rabbit started life as the DSL we used inside Deltacloud for our RESTful routing and our controllers.

A lot of work has gone into rationalizing the HTTP status codes that Deltacloud returns, especially when errors occur; in the process, we learned quite a bit about just how fickle and moody vSphere can be.

Other drivers have seen major updates, not least of which the OpenStack driver, which now works against the OpenStack v2.0 API; in particular, it works against the HP cloud — with the EC2 frontend, Deltacloud provides a capable EC2 proxy for OpenStack. We’ve also added a driver for the Fujitsu Global Cloud Platform, which was mostly written by Dies Köper of Fujitsu.

The release candidate for version 1.0.0 is out now; packages for rubygems.org, Fedora and other distributions will appear as soon as the release has passed the vote on the mailing list.


Sinatra Rabbit - a RESTful DSL

13 March 2012

TL;DR: have a look at sinatra-rabbit.

When we converted Deltacloud from Rails to Sinatra, we needed a way to conveniently write the controller logic for RESTful routes with Sinatra. On a lark, I cooked up a DSL called ‘Rabbit’ that lets you write things like

collection :images do
  description "The collection of images"

  operation :index do
    description "List all images"
    param :id,            :string
    param :architecture,  :string,  :optional
    control { ... controller code ... }
  end

  operation :show do
    description 'Show an image identified by "id" parameter.'
    param :id,           :string, :required
    control { ... show image params[:id] ... }
  end

  operation :destroy do
    description "Remove specified"
    param :id,    :string,    :required
    control do
      driver.destroy_image(credentials, params[:id])
      status 204
      respond_to do |format|
        format.xml
        format.json
        format.html { redirect(images_url) }
      end
    end
  end

end

That makes supporting the common REST operations convenient, and allows us to auto-generate documentation for the REST API. It has been very useful in writing the two frontends for Deltacloud.

The DSL has lots of features, for example, validation of input parameters, conditionally allowing additional parameters, describing subcollections, autogenerating HEAD and OPTIONS routes and controllers, and many more.

Michal Fojtik has pulled that code out of Deltacloud and extracted it into its own GitHub project as sinatra-rabbit. In the process, there were quite a few dragons to slay: for example, in Deltacloud we change what parameters some operations can accept based on the specific backend driver; in some clouds, for instance, it is possible to inject user-defined data into instances upon launch. In Deltacloud, the logic of what routes to turn on or off is based on introspecting the current driver, which means that Deltacloud’s Rabbit knows about drivers. That, of course, has to be changed for the standalone sinatra-rabbit. Michal just added route conditions that look like

collection :images do
    operation :create, :if => lambda { complicated_condition(request) } do
        ...
    end
end

Hopefully, sinatra-rabbit will grow to the point where we can remove our bundled implementation from Deltacloud, and use the standalone version; there are still a couple of features missing, but with enough people sending patches, it can’t be very long now ;)

