Idempotency is one of the fundamental ideas when managing systems: it’s both convenient and natural to demand that any management action has the same result whether it’s performed once or multiple times. For example, if the management action is ‘make sure that httpd is running’, we only want to say that once, and if, for some reason, that action gets performed multiple times, the result should stay the same. In this post, I’ll use ‘classical’ management actions on long-lived, stateful servers as examples, but the arguments apply in the same way to management actions that manipulate cloud services or Kubernetes clusters, or really any other system that you’d want to manage.
It has always bothered me that it’s not obvious that stringing such idempotent actions together will always be idempotent, too.
Formally, an action is a function f that turns some system state x into an updated state f(x). Idempotent then means that f(f(x)) = f(x), which I’ll also write as f f = f, dropping the argument x to f. For two actions f and g to be idempotent when we string them together then means that f g f g = f g. Clearly, if f and g commute, for example because f is ‘httpd must be running’ and g is ‘crond must be running’, the result of combining them is f g f g = f f g g = f g because both f and g are idempotent.
But what if they are not? What if f is ‘make sure httpd is running’ and g is ‘make sure httpd.conf has this specific content’? How can we convince ourselves that combining these two actions is still idempotent?
When we look at real-life management actions, they are actually more than just idempotent: they are constant functions. No matter what state the system is in, we want the result of f to be that httpd is running. That means that f is not just idempotent, i.e. that f f = f, but that for any other management action g, we have f g = f. And if f and g are constant functions, f g = f and therefore f g f g = f f = f, which makes the combination f g idempotent, too, but is much stronger than mere idempotency.
In practice, there are of course other considerations. For example, the action ‘make sure httpd is running’ will generally fail if httpd is not even installed, but that does not really affect the argument above; we’d just have to get more technical and talk of management actions as partial functions, where concatenating them only makes sense where they are both defined. Similarly, we can work around order-dependency by getting a little more formal about what we consider the system state x, and requiring that management actions be constant on the ‘interesting’ part of the state, and the identity everywhere else.
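As a toy illustration (my own informal sketch, not part of the original argument; the paths and config content are made up), here are the two non-commuting actions from above as shell functions that are constant on their slice of the state and the identity everywhere else:

# f: 'make sure httpd is running'; the end state is the same no matter how often it runs
f() { systemctl is-active --quiet httpd || systemctl start httpd; }

# g: 'make sure httpd.conf has this specific content'
g() { printf 'ServerName www.example.com\n' > /etc/httpd/conf/httpd.conf; }

# the combination 'f g', written in execution order (g first, then f)
f_after_g() { g; f; }

f_after_g; f_after_g   # running the combination twice ...
f_after_g              # ... ends in exactly the same state as running it once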
It therefore seems misleading to harp so much on idempotency when we talk about systems management. What we really want are constant functions, not just idempotent ones, a fact that the notion of ‘desired-state management’ nicely alludes to, but doesn’t make quite clear enough.
Handling SSL certificates is not a lot of fun, and while Puppet’s use of
client certificates protects the server and all its deep, dark secrets very
well from rogue clients, it also leads to a lot of frustration. In many
cases, users would configure their autosign.conf
to allow any (or almost
any) client’s certificate to be signed automatically, which isn’t exactly
great for security. Since Puppet 3.4.0, it is possible to use
policy-based autosigning
to have much more control over autosigning, and to do that in a much more
secure manner than the old autosigning based solely on client’s hostnames.
One of the uses for this is automatically providing certificates to instances in EC2. Chris Barker wrote a nice module, based on a gist by Jeremy Bouse, that uses policy-based autosigning to provide EC2 instances with certificates, based on their instance_id.
I recently got curious, and wanted to use that same mechanism but with preshared keys. Here’s a quick step-by-step guide of what I had to do:
When you set autosign in puppet.conf to point at a script, Puppet will call that script every time a client requests a certificate, with the client’s certname as the sole command line argument of the script and the CSR on stdin. If the script exits successfully, Puppet will sign the certificate, and refuse to sign it otherwise.
On the master, we’ll maintain a directory /etc/puppet/autosign/psk; files in that directory must be named with the certname of the client and contain that client’s preshared key.
Here is the autosign-psk script; the OID’s for Puppet-specific certificate extensions can be found in the Puppet documentation:
#! /bin/bash
#
# Policy-based autosigning with preshared keys: Puppet calls this script with
# the client's certname as the sole argument and the client's CSR on stdin.

PSK_DIR=/etc/puppet/autosign/psk

csr=$(< /dev/stdin)
certname=$1

# Get the certificate extension with OID $1 from the csr
function extension {
    echo "$csr" | openssl req -noout -text | fgrep -A1 "$1" | tail -n 1 \
        | sed -e 's/^ *//;s/ *$//'
}

psk=$(extension '1.3.6.1.4.1.34380.1.1.4')
echo "autosign $certname with PSK $psk"

psk_file=$PSK_DIR/$certname
if [ -f "$psk_file" ]; then
    if grep -q "$psk" "$psk_file"; then
        exit 0
    else
        echo "PSK file for '$certname' does not contain '$psk'"
        exit 1
    fi
else
    echo "Could not find PSK file for $certname"
    exit 1
fi
On the Puppet master, we put the above script into /usr/local/bin/autosign-psk, make it world-executable, and point autosign at it:
cp somewhere/autosign-psk /usr/local/bin
chmod a+x /usr/local/bin/autosign-psk
mkdir -p /etc/puppet/autosign/psk
puppet config set --section master autosign /usr/local/bin/autosign-psk
A PSK for client $certname can easily be generated with
tr -cd 'a-f0-9' < /dev/urandom | head -c 32 > /etc/puppet/autosign/psk/$certname
On the agent, we create the file /etc/puppet/csr_attributes.yaml with the PSK in it:
---
extension_requests:
  pp_preshared_key: @the_psk@
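How the PSK gets onto the agent is up to you: EC2 user data, a kickstart file, or whatever provisioning mechanism you already use. As a minimal sketch, assuming the key is available in a shell variable $psk on the agent, the file can be generated like this; it has to exist before the agent creates its first CSR:

# Minimal sketch: write the preshared key into csr_attributes.yaml on the agent.
# $psk is assumed to hold the same value the master stored in
# /etc/puppet/autosign/psk/<certname>; how it got here is deliberately left open.
cat > /etc/puppet/csr_attributes.yaml <<EOF
---
extension_requests:
  pp_preshared_key: $psk
EOF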
With all that in place, we can now run the Puppet agent and have it get its certificate automatically; that process is as secure as we keep the preshared key.
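To check that everything is wired up, a first agent run should come back with a signed certificate without any manual intervention; roughly:

# On the agent: generate the CSR (with the pp_preshared_key extension) and run once
puppet agent --test
# On the master: the agent's certificate should now show up as signed
puppet cert list --all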
DHH has a post on some of the hoopla around hypermedia API’s over at SvN, complete with a cool picture of the WS-*. While I agree with most of his points, he’s missing the larger point of API discoverability.
The reason discoverability is front and center in RESTful API’s isn’t some naive belief that the semantics of the API will just magically be discovered by the client — instead, it’s a strategy to keep logic that belongs on the server out of clients. When a client is told that they have to discover the URL for posting a comment to an article, they are also told to be prepared for the possibility that that operation is not available. There are lots of reasons why that operation may not be possible for the client; none of them need to interest the client, since all it cares about is whether that operation is advertised in the article or not.
DHH also puts up a nice strawman, and then ceremoniously burns it to the ground:
The idea that you can write one client to access multiple different APIs in any meaningful way disregards the idea that different apps do different things.
Again, that misses the point, especially of discoverability. Not every API has exactly one deployment. Many clients need to work with multiple different deployments of the same API; the Deltacloud API is a good example of how discoverability lays down clear guidelines for clients on what they can assume, and what they have to be prepared to find different at each endpoint they want to talk to. You can look at that as making the contract between server and client explicit in the API. Discoverability makes conditional promises to the client: if you see X, you may safely do Y.
We are all in agreement though that overall we want to tread very lightly when it comes to standardizing API mechanisms - I think there are some areas around RESTful API’s where some carefully crafted standards might help, but staying out of range of the WS-* is much more important.
This morning, the DMTF officially announced the availability of CIMI v1.0. After two years of hard work, heated discussions, and many a vote on proposed changes, CIMI is the best shot the fragmented, confusing, and in places legally encumbered landscape of IaaS API’s has at a universally supported API. Not just because of the impressive number of industry players that are part of the working group, but also because it has been designed from the ground up as a modular RESTful API, taking the breadth of existing IaaS API’s into account.
While the name suggests that CIMI is 75% CIM, the two actually have no relation to each other, except that they are both DMTF standards. CIMI covers most of the familiar concepts from IaaS management: instances (called machines), block storage (volumes), images, and networks. The standard itself is big, though most of the features in it are optional, and I don’t expect that any one provider will support everything mentioned in the standard. To get started, I highly recommend reading the primer first, as a gentle introduction to how CIMI views the world and how it models common IaaS concepts. The standard itself then serves as a convenient reference to fill in the details.
One of the goals of CIMI is that providers with widely varying feature sets can implement it, and it therefore puts a lot of emphasis on making what exactly a provider supports discoverable, using the well-known mechanisms that a RESTful style makes possible, and that we’ve also used in the Deltacloud API to expose as much of each backend’s features as possible. This emphasis on discoverability is one of the things that sets CIMI apart from the popular vendor-specific API’s, where the API has to be implemented in its entirety, or not at all.
We’ve been involved in the working group for the last two years, bringing our experience in designing Deltacloud to the table. We’ve also been busy adding various pieces to Deltacloud, and that implementation experience has been invaluable in the CIMI discussion. We’ll continue to improve our CIMI support and build out what we have; in particular, when you run deltacloudd -f cimi, you get a server that speaks CIMI, with the entrypoint at /cimi/cloudEntryPoint. You can try out the latest code at https://dev.deltacloud.org/cimi/cloudEntryPoint. There is also a simple web application in clients/cimi/ in our git repo — the app makes it both easier to experiment with the CIMI API, and serves as an example of CIMI client code.
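If you want to poke at that endpoint directly, something like this works; depending on the deployment you may need to pass credentials with curl’s -u option:

# Fetch the CIMI cloud entry point; add -u user:password if the server requires it
curl -s -H 'Accept: application/xml' https://dev.deltacloud.org/cimi/cloudEntryPoint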
As with all open source projects, we always have way more on the todo list
than we actually have time to do. If you are interested in contributing to
Deltacloud’s CIMI effort, have a look at
our Contribute page,
stop by the mailing list, or
drop into our IRC channel #deltacloud
on freenode.
Like everything, REST API’s change over time. An important question is how these changes should be incorporated into your API, and how your clients should behave to survive that evolution.
The first reflex of anybody who’s thought about API’s and their evolution is to stick a version number on the API, and use that to signal to clients what capabilities this incarnation of the API has, and maybe even let clients use that to negotiate how they talk to the server. Mark has a very good post explaining why, for the Web, that is not just undesirable, but often not feasible.
If versioning is out, what else can be done to safely evolve REST API’s? Before we dive into specific examples, it’s useful to recall what our overriding goal is. Since it is much easier to update a server than all the clients that might talk to it, the fundamental aim of careful evolution of REST API’s is:
Old clients must work against new servers
To make this maxim practical, clients need to follow the simple rule:
Ignore any unexpected data in the interaction with the server
In particular, clients can never assume that they have a complete picture of what they will find in a response from the server.
Let’s look at a little toy API to make these ideas more tangible, and to explore how this API can change while adhering to these rules. The API is for a simplistic blogging application that allows posting articles, and retrieving them. For the sake of simplicity, I will omit all HTTP request and response headers.
In sticking with good REST practice, the API has a single entrypoint at /api. Issuing a GET /api will result in the response
<api>
  <link rel="articles" href="/api/articles"/>
</api>
The articles collection can be retrieved with a GET /api/articles:
<articles>
  <article href="/api/articles/1">
    <title>Evolution for REST API's</title>
    <content>
      Like everything, ....
    </content>
  </article>
  <article href="/api/articles/2">
    ...
  </article>
  <actions>
    <link rel="create" href="/api/articles"/>
  </actions>
</articles>
Each article consists of a title and some content; the href on each article gives clients the URL from which they can retrieve that article, and serves as a unique identifier for the article.

The actions element in the articles collection tells clients that they can create new articles by issuing POST requests to /api/articles with a body like:
<article>
  <title>How to version REST API's</title>
  <content>...</content>
</article>
It’s worth pointing out a subtlety in including a link for the create action: one reason for including that link is to tell clients the URL to which they can POST to create new articles, and to keep them from making assumptions about the URL space of the server. A more important reason, though, is that we use the presence of this link to communicate to the client that it may post new articles at all: following the HATEOAS constraint for REST API’s, clients should not even assume that they are allowed to create new articles.
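In client code, this turns into a simple check: only attempt the create operation when the link is actually present. Here is a rough sketch with curl and xmllint; the host name and the payload file are made up for illustration:

# Discover the articles collection from the entrypoint, then only POST when the
# server advertises a "create" action. Host and payload file are illustrative only.
api=http://blog.example.com
articles=$(curl -s "$api/api" | xmllint --xpath 'string(/api/link[@rel="articles"]/@href)' -)
create=$(curl -s "$api$articles" | xmllint --xpath 'string(//actions/link[@rel="create"]/@href)' -)
if [ -n "$create" ]; then
    curl -s -X POST --data-binary @new-article.xml "$api$create"
else
    echo "this server does not let us create articles" >&2
fi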
Readers might want to know when a particular article has been made available. We therefore add a published element to the representation of articles that a GET on the articles collection or on an individual article’s URI returns:
<article href="/api/articles/2">
  <title>How to version REST API's</title>
  <content>...</content>
  <published>2012-08-03T13:00</published>
</article>
This does not break old clients, because we told them to ignore things they do not know about. A client that only knows about the previous version of our API will still work fine; it just won’t do anything with the published element.
Some articles might be related to other resources on the web, and we’d want to let authors call them out explicitly in their articles. We therefore change the API to accept articles with some additional data on POST /api/articles:
<article>
  <title>Great REST resources</title>
  <content>...</content>
  <related>
    <link rel="background" href="http://en.wikipedia.org/wiki/Representational_state_transfer"/>
    <link rel="background" href="http://en.wikipedia.org/wiki/HATEOAS"/>
  </related>
</article>
As long as our new API allows posting of articles without any related
links, old clients will continue to work.
If our blogging software is so successful that clients must be prepared to deal with both servers that support adding related resources and ones that do not, we need a way to indicate that to those clients that know about related resources. While there are many ways to do that, one that we’ve found works well for Deltacloud is annotating the collections in the toplevel API entrypoint. When a client does a GET /api against a server that supports related resources, we’d send them the following XML back:
<api>
  <link rel="articles" href="/api/articles">
    <feature name="related_resources"/>
  </link>
</api>
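A client that knows about related resources can probe for that feature before trying to use it; again just a sketch:

# xmllint exits non-zero when the XPath expression matches nothing, so the
# message is only printed when the entrypoint advertises the feature.
if curl -s http://blog.example.com/api \
    | xmllint --xpath '//link[@rel="articles"]/feature[@name="related_resources"]' - >/dev/null 2>&1; then
    echo "server supports related resources"
fi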
Authors want to revise their articles from time to time; we’d make that possible by allowing them to PUT the updated version of an article to its URL. This won’t introduce any problems for old clients, but new clients will need to know whether the particular instance of the API they are talking to supports updating articles. We’d solve that by adding actions to the article itself, so that a GET of an article or of the articles collection will return
<article href="/api/articles/42">
  <title>...</title>
  ...
  <actions>
    <link rel="update" href="/api/articles/42"/>
  </actions>
</article>
Not only does the update link tell clients that they are talking to a version of the blogging API that supports updates, it also lets us hide the complicated business logic that decides whether an article can be updated or not by simply showing or suppressing the update link.
Because of its spectacular content, our blog has been so successful that we want to turn it from a personal blog into a group blog, supporting multiple authors. That of course calls for adding the name of each author (or their avatar or whatnot) to each post — in other words, we want to make passing in an author mandatory when creating or updating an article. Rather than break old clients by silently slipping in the author requirement, we add a new action to the articles collection:
<articles>
  <article>...</article>
  <actions>
    <link rel="create_with_author" href="/api/articles_with_author"/>
    ...
  </actions>
</articles>
Old clients will ignore that new action; the remaining question is whether we can still allow old clients to post new articles. If we can, for example by defining a default author out-of-band for this API, we’d still show the old create action in the articles collection. If not, we’d take the ability to post away from old clients by not displaying the create action anymore — but we haven’t broken them, since they can still continue to retrieve posts; we’ve merely degraded them to read-only clients.
While this seems like an extreme change, consider that we’ve changed our application so much that existing clients can simply not provide the data we deem necessary for a successful post. It’s much more realistic that we’d find a way to let old clients still post articles using the old create link.
There are two representations that are popular with REST API’s: JSON and XML. The latter poses an additional challenge for the evolution of REST API’s because the use of XML in REST API’s differs subtly from that in many other places. Since clients can never be sure that they know about everything that might be in a server’s response, it is not possible to write down a schema (or RelaxNG grammar) that the client could use to validate server responses, since responses from an updated server would violate that schema, as the simple example of adding a published date to articles above shows.
It’s of course possible to write down RelaxNG grammars for a specific version of the API, but they are tied to that specific version, and must therefore be ignored by clients who want to happily evolve with the server.
I’ve tried to cover all the different scenarios that one encounters when evolving a RESTful API — I’ve left out HTTP-specific issues like status codes (must never change) and headers (adding new optional headers is ok), as the OpenStack folks have decided for their API Change Guidelines.
I’d be very curious to hear about changes that can not be addressed by one of the mechanisms described above.
The upcoming release of Deltacloud 1.0 is a huge milestone for the project: even though no sausages were hurt in its making, it is still chockful of the broadest blend of the finest IaaS API ingredients. The changes and improvements are too numerous to list in detail, but it is worth highlighting some of them. TL;DR: the release candidate is available now.
With this release, Deltacloud moves another step towards being a universal cloud IaaS API proxy: when we started adding support for DMTF CIMI as an alternative to the ‘classic’ Deltacloud API, it became apparent that adding additional frontends could be done with very little effort. The new EC2 frontend proves that this is even possible for API’s that are not RESTful. With that, Deltacloud allows clients that only know the EC2 API to talk to various backends, including OpenStack, vSphere, and oVirt.
The EC2 frontend supports the most commonly needed operations, in particular those necessary for finding an image, launching an instance off of it and managing that instance’s lifecycle. In addition, managing SSH key pairs is also supported. We hope to grow the coverage of the API in future releases to the point where the EC2 frontend is good enough to support the majority of uses.
The debate around the ‘right’ cloud IaaS API is heated and continues, especially around standards, and we still see the right answer to this debate in a properly standardized, broadly supported, and openly governed API such as DMTF’s CIMI — yet it is undeniable that EC2 is the front runner in this space, and that large investments into EC2’s API exist; it is Deltacloud’s mission to alleviate the resulting lock-in, and the addition of the EC2 frontend allows users to experiment with different backend technologies while migrating off the EC2 API at their own pace.
One issue that the EC2 frontend brings to the forefront is just how unsuitable that API is for fronting different backend implementations: IaaS API’s that are designed for this purpose provide extensive capabilities for client discovery of various features. EC2 on the other hand provides no way for providers to advertise their deviation from EC2’s feature set, and no possibilities for clients to discover them.
We continue our quest to support the fledgling CIMI standard as broadly and as faithfully as possible. With this release, we introduce support for the CIMI networking API; for now only for the Mock driver, but we are looking to expand backend support for networking as clouds add the needed features for them.
Besides the core CIMI API, which is purely a RESTful XML and JSON API, work also continues on the simple human-consumable HTML interface for it; we’ve learned from designing the Deltacloud API and helping others use that API that a web application that stays close to the API, but is easy for humans to use, is an invaluable tool. With this release, that application can now talk to OpenStack, RHEV-M/oVirt, and EC2 via Deltacloud’s CIMI proxy.
With three frontends, it’s become even more urgent that the three frontends can be run from the same server instance to reduce the number of daemons that need to be babysat. Thanks to an extensive revamp of the guts of Deltacloud to turn it into a modular Sinatra app, it is now possible to expose all three frontends (or only one or two) from the same server.
We now also base our RESTful routes and controllers on sinatra-rabbit — only fitting since sinatra-rabbit started life as the DSL we used inside Deltacloud for our RESTful routing and our controllers.
A lot of work has gone into rationalizing the HTTP status codes that Deltacloud returns, especially when errors occur; in the process, we learned quite a bit about just how fickle and moody vSphere can be.
Other drivers have seen major updates, not least of which is the OpenStack driver, which now works against the OpenStack v2.0 API; in particular, it works against the HP cloud — with the EC2 frontend, Deltacloud provides a capable EC2 proxy for OpenStack. We’ve also added a driver for the Fujitsu Global Cloud Platform, which was mostly written by Dies Köper of Fujitsu.
The release candidate for version 1.0.0 is out now, packages for rubygems.org, Fedora and other distributions will appear as soon as the release has passed the vote on the mailing list.
TL;DR: have a look at sinatra-rabbit.
When we converted Deltacloud from Rails to Sinatra, we needed a way to conveniently write the controller logic for RESTful routes with Sinatra. On a lark, I cooked up a DSL called ‘Rabbit’ that lets you write things like
collection :images do
  description "The collection of images"

  operation :index do
    description "List all images"
    param :id, :string
    param :architecture, :string, :optional
    control { ... controller code ... }
  end

  operation :show do
    description 'Show an image identified by "id" parameter.'
    param :id, :string, :required
    control { ... show image params[:id] ... }
  end

  operation :destroy do
    description "Remove specified"
    param :id, :string, :required
    control do
      driver.destroy_image(credentials, params[:id])
      status 204
      respond_to do |format|
        format.xml
        format.json
        format.html { redirect(images_url) }
      end
    end
  end
end
That makes supporting the common REST operations convenient, and allows us to auto-generate documentation for the REST API. It has been very useful in writing the two frontends for Deltacloud.
The DSL has lots of features, for example, validation of input parameters, conditionally allowing additional parameters, describing subcollections, autogenerating HEAD and OPTIONS routes and controllers, and many more.

Michal Fojtik has pulled that code out of Deltacloud and extracted it into its own github project as sinatra-rabbit. In the process, there were quite a few dragons to slay: for example, in Deltacloud we change what parameters some operations can accept based on the specific backend driver (in some clouds, for instance, it is possible to inject user-defined data into instances upon launch). In Deltacloud, the logic of what routes to turn on or off is based on introspecting the current driver, which means that Deltacloud’s Rabbit knows about drivers. That, of course, has to be changed for the standalone sinatra-rabbit. Michal just added route conditions that look like
collection :images do
  operation :create, :if => lambda { complicated_condition(request) } do
    ...
  end
end
Hopefully, sinatra-rabbit will grow to the point where we can remove our bundled implementation from Deltacloud, and use the standalone version; there’s still a couple of features missing, but with enough people sending patches, it can’t be very long now ;)
Watzmann.Blog by David Lutterkort is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.