<p><em>Watzmann.Blog, by David Lutterkort (lutter@watzmann.net)</em></p>
<h1>Using Puppet's policy-based autosigning</h1>
<p><em>2014-06-13</em> (<a href="http://watzmann.net//blog/2014/06/puppet-autosign-policy">permalink</a>)</p>
<p>Handling SSL certificates is not a lot of fun, and while Puppet’s use of
client certificates protects the server and all its deep, dark secrets very
well from rogue clients, it also leads to a lot of frustration. In many
cases, users would configure their <code>autosign.conf</code> to allow any (or almost
any) client’s certificate to be signed automatically, which isn’t exactly
great for security. Since Puppet 3.4.0, it is possible to use
<a href="http://docs.puppetlabs.com/puppet/latest/reference/ssl_autosign.html#policy-based-autosigning">policy-based autosigning</a>
to have much more control over autosigning, and to do that in a much more
secure manner than the old autosigning based solely on clients’ hostnames.</p>
<p>One of the uses for this is automatically providing certificates to
instances in EC2. Chris Barker
<a href="https://github.com/mrzarquon/mrzarquon-certsigner">wrote a nice module</a>,
based on a <a href="https://gist.github.com/jbouse/8763661">gist</a> by
<a href="https://github.com/jbouse">Jeremy Bouse</a> that uses policy-based
autosigning to provide EC2 instances with certificates, based on their
<code>instance_id</code>.</p>
<p>I recently got curious, and wanted to use that same mechanism but with
preshared keys. Here’s a quick step-by-step guide of what I had to do:</p>
<h2 id="the-autosign-script">The autosign script</h2>
<p>When you set <code>autosign</code> in <code>puppet.conf</code> to point at a script, Puppet will
call that script every time a client requests a certificate, with the
client’s certname as the sole command line argument of the script and the
CSR on stdin. If the script exits successfully, Puppet will sign the
certificate, and refuse to sign it otherwise.</p>
<p>On the master, we’ll maintain a directory <code>/etc/puppet/autosign/psk</code>; files
in that directory must have the certname of the client and contain the
preshared key.</p>
<p>Here is the <code>autosign-psk</code> script; the OID’s for Puppet-specific
certificate extensions can be found
<a href="http://docs.puppetlabs.com/puppet/latest/reference/ssl_attributes_extensions.html#recommended-oids-for-extensions">here</a>:</p>
<pre><code>#! /bin/bash

PSK_DIR=/etc/puppet/autosign/psk

csr=$(< /dev/stdin)
certname=$1

# Get the certificate extension with OID $1 from the CSR
function extension {
    echo "$csr" | openssl req -noout -text | fgrep -A1 "$1" | tail -n 1 \
        | sed -e 's/^ *//;s/ *$//'
}

psk=$(extension '1.3.6.1.4.1.34380.1.1.4')
echo "autosign $certname with PSK $psk"

# Refuse requests without a PSK; otherwise the grep below would match anything
if [ -z "$psk" ]; then
    echo "CSR for $certname does not contain a PSK"
    exit 1
fi

psk_file=$PSK_DIR/$certname
if [ -f "$psk_file" ]; then
    if grep -q "$psk" "$psk_file"; then
        exit 0
    else
        echo "File for '$certname' does not contain '$psk'"
        exit 1
    fi
else
    echo "Could not find PSK file for $certname"
    exit 1
fi
</code></pre>
<h2 id="puppet-master-setup">Puppet master setup</h2>
<p>On the Puppet master, we put the above script into
<code>/usr/local/bin/autosign-psk</code>, make it world-executable, and point
<code>autosign</code> at it:</p>
<pre><code>cp somewhere/autosign-psk /usr/local/bin
chmod a+x /usr/local/bin/autosign-psk
mkdir -p /etc/puppet/autosign/psk
puppet config set --section master autosign /usr/local/bin/autosign-psk
</code></pre>
<p>A PSK for client <code>$certname</code> can easily be generated with</p>
<pre><code>tr -cd 'a-f0-9' < /dev/urandom | head -c 32 >/etc/puppet/autosign/psk/$certname
</code></pre>
<h2 id="puppet-agent-setup">Puppet agent setup</h2>
<p>On the agent, we create the file <code>/etc/puppet/csr_attributes.yaml</code> with the
PSK in it:</p>
<pre><code>---
extension_requests:
  pp_preshared_key: @the_psk@
</code></pre>
<p>With all that in place, we can now run the Puppet agent and have it get its
certificate automatically; that process is as secure as we keep the
preshared key.</p>
<h1>Don't hate the HATEOS</h1>
<p><em>2012-12-20</em> (<a href="http://watzmann.net//blog/2012/12/dont-hate-the-hateos">permalink</a>)</p>
<p><a href="http://37signals.com/svn/writers/dhh">DHH</a> has a
<a href="http://37signals.com/svn/posts/3373-getting-hyper-about-hypermedia-apis">post</a>
on some of the hoopla around hypermedia API’s over at
<a href="http://37signals.com/svn/">SvN</a>, complete with a cool picture of the
<em>WS-*</em>. While I agree with most of his points, he’s missing the larger
point of API discoverability.</p>
<p>The reason discoverability is front and center in RESTful API’s isn’t some
naive belief that the semantics of the API will just magically be
discovered by the client — instead, it’s a strategy to keep logic
that belongs on the server out of clients. When a client is told that they
have to discover the URL for posting a comment to an article, they are also
told to be prepared that that operation might not be available. There are lots
of reasons why that operation may not be possible for the client; none of
them need to concern the client; all it cares about is whether that
operation is advertised in the article or not.</p>
<p>DHH also puts up a nice strawman, and then ceremoniously burns it to the
ground:</p>
<blockquote>
<p>The idea that you can write one client to access multiple different APIs
in any meaningful way disregards the idea that different apps do
different things.</p>
</blockquote>
<p>Again, that misses the point, especially of discoverability. Not every API
has exactly one deployment. Many clients need to work with multiple
different deployments of the same API; the
<a href="http://deltacloud.apache.org/">Deltacloud API</a> is a good example of how
discoverability lays down clear guidelines for clients on what they can
assume, and what they have to be prepared for being different with each
different endpoint they want to talk to. You can look at that as making the
contract between server and client explicit in the API. Discoverability
makes conditional promises to the client: if you see X, you may safely do
Y.</p>
<p>We are all in agreement though that overall we want to tread very lightly
when it comes to standardizing API mechanisms. I think there are some
areas around RESTful API’s where some carefully crafted standards might
help, but staying out of range of the <em>WS-*</em> is much more important.</p>
<h1>CIMI v1.0 released</h1>
<p><em>2012-08-29</em> (<a href="http://watzmann.net//blog/2012/08/cimi-released">permalink</a>)</p>
<p>This morning, the <a href="http://dmtf.org/">DMTF</a> officially
<a href="http://is.gd/wmTCAQ">announced</a> the availability of
<a href="http://dmtf.org/standards/cloud">CIMI v1.0</a>. After two years of hard work,
heated discussions, and many a vote on proposed changes, CIMI is the best
shot the fragmented, confusing, and in places legally encumbered landscape
of IaaS API’s has at a universally supported API. Not just because of the
impressive number of industry players that are part of the working group
but also because it has been designed from the ground up as a modular
<a href="http://en.wikipedia.org/wiki/Representational_state_transfer">RESTful API</a>,
taking the breadth of existing IaaS API’s into account.</p>
<p>While the name suggests that CIMI is 75% CIM, the two have actually no
relation to each other, except that they are both DMTF standards. CIMI
covers most of the familiar concepts from IaaS management: instances
(called machines), block storage (volumes), images, and networks. The
standard itself is big, though most of the features in it are optional, and
I don’t expect that any one provider will support everything mentioned in
the standard. To get started, I highly recommend reading the
<a href="http://dmtf.org/sites/default/files/standards/documents/DSP2027_1.0.0.pdf">primer</a>
first, as a gentle introduction to how CIMI views the world and how it
models common IaaS concepts. The
<a href="http://dmtf.org/sites/default/files/standards/documents/DSP0263_1.0.0.pdf">standard</a>
itself then serves as a convenient reference to fill in the details.</p>
<p>One of the goals of CIMI is that providers with widely varying feature sets
can implement it, and it therefore puts a lot of emphasis on making what
exactly a provider supports discoverable, using the
<a href="http://en.wikipedia.org/wiki/HATEOAS">well-known mechanisms</a> that a
<a href="http://fedoraproject.org/wiki/Cloud_APIs_REST_Style_Guide">RESTful style makes possible</a>
, and that we’ve also
<a href="http://deltacloud.apache.org/rest-api.html">used in the Deltacloud API</a> to
expose as much of each backend’s features as possible. This emphasis on
discoverability is one of the things that sets CIMI apart from the popular
vendor-specific API’s, where the API has to be implemented in its entirety,
or not at all.</p>
<p>We’ve been involved in the working group for the last two years, bringing
our experience in designing <a href="http://deltacloud.apache.org/">Deltacloud</a> to
the table. We’ve also been busy adding various pieces to Deltacloud, and
that implementation experience has been invaluable in the CIMI
discussion. We’ll continue to improve our CIMI support, and build out what
we have; in particular, we are working on</p>
<ul>
<li>the CIMI frontend for Deltacloud; when you run <code>deltacloudd -f cimi</code>, you
get a server that speaks CIMI, with the entrypoint at
<code>/cimi/cloudEntryPoint</code>. You can try out the latest code at
<code>https://dev.deltacloud.org/cimi/cloudEntryPoint</code></li>
<li>the CIMI client app (in <code>clients/cimi/</code> in our
<a href="https://github.com/apache/deltacloud">git repo</a>) &#8212; the app both
makes it easier to experiment with the CIMI API and serves as an example of
CIMI client code.</li>
<li>a CIMI test suite; as part of our test suites, we are adding tests that
can be run against any CIMI implementation and will eventually be a
useful tool to informally qualify such implementations</li>
</ul>
<p>As with all open source projects, we always have way more on the todo list
than we actually have time to do. If you are interested in contributing to
Deltacloud’s CIMI effort, have a look at
<a href="http://deltacloud.apache.org/how-to-contribute.html">our Contribute page</a>,
stop by the <a href="http://deltacloud.apache.org/contact.html">mailing list</a>, or
drop into our IRC channel <code>#deltacloud</code> on freenode.</p>
<h1>Evolution for REST API's</h1>
<p><em>2012-08-03</em> (<a href="http://watzmann.net//blog/2012/08/rest-api-evolution">permalink</a>)</p>
<p>Like everything,
<a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST</a> API’s
change over time. An important question is how these changes should be
incorporated into your API, and how your clients should behave to survive
that evolution.</p>
<p>The first reflex of anybody who’s thought about API’s and their evolution
is to stick a version number on the API, and use that to signal to clients
what capabilities this incarnation of the API has, and maybe even let
clients use that to negotiate how they talk to the
server. <a href="http://www.mnot.net/blog/2011/10/25/web_api_versioning_smackdown">Mark</a>
has a very good post explaining why, for the Web, that is not just
undesirable, but often not feasible.</p>
<p>If versioning is out, what else can be done to safely evolve REST API’s?
Before we dive into specific examples, it’s useful to recall what our
overriding goal is. Since it is much easier to update a server than all the
clients that might talk to it, the fundamental aim of careful evolution of
REST API’s is:</p>
<blockquote>
<p>Old clients must work against new servers</p>
</blockquote>
<p>To make this maxim practical, clients need to follow the simple rule:</p>
<blockquote>
<p>Ignore any unexpected data in the interaction with the server</p>
</blockquote>
<p>In particular, clients can never assume that they have a complete picture
of what they will find in a response from the server.</p>
<p>Let’s look at a little toy API to make these ideas more tangible, and to
explore how this API can change while adhering to these rules. The API is
for a simplistic blogging application that allows posting articles, and
retrieving them. For the sake of simplicity, I will omit all HTTP request
and response headers.</p>
<h3 id="a-simple-rest-api">A simple REST API</h3>
<p>In sticking with good
<a href="http://fedoraproject.org/wiki/Cloud_APIs_REST_Style_Guide">REST practice</a>,
the API has a single entrypoint at <code>/api</code>. Issuing a <code>GET /api</code> will result
in the response</p>
<pre><code><api>
  <link rel="articles" href="/api/articles"/>
</api>
</code></pre>
<p>The articles collection can be retrieved with a <code>GET /api/articles</code>:</p>
<pre><code><articles>
  <article href="/api/articles/1">
    <title>Evolution for REST API's</title>
    <content>
      Like everything, ....
    </content>
  </article>
  <article href="/api/articles/2">
    ...
  </article>
  <actions>
    <link rel="create" href="/api/articles"/>
  </actions>
</articles>
</code></pre>
<p>Each article consists of a title and some content; the <code>href</code> on each article
gives clients the URL from which they can retrieve that article, and serves as
a unique identifier for the article.</p>
<p>The actions element in the articles collection tells clients that they can
create new articles by issuing <code>POST</code> requests to <code>/api/articles</code>:</p>
<pre><code><article>
  <title>How to version REST API's</title>
  <content>...</content>
</article>
</code></pre>
<p>It’s worth pointing out a subtlety in including a link for the <code>create</code>
action: one reason for including that link is to tell clients the URL to
which they can <code>POST</code> to create new articles, and keep them from making
assumptions about the URL space of the server. The more important reason,
though, is that we use the presence of this link to communicate to the
client that it may post new articles at all. Following the
<a href="http://en.wikipedia.org/wiki/HATEOAS">HATEOAS</a> constraint for REST API’s,
clients should not even assume that they are allowed to create new
articles.</p>
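<p>A well-behaved client therefore looks for the advertised link before
attempting a <code>POST</code>. Here is a minimal sketch of that check using Ruby’s
stdlib REXML parser; <code>create_url</code> is a hypothetical helper, not part of any
real client library:</p>

```ruby
require 'rexml/document'

# Hypothetical helper: find the URL of the advertised "create" action in an
# articles collection, or nil if the server does not advertise one.
def create_url(collection_xml)
  doc  = REXML::Document.new(collection_xml)
  link = doc.elements["//actions/link[@rel='create']"]
  link && link.attributes['href']
end

with_create    = '<articles><actions><link rel="create" href="/api/articles"/></actions></articles>'
without_create = '<articles></articles>'

create_url(with_create)     # => "/api/articles"
create_url(without_create)  # => nil; the client must not try to POST
```

<p>The point is that the client never hardcodes the URL, and treats the
absence of the link as "this operation is not available to me".</p>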
<h4 id="adding-information-from-the-server">Adding information from the server</h4>
<p>Readers might want to know when a particular article has been made
available. We therefore add a <code>published</code> attribute to the representation
of articles that a <code>GET</code> on the articles collection or on an individual
article’s URI returns:</p>
<pre><code><article href="/api/articles/2">
  <title>How to version REST API's</title>
  <content>...</content>
  <published>2012-08-03T13:00</published>
</article>
</code></pre>
<p>This does not break old clients, because we told them to ignore things they
do not know about. A client that only knows about the previous version of
our API will still work fine, it just won’t do anything with the
<code>published</code> element.</p>
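<p>A sketch of such a tolerant client, using Ruby’s stdlib REXML
(<code>parse_article</code> is a hypothetical helper): it extracts only the fields it
knows about, so the new element changes nothing for it.</p>

```ruby
require 'rexml/document'

# Hypothetical old-client parser: it reads only the fields it knows
# about and silently ignores anything else in the article element.
def parse_article(xml)
  article = REXML::Document.new(xml).root
  { href:    article.attributes['href'],
    title:   article.elements['title']&.text,
    content: article.elements['content']&.text }
end

old_style = '<article href="/api/articles/2">' \
            '<title>How to version REST API\'s</title>' \
            '<content>...</content></article>'
new_style = '<article href="/api/articles/2">' \
            '<title>How to version REST API\'s</title>' \
            '<content>...</content>' \
            '<published>2012-08-03T13:00</published></article>'

# The added <published> element makes no difference to this client
parse_article(old_style) == parse_article(new_style)  # => true
```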
<h4 id="allowing-more-data-when-creating-an-article">Allowing more data when creating an article</h4>
<p>Some articles might be related to other resources on the web, and we’d want
to let authors call them out explicitly in their articles. We therefore change
the API to accept articles with some additional data on <code>POST
/api/articles</code>:</p>
<pre><code><article>
  <title>Great REST resources</title>
  <content>...</content>
  <related>
    <link rel="background" href="http://en.wikipedia.org/wiki/Representational_state_transfer"/>
    <link rel="background" href="http://en.wikipedia.org/wiki/HATEOAS"/>
  </related>
</article>
</code></pre>
<p>As long as our new API allows posting of articles without any <code>related</code>
links, old clients will continue to work.</p>
<h4 id="blogging-apis-everywhere">Blogging API’s everywhere</h4>
<p>If our blogging software is so successful that clients must be prepared to
deal with both servers that support adding related resources, and ones that
do not, we need a way to indicate that to those clients that know about
related resources. While there are many ways to do that, one that we’ve
found works well for <a href="http://deltacloud.org/">Deltacloud</a> is annotating the
collections in the toplevel API entrypoint. When a client does a <code>GET /api</code>
from a server that supports related resources, we’d send them the following
XML back:</p>
<pre><code><api>
  <link rel="articles" href="/api/articles">
    <feature name="related_resources"/>
  </link>
</api>
</code></pre>
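<p>A client that knows about related resources can then probe the entrypoint
for that annotation; a sketch with a hypothetical
<code>supports_related_resources?</code> helper, again using stdlib REXML:</p>

```ruby
require 'rexml/document'

# Hypothetical helper: does this entrypoint advertise the
# related_resources feature on its articles collection?
def supports_related_resources?(entrypoint_xml)
  doc = REXML::Document.new(entrypoint_xml)
  !doc.elements["//link/feature[@name='related_resources']"].nil?
end

annotated = '<api><link rel="articles" href="/api/articles">' \
            '<feature name="related_resources"/></link></api>'
plain     = '<api><link rel="articles" href="/api/articles"/></api>'

supports_related_resources?(annotated)  # => true
supports_related_resources?(plain)      # => false
```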
<h4 id="updating-articles">Updating articles</h4>
<p>Authors want to revise their articles from time to time; we’d make that
possible by allowing them to <code>PUT</code> the updated version of an article to its
URL. This won’t introduce any problems for old clients, but new clients
will need to know whether the particular instance of the API they are
talking to supports updating articles. We’d solve that by adding <code>actions</code>
to the article itself, so that a <code>GET</code> of an article or the articles
collection will return</p>
<pre><code><article href="/api/articles/42">
  <title>...</title>
  ...
  <actions>
    <link rel="update" href="/api/articles/42"/>
  </actions>
</article>
</code></pre>
<p>Not only does the <code>update</code> link tell clients that they are talking to a
version of the blogging API that supports updates, it also lets us hide
complicated business logic that decides whether an article can be updated
or not by simply showing or suppressing the <code>update</code> link.</p>
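<p>On the server side, that gating can be as simple as emitting the link only
when the business rule allows it. A toy sketch with hypothetical names, not
actual Deltacloud code:</p>

```ruby
# Toy server-side sketch: the update link is rendered only when the
# (arbitrarily complicated) business logic says the article is editable.
def article_xml(href, title, updatable:)
  actions = updatable ? %(<actions><link rel="update" href="#{href}"/></actions>) : ''
  %(<article href="#{href}"><title>#{title}</title>#{actions}</article>)
end

article_xml('/api/articles/42', 'Title', updatable: true)
# includes the update link
article_xml('/api/articles/42', 'Title', updatable: false)
# omits <actions> entirely; old and new clients alike just see an article
```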
<h4 id="merging-blogs">Merging blogs</h4>
<p>Because of its spectacular content, our blog has been so successful that
we want to turn it from a personal blog into a group blog, supporting
multiple authors. That of course calls for adding the name of each author
(or their avatar or whatnot) to each post — in other words, we want
to make passing in an author mandatory when creating or updating an
article. Rather than break old clients by silently slipping in the author
requirement, we add a new action to the articles collection:</p>
<pre><code><articles>
  <article>...</article>
  <actions>
    <link rel="create_with_author" href="/api/articles_with_author"/>
    ...
  </actions>
</articles>
</code></pre>
<p>Old clients will ignore that new action; the remaining question is if we
can still allow old clients to post new articles. If we can, for example,
by defining a default author out-of-band with this API, we’d still show the
old <code>create</code> action in the articles collection. If not, we’d take the
ability to post away from old clients by not displaying the <code>create</code> action
anymore — but we haven’t broken them, since they can still continue
to retrieve articles; we’ve merely degraded them to read-only clients.</p>
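<p>A new client can pick the best advertised action and degrade gracefully.
A sketch with a hypothetical <code>post_capability</code> helper, using stdlib
REXML:</p>

```ruby
require 'rexml/document'

# Hypothetical helper: decide how this client may post, based solely on
# which actions the articles collection advertises.
def post_capability(articles_xml)
  doc = REXML::Document.new(articles_xml)
  if doc.elements["//actions/link[@rel='create_with_author']"]
    :create_with_author
  elsif doc.elements["//actions/link[@rel='create']"]
    :create
  else
    :read_only
  end
end

both = '<articles><actions>' \
       '<link rel="create" href="/api/articles"/>' \
       '<link rel="create_with_author" href="/api/articles_with_author"/>' \
       '</actions></articles>'

post_capability(both)            # => :create_with_author
post_capability('<articles/>')   # => :read_only
```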
<p>While this seems like an extreme change, consider that we’ve changed our
application so much that existing clients can simply not provide the data
we deem necessary for a successful post. It’s much more realistic that we’d
find a way to let old clients still post articles using the old <code>create</code>
link.</p>
<h3 id="some-consequences-for-xml">Some consequences for XML</h3>
<p>There are two representations that are popular with REST API’s: JSON and
XML. The latter poses an additional challenge for the evolution of REST
API’s because the use of XML in REST API’s differs subtly from that in many
other places. Since clients can never be sure that they know about
everything that might be in a server’s response, it is not possible to
write down a <a href="http://www.w3.org/XML/Schema">schema</a> (or
<a href="http://relaxng.org/">RelaxNG</a> grammar) that the client could use to
validate server responses, since responses from an updated server would
violate that schema, as the simple example of adding a <code>published</code> date to
articles above shows.</p>
<p>It’s of course possible to write down RelaxNG grammars for a specific
version of the API, but they are tied to that specific version, and must
therefore be ignored by clients who want to happily evolve with the server.</p>
<h3 id="questions-">Questions?</h3>
<p>I’ve tried to cover all the different scenarios that one encounters when
evolving a RESTful API — I’ve left out HTTP specific issues like
status codes (which must never change) and headers (adding new optional
headers is ok), much as the OpenStack folks have decided in their
<a href="http://wiki.openstack.org/APIChangeGuidelines">API Change Guidelines</a>.</p>
<p>I’d be very curious to hear about changes that cannot be addressed by one
of the mechanisms described above.</p>
<h1>Deltacloud 1.0</h1>
<p><em>2012-06-15</em> (<a href="http://watzmann.net//blog/2012/06/deltacloud-on-dot-oh">permalink</a>)</p>
<p>The upcoming release of Deltacloud 1.0 is a huge milestone for the project:
even though no sausages were hurt in its making, it is still chock-full of
the broadest blend of the finest IaaS API ingredients. The changes and
improvements are too numerous to list in detail, but it is worth
highlighting some of them. TL;DR: the
<a href="http://people.apache.org/~lutter/deltacloud/1.0.0/rc1/">release candidate</a>
is available now.</p>
<h2 id="ec2-frontend">EC2 frontend</h2>
<p>With this release, Deltacloud moves another step towards being a universal
cloud IaaS API proxy: when we started adding support for
<a href="http://dmtf.org/cloud">DMTF CIMI</a> as an alternative to the ‘classic’
Deltacloud API, it became apparent that adding additional frontends could
be done with very little effort. The new
<a href="http://mifo.sk/deltacloud-with-ec2-frontend">EC2 frontend</a> proves that
this is even possible for API’s that are not RESTful. With that, Deltacloud
allows clients that only know the EC2 API to talk to various backends,
including <a href="http://openstack.org/">OpenStack</a>, vSphere, and
<a href="http://ovirt.org/">oVirt</a>.</p>
<p>The EC2 frontend supports the most commonly needed operations, in
particular those necessary for finding an image, launching an instance off
of it and managing that instance’s lifecycle. In addition, managing SSH key
pairs is also supported. We hope to grow the coverage of the API in future
releases to the point where the EC2 frontend is good enough to support the
majority of uses.</p>
<p>The debate around the ‘right’ cloud IaaS API is heated and ongoing,
especially around standards, and we still see the right answer to this
debate in a properly standardized, broadly supported, and openly governed
API such as DMTF’s CIMI — yet it is undeniable that EC2 is the front
runner in this space, and that large investments into EC2’s API exist; it
is Deltacloud’s mission to alleviate the resulting lockin, and the addition
of the EC2 frontend allows users to experiment with different backend
technologies while migrating off the EC2 API at their own pace.</p>
<p>One issue that the EC2 frontend brings to the forefront is just how
unsuitable that API is for fronting different backend implementations: IaaS
API’s that are designed for this purpose provide extensive capabilities for
client discovery of various features. EC2 on the other hand provides no way
for providers to advertise their deviation from EC2’s feature set, and no
possibilities for clients to discover them.</p>
<h2 id="cimi-frontend">CIMI frontend</h2>
<p>We continue our quest to support the fledgling CIMI standard as broadly and
as faithfully as possible. With this release, we introduce support for the
CIMI networking API; for now only for the Mock driver, but we are looking
to expand backend support for networking as clouds add the needed
features.</p>
<p>Besides the core CIMI API, which is purely a RESTful XML and JSON API, work
also continues on the simple human-consumable HTML interface for it; we’ve
learned from designing the Deltacloud API, and from helping others use it,
that a web application that stays close to the API but is easy to use
for humans is an invaluable tool. With this release, that application can
now talk to OpenStack, RHEV-M/oVirt, and EC2 via Deltacloud’s CIMI proxy.</p>
<h2 id="operational-and-code-enhancements">Operational and code enhancements</h2>
<p>With three frontends, it’s become even more urgent that they can all be
run from the same server instance to reduce the number of daemons
that need to be babysat. Thanks to an extensive revamp of the guts of
Deltacloud to turn it into a
<a href="http://www.sinatrarb.com/intro#Modular%20vs.%20Classic%20Style">modular Sinatra app</a>,
it is now possible to expose all three frontends (or only one or two) from
the same server.</p>
<p>We now also base our RESTful routes and controllers on
<a href="https://github.com/mifo/sinatra-rabbit">sinatra-rabbit</a> — only
fitting since sinatra-rabbit started life as the DSL we used inside
Deltacloud for our RESTful routing and our controllers.</p>
<p>A lot of work has gone into rationalizing the HTTP status codes that
Deltacloud returns, especially when errors occur; in the process, we
learned quite a bit about just how fickle and moody vSphere can be.</p>
<p>Other drivers have seen major updates, not least of which the OpenStack
driver, which now works against the OpenStack v2.0 API; in particular, it
works against the <a href="http://hpcloud.com">HP cloud</a> — with the EC2
frontend, Deltacloud provides a capable EC2 proxy for OpenStack. We’ve also
added a driver for the
<a href="http://www.fujitsu.com/global/solutions/cloud/solutions/global-cloud-platform/">Fujitsu Global Cloud Platform</a>,
which was mostly written by Dies Köper of Fujitsu.</p>
<p>The
<a href="http://people.apache.org/~lutter/deltacloud/1.0.0/rc1/">release candidate</a>
for version 1.0.0 is out now, packages for rubygems.org, Fedora and other
distributions will appear as soon as the release has passed the vote on the
mailing list.</p>
<h1>Sinatra Rabbit - a RESTful DSL</h1>
<p><em>2012-03-13</em> (<a href="http://watzmann.net//blog/2012/03/sinatra-rabbit">permalink</a>)</p>
<p>TL;DR: have a look at
<a href="https://github.com/mifo/sinatra-rabbit">sinatra-rabbit</a>.</p>
<p>When we converted <a href="http://deltacloud.apache.org">Deltacloud</a> from
<a href="http://rubyonrails.org/">Rails</a> to <a href="http://www.sinatrarb.com/">Sinatra</a>,
we needed a way to conveniently write the controller logic for RESTful
routes with Sinatra. On a lark, I cooked up a DSL called ‘Rabbit’ that lets
you write things like</p>
<pre><code>collection :images do
  description "The collection of images"

  operation :index do
    description "List all images"
    param :id, :string
    param :architecture, :string, :optional
    control { ... controller code ... }
  end

  operation :show do
    description 'Show an image identified by "id" parameter.'
    param :id, :string, :required
    control { ... show image params[:id] ... }
  end

  operation :destroy do
    description "Remove the specified image"
    param :id, :string, :required
    control do
      driver.destroy_image(credentials, params[:id])
      status 204
      respond_to do |format|
        format.xml
        format.json
        format.html { redirect(images_url) }
      end
    end
  end
end
</code></pre>
<p>That makes supporting the common REST operations convenient, and allows us
to auto-generate documentation for the REST API. It has been very useful in
writing the
<a href="https://github.com/apache/deltacloud/blob/master/server/lib/cimi/server.rb">two</a>
<a href="https://github.com/apache/deltacloud/blob/master/server/lib/deltacloud/server.rb">frontends</a>
for Deltacloud.</p>
<p>The DSL has lots of features, for example, validation of input parameters,
conditionally allowing additional parameters, describing subcollections,
autogenerating <code>HEAD</code> and <code>OPTIONS</code> routes and controllers, and many more.</p>
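<p>To illustrate the general idea (this is a toy, not sinatra-rabbit’s actual
implementation): a Rabbit-style DSL ultimately boils down to mapping each
operation name to an HTTP verb and a route under the collection’s path.</p>

```ruby
# Toy sketch of the mapping a Rabbit-style DSL performs; the real
# sinatra-rabbit does much more (validation, docs, HEAD/OPTIONS, ...).
OPERATION_ROUTES = {
  index:   [:get,    ''],
  show:    [:get,    '/:id'],
  create:  [:post,   ''],
  destroy: [:delete, '/:id']
}.freeze

# For `operation :show` inside `collection :images`, the DSL would
# register a Sinatra route like `get '/api/images/:id'`.
def route_for(collection, operation)
  verb, suffix = OPERATION_ROUTES.fetch(operation)
  [verb, "/api/#{collection}#{suffix}"]
end

route_for(:images, :show)     # => [:get, "/api/images/:id"]
route_for(:images, :destroy)  # => [:delete, "/api/images/:id"]
```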
<p><a href="http://mifo.sk/">Michal Fojtik</a> has pulled that code out of Deltacloud and
extracted it into its own github project as
<a href="https://github.com/mifo/sinatra-rabbit">sinatra-rabbit</a>. In the process,
there were quite a few dragons to slay: for example, in Deltacloud we
change what parameters some operations can accept based on the specific
backend driver. For example, in some clouds, it is possible to inject
user-defined data into instances upon launch. In Deltacloud, the logic of
what routes to turn on or off is based on introspecting the current driver,
which means that Deltacloud’s Rabbit knows about drivers. That, of course,
has to be changed for the standalone <code>sinatra-rabbit</code>. Michal just added
route conditions that look like</p>
<pre><code>collection :images do
  operation :create, :if => lambda { complicated_condition(request) } do
    ...
  end
end
</code></pre>
<p>Hopefully, <code>sinatra-rabbit</code> will grow to the point where we can remove our
bundled implementation from Deltacloud, and use the standalone version;
there’s still a couple of features missing, but with enough people
sending patches, it can’t be very long now ;)</p>
<h1>Public Deltacloud Instances</h1>
<p><em>2011-10-13</em> (<a href="http://watzmann.net//blog/2011/10/deltacloud-public-instances">permalink</a>)</p>
<p>Installing Deltacloud is work. Not a lot of work, in fact it is
<a href="http://incubator.apache.org/deltacloud/drivers.html#h1">very easy</a>, but it
still involves installing a package/gem and starting a server. For simple
development and test uses, even that is not necessary any more.</p>
<p>We now run two public Deltacloud servers: one,
<a href="https://api.deltacloud.org/">https://api.deltacloud.org/</a> runs the latest
stable release, currently release
<a href="http://www.apache.org/dyn/closer.cgi?path=incubator/deltacloud/">0.4.1</a>. The
other, <a href="https://dev.deltacloud.org/">https://dev.deltacloud.org/</a>, runs the
bleeding edge code from the
<a href="https://github.com/apache/deltacloud">git repo</a>.</p>
<p>Both use the same self-signed SSL certificate. Its SHA-1 fingerprint is
<code>D3:3D:13:73:37:88:59:F1:FE:08:51:70:A0:BA:60:99:F1:E9:DD:45</code>.</p>
<p>If you’ve been scratching your head, wondering what all this Deltacloud
business is about, just head over to one of these two servers and explore
the API. There’s a friendly HTML interface for just that, or, of course,
the obligatory XML and JSON variants. The public servers run the EC2
driver as their default; when prompted for a username and password, just
enter your Amazon AWS access key ID and secret.</p>
<h1>Deltacloud 0.4.0 released</h1>
<p><em>2011-09-14</em> (<a href="http://watzmann.net//blog/2011/09/deltacloud-0.4.0">permalink</a>)</p>
<p>We just released <a href="http://deltacloud.org/">Apache Deltacloud</a>
<a href="http://incubator.apache.org/deltacloud/download.html">0.4.0</a>, part of the
<a href="http://incubator.apache.org/">Apache Incubator</a>. The release contains a
huge number of enhancements and additions. The full list can be found in
<a href="http://www.mail-archive.com/deltacloud-dev@incubator.apache.org/msg02564.html">the release announcement</a>,
but some of them bear highlighting separately.</p>
<p>The biggest new feature is probably a driver for
<a href="http://www.vmware.com/products/vsphere/overview.html">VMware’s vSphere</a>. This
makes it possible to turn any vSphere 4.0 installation into a simple
cloud. To use the driver against a vSphere API at
<code>https://vsphere.example.com/sdk</code>, start the Deltacloud server with</p>
<pre><code>API_PROVIDER=vsphere.example.com deltacloudd -i vsphere -t 200
</code></pre>
<p>Besides the basics of image and instance management, the driver supports a
few nifty features, in particular injection of user data.</p>
<p>A second new driver adds support for ‘condor-cloud’, a simple cloud
implementation that uses the
<a href="http://www.cs.wisc.edu/condor/">Condor grid manager</a> as its backend. While
it’s probably not enough to replicate EC2, it is certainly good enough to
build a simple cloud out of a few machines, Condor, Deltacloud, and a few
glue scripts.</p>
<p>We (well, <a href="http://www.mariosandreou.com/">Marios</a>) added support for
firewalls to the Deltacloud API. Since not all clouds offer firewalling,
this is currently only supported for EC2 and Eucalyptus, via their security
groups. The model that the API exposes, though, represents a fairly generic
firewall with rules and sets of rules. We will expand that to other drivers
in future releases.</p>
<p>We (well, <a href="http://mifo.sk/">Michal</a>) reworked the entire HTML UI using
<a href="http://jquerymobile.com/">jquery-mobile</a> — if you want to explore the
Deltacloud API from your smartphone or tablet, that just got a whole lot
easier. I find the result a much cleaner UI. Since the Deltacloud HTML
interface follows the API very closely (it is meant as a tool to test
and explore the API), pages have always been fairly sparse, something that
the mobile interface makes less annoying.</p>
<p>For more details, read
<a href="http://www.mail-archive.com/deltacloud-dev@incubator.apache.org/msg02564.html">the release announcement</a>
or <a href="http://incubator.apache.org/deltacloud/download.html">download the release</a>.</p>
Shuttleworth declares goat rodeo over, picks AWS as winner2011-09-12T00:00:00-07:002011-09-12T00:00:00-07:00http://watzmann.net//blog/2011/09/goat-rodeo-over<p>As we all know by now, cloud computing is a veritable
<a href="http://unplugged.rcrwireless.com/index.php/20110908/mobile-software/10428/redhat-calls-the-cloud-a-goat-rodeo/">goat rodeo</a>,
an unseemly sight for anybody’s stomach. Disconcerted by these proceedings,
<a href="http://www.markshuttleworth.com/">Mark Shuttleworth</a> lets his stomach have
the better of him, and
<a href="http://www.markshuttleworth.com/archives/765">declares it over</a> by picking
the winner. That, of course, is not how you end a goat rodeo: you end it by
pointing out that there are goats, not horses, involved.</p>
<p>The goat in question is the indisputable fact that, in North America,
Amazon’s Web Services are the front runner in the IaaS cloud
computing space. Which makes Amazon’s API the most used and most widely
studied IaaS API. Mark’s recommendation for the OpenStack project, which
currently has multiple API’s but should, for a number of reasons, settle
on one, is to just adopt the AWS API. He does this with arguments that are
simultaneously breathtaking and misleading. In particular, the discussion
of HTTP belittles how crucial it was that the definition and evolution of
HTTP was <em>not</em> controlled by a single vendor.</p>
<p>I completely agree with Mark that whether the AWS API is a good or bad API
is completely beside the point for this discussion — where we disagree
is over his assertion that API’s aren’t a place where meaningful innovation is
going to happen. Anybody who has followed the evolution of the API’s of any
single cloud provider, let alone the wider cloud ecosystem, will
immediately see the folly of this assertion. And there is no reason to
assume that this API evolution will stop any time soon: there are still far
too many important debates to be had about modeling familiar concepts for
the cloud and about streamlining cloud usage.</p>
<p>The big problem with adopting the AWS API wholesale though is that it puts
the future of OpenStack, and if Mark had his way, the whole IaaS space,
into the hands of a single vendor. There is absolutely no way for anybody
to get a feature into that API, unless they can get Amazon to agree with
them and implement it in their version. Mark addresses this concern with a
brusque</p>
<blockquote>
<p>It’s true that those API’s would better be defined in a clean,
independent forum analogous to the W3C than inside the boiler-room of
development at any single cloud provider, but that’s a secondary
issue. And over time, it can be engineered to work that way.</p>
</blockquote>
<p>This is not a matter of ‘better’ or ‘cleaner’ — it is a question of how
we, as a community, want the world of cloud to shape up. Especially
from a North American viewpoint, that landscape looks disheartening right
now, as if IaaS cloud will sooner or later be the domain of two or three
large vendors; that is not the case in Europe, nor does it have to be the
future. The much-used analogy with the early ISP market comes to mind, with
AOL’s seemingly unstoppable march to world dominance. One of the reasons
this never happened, and we can all choose from a large number of ISP’s
today, is of course the dogged pursuit of truly open standards.</p>
<p>Sticking to one vendor’s offering is the exact opposite of an open
standard; and there is no reason to believe that Amazon can be engineered
into a more open process for API innovation. For one, I am not aware of a
single standards effort, be it through an SDO or an open source project,
that Amazon is involved in. Adopting AWS API’s wholesale and widely will
reduce the incentive for Amazon to join any open API effort, not increase
it.</p>
<p>Amazon’s API’s should be studied because Amazon is a pioneer in the IaaS
space, in many cases the first to think through specific issues. Whatever
the technical merits, their lack of participation
in open processes makes their API a dangerous bet, no matter how benign
they might be today.</p>
<p>P.S.: There’s some talk of <a href="http://deltacloud.org/">Deltacloud</a> in the
<a href="http://www.markshuttleworth.com/archives/765#comment-368605">comments</a>;
the analogy with JDBC does not hold, since we are talking about
loosely coupled web services here, and not tightly coupled in-process
API’s. Which also makes the second point, that Deltacloud is and has to be
a lowest common denominator, moot. There is enough flexibility in a RESTful
API to avoid that. Interestingly though, the comment directly contradicts
Mark’s assertion that API’s aren’t a place to innovate — certainly, there
is <em>some</em> innovation in all these different models.</p>
My git workflow2011-09-08T00:00:00-07:002011-09-08T00:00:00-07:00http://watzmann.net//blog/2011/09/git-workflow<p>Somehow, I find myself writing the same email to introduce people to git
over and over again. But no more! Now, I will only send out links to this
blog entry.</p>
<p><a href="http://git-scm.com/">Git</a> can be intimidating at first, even though it is
probably the most forgiving source control system out there. There are
plenty of great <a href="http://git-scm.com/documentation">tutorials</a> on git these
days. I assume that you have read enough of them to understand the absolute
basics of cloning a repo, pulling to update your repo, and how to
commit. In other words, you’ve run
<a href="http://www.kernel.org/pub/software/scm/git/docs/git-clone.html"><code>git clone</code></a>,
<a href="http://www.kernel.org/pub/software/scm/git/docs/git-pull.html"><code>git pull</code></a>
and
<a href="http://www.kernel.org/pub/software/scm/git/docs/git-commit.html"><code>git commit</code></a>
before.</p>
<p>The key to working with git happily is to use
<a href="http://progit.org/book/ch3-0.html">branches</a> liberally; if in doubt,
branch. And the key to working with branches is understanding <code>git rebase</code>.
In particular, <code>git rebase -i</code> will make you fall in love with git. It
not only lets you edit committed patches, it also lets you combine and
reorganize them. Once the initial excitement over interactive rebase
wanes, try out interactive add (<code>git add -i</code>) to renew the bliss.</p>
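<p>To see what interactive rebase does without risking a real project, here is a
self-contained sketch (an addition, not from the original post) that squashes a
follow-up commit into its predecessor in a throwaway repository. Scripting
<code>GIT_SEQUENCE_EDITOR</code> stands in for the editor session you would
normally get:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

echo base > file
git add file
git commit -qm 'initial'
echo one >> file
git commit -qam 'add feature'
echo two >> file
git commit -qam 'fix typo in feature'

# 'git rebase -i HEAD~2' normally opens your editor on a todo list like
#   pick 1a2b3c4 add feature
#   pick 5d6e7f8 fix typo in feature
# Changing the second 'pick' to 'squash' folds both into one commit; here
# that edit is scripted so the example runs unattended.
GIT_SEQUENCE_EDITOR='sed -i -e "2s/^pick/squash/"' \
GIT_EDITOR=true \
  git rebase -i HEAD~2

git log --oneline   # 'initial' plus one squashed commit
```

<p>Interactively, you would make the same <code>pick</code>-to-<code>squash</code>
edit by hand and then adjust the combined commit message.</p>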
<p>Once the basics are out of the way, you will want to implement some
extension to whatever you’ve cloned and pulled, and then submit that back
upstream for inclusion. That usually involves working on your own for a
bit, and then generating and sending out patches of your work for review
and merging upstream. Changes you make should always go onto private
(‘topic’) branches; create a new branch for each piece of distinct
work. The overall workflow for this is</p>
<pre><code> git checkout master
git pull # make sure we have the latest bits
git checkout -b dev/feature
... edit/add/commit until happy, with an eye towards having your
branch constitute an easily reviewable patch series; when
working on the branch for longer, pull master repeatedly and
rebase your branch ...
</code></pre>
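<p>The “pull master repeatedly and rebase your branch” step looks like this in
practice. A throwaway repository stands in for your clone here; in real work
the <code>upstream change</code> commit would arrive via <code>git pull</code>
rather than being committed locally:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

echo base > file
git add file
git commit -qm 'initial'
git checkout -qB master   # normalize the branch name for the example

# Topic branch for the new work
git checkout -qb dev/feature
echo feature > feature
git add feature
git commit -qm 'start feature'

# Meanwhile, master moves on; in a real checkout this would be
# 'git checkout master && git pull'
git checkout -q master
echo more >> file
git commit -qam 'upstream change'

# Replay the topic branch on top of the updated master
git rebase -q master dev/feature
```
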
<p>Once your work is ready to be shared with the rest of the world, do the
following to generate and mail out patches:</p>
<pre><code> git checkout master
git pull
git rebase master dev/feature
git format-patch -o /tmp/patches master
git send-email --to=hackers@example.org --compose --subject 'Awesome feature' --thread /tmp/patches
</code></pre>
<p>When changes need to be made to address review comments, work them into
your <code>dev/feature</code> branch, using interactive rebase to add them where
needed in the patch series, then repost.</p>
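<p>One convenient way to place such fixes at the right spot in the series (a
sketch, not part of the original workflow above: it uses
<code>git commit --fixup</code> and <code>git rebase --autosquash</code>, which
need git 1.7.4 or later) is to record each fix as a fixup commit and let
autosquash reorder the todo list for you:</p>

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

echo base > file
git add file
git commit -qm 'initial'
echo feature > feature
git add feature
git commit -qm 'add feature'

# A reviewer spots a problem in 'add feature'; record the correction as
# a fixup commit aimed at that patch
echo fixed > feature
git commit -qa --fixup HEAD

# --autosquash moves the fixup directly after its target and marks it
# 'fixup'; GIT_SEQUENCE_EDITOR=true accepts the generated todo list
# unattended (interactively you would just save and quit)
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2
```

<p>After the rebase, the series looks as if <code>add feature</code> had been
correct from the start, which keeps the patch series easy to review.</p>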