May 17 2007

Linus Torvalds explains distributed source control

Published by at 7:00 am under Technology   

On several occasions over the last year, I’ve pointed out that distributed source control tools are dramatically better than centralized tools. It’s quite hard for me to explain why. This is probably because of sloppy and incomplete thinking on my part, but it doesn’t help that most of the people I’ve said this to have never used a distributed tool. (I’ve been trying out SVK, bzr, git, etc.) Fortunately, I no longer need to stumble through attempts at explaining this; instead, the answer is to watch as:

Linus Torvalds explains distributed source control in general, and git in particular, at Google

Here are some of his points, paraphrased. They might not be clear without watching the video.

  • He hates CVS
  • He hates SVN too, because it set out to be “CVS done right”, and you can’t get anywhere good from there.
  • If you need a commercial tool, BitKeeper is the one you should use.
  • Distributed source control is much more important than which tool you choose
  • He looked at a lot of alternatives, and immediately tossed anything that was not distributed, was slow, or did not guarantee that what goes in comes out [using secure hashes]
  • He liked Monotone, but it was too slow
  • With a distributed tool, no single place is vital to your data
  • Centralized does not scale to Linux-kernel sized projects
  • Distributed tools work offline, with full history
  • Branching is not an esoteric, rare event – everyone branches all the time every time they write a line of code – but most tools don’t understand this, they only understand branching as a Very Big Deal.
  • Distributed tools serve everyone; everyone has “commit” access to their own branches. Everyone has all of the tool’s features, rather than only the handful of people with special access.
  • Of course no one else will necessarily adopt your changes; using a tool in which every developer has the full feature set does not imply anarchy.
  • Merging works like security: as a network of trust
  • Two systems worth looking at: git and Mercurial. The others with good features are too slow [for large projects].
  • Lots of people are using git for their own work (to handle merges, for example), inside companies where the main tool is SVN. git can pull changes from (and thus stay in sync with) SVN.
  • Git is now much easier to use than CVS or other common tools
  • Git makes merging easier than other tools do, so having to merge more often is not much of a problem, and not something you need to fear.
  • Distributed tools are much faster because they don’t have to go over the network very often. Not talking to a server is tremendously faster than talking to even a high-end server over a fast network.
  • SVN working directories and repositories are quite large; the git equivalents are much smaller
  • The repository (=project) is the unit of checkout / commit / etc.; don’t put all your projects into one repository. The separate repositories can share underlying storage.
  • Performance is not secondary. It affects everything you do; it affects how you use an application. Faster operations = merging not a big deal = more, smaller changes.
  • SVN “makes branching really cheap”. Unfortunately, “merging in SVN is a complete disaster”. “It is incredible how stupid these people are”
  • Distributed source control is inherently safer in most respects: no single point of failure, and secure hashes that protect against even intentionally malicious users.
  • “I would never trust Google to maintain my source code for me” – with git (and other distributed systems) the whole history is in many places, so it is nearly impossible to lose.
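
The “what goes in, comes out” guarantee rests on naming content by its hash. As a rough illustration (a minimal sketch, not git’s implementation): git identifies a file’s contents, a “blob”, by the SHA-1 of a short header followed by the bytes themselves. The function names `git_blob_id` and `verify` below are made up for this example, and only the blob case is covered.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID that git assigns to these file contents."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify(content: bytes, expected_id: str) -> bool:
    """Check that content still matches the ID it was stored under."""
    return git_blob_id(content) == expected_id


original = b"hello\n"
object_id = git_blob_id(original)

print(object_id)                      # ce013625030ba8dba906f756967f9e9ca394464a
print(verify(original, object_id))    # True
print(verify(b"hellp\n", object_id))  # False: a one-byte change alters the ID
```

Because the name is derived from the content, corruption or tampering shows up as a mismatch between the two, on any copy of the repository.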

My own observations:

There are important differences between source control tools. I have heard it said that these are all “just tools” which don’t matter, you simply use whatever the local management felt like buying. That is wrong: making better tool choices will make your project better (cheaper, faster, more fun, etc.), making worse tool choices will make your project worse (more expensive, slower, painful, higher turnover, etc.)

Distributed tools make the “network of trust” more explicit, and thus easier to reason about.

I think there is an impression out there that distributed tools don’t accommodate the level of control over how code is managed that “enterprise” shops often specify in their processes. This is a needless worry; it is still quite possible to set up any desired process and controls for the official repositories. The difference is that you can do that without the collateral damage of taking away features from individual developers.

Git sounds like the leading choice on Linux, but at the moment it is not well supported on Windows, and I don’t see any signs of that changing anytime soon. I will continue working with bzr, SVK, etc.

There is widespread, but mostly invisible, demand for better source control. Over the next few years, the hegemony of the legacy (centralized) design will be lessened greatly. One early adopter is Mozilla, which is switching (or has switched?) to Mercurial. Of course, many projects and companies (especially large companies) will hang on for years to come, because of inertia and widespread tool integration.

Several of the distributed tools (SVK, bzr, git, probably more) have the ability to pull changes from other systems, including traditional centralized systems like SVN. These make it possible to use a modern, powerful tool for your own work, even when you are working on a project where someone else has chosen such a traditional system for the master repository.

Update: Mark commented that he doesn’t feel like he has really committed a changeset until it is on a remote repository. I agree. However, this is trivial: with any of the tools you can push your change from your local repository to a remote one (perhaps one for you alone), with one command. With some of them (bzr, at least) you can configure this to happen automatically, so it is zero extra commands. This does not negate the benefits of a local repository, because you can still work offline, and all read operations still happen locally and far more quickly.
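
To make the commit-then-push idea concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not how git, bzr, or any other tool is implemented: the `ToyRepo` class and its methods are hypothetical, history is a single line with no branches or merges, and “push” simply copies whatever commits the other repository lacks.

```python
import hashlib


class ToyRepo:
    """A deliberately tiny stand-in for a repository (hypothetical model)."""

    def __init__(self):
        self.objects = {}  # commit id -> (parent id, message)
        self.head = None   # id of the newest commit, or None if empty

    def commit(self, message):
        """Record a commit locally; nothing here touches another repository."""
        payload = f"{self.head}\n{message}".encode("utf-8")
        commit_id = hashlib.sha1(payload).hexdigest()
        self.objects[commit_id] = (self.head, message)
        self.head = commit_id
        return commit_id

    def log(self):
        """Read history, newest first, using only local data."""
        messages = []
        commit_id = self.head
        while commit_id is not None:
            parent, message = self.objects[commit_id]
            messages.append(message)
            commit_id = parent
        return messages

    def push(self, remote):
        """Copy commits the remote lacks; return how many were sent."""
        missing = {cid: obj for cid, obj in self.objects.items()
                   if cid not in remote.objects}
        remote.objects.update(missing)
        remote.head = self.head
        return len(missing)


local, backup = ToyRepo(), ToyRepo()
local.commit("first")
local.commit("second")
print(local.push(backup))  # 2
local.commit("third")
print(local.push(backup))  # 1: only the new commit is transferred
print(backup.log())        # ['third', 'second', 'first']
```

In this model, committing and reading history never involve the second repository; only `push` does, and it is one call that can be made whenever a connection is available.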

Update: I didn’t mention Mercurial initially, but I should have (thanks, Alex). It is another strong contender, and Mozilla recently chose it. Its feature set is similar to git’s, but with more eager support for Windows.

Update: There is another discussion of Linus’s talk over on Codice Software’s blog, which was linked on Slashdot. Since Codice sells a source control product, the Slashdot coverage is a great piece of free publicity.

Update: Mark Shuttleworth pointed out below that bzr is much faster than it used to be, and that it has top-notch renaming support, which he explains in more detail in a post. Bzr was already on my own short-list; I am using it for a small project here at Oasis Digital.

Update: I’ve started gathering notes about git, bzr, hg, etc., for several upcoming posts (and possibly a talk or two). If you’re interested in these, subscribe to the feed using the links in the upper right of my home page.

11 Responses to “Linus Torvalds explains distributed source control”

  1. Maybe it’s just paranoia from using non-distributed source control tools, but I think I wouldn’t feel like I had really checked in a change unless I checked it into a remote repository. Checking it into my local repository still leaves me open to losing code when my hard drive crashes. For this reason, the arguments that distributed tools are better because they “work offline” and “are much faster because they don’t have to go over the network very often” aren’t very appealing or comforting to me. However, the idea that my changes can be committed to a replicated repository is very appealing.

    Adoption of distributed source control tools may be slow because most developers probably don’t perceive that there is a problem with their current tool.

    I’d be interested in learning more about git. Hopefully someone will do a presentation on this at a local user group meeting soon.

  2. Dan Miser says:

    Thanks for that very complete post (and link), Kyle. You are always finding a way to push the envelope! 🙂

  3. Craig Buchek says:

    I’m looking to move from Subversion to SVK or Bazaar (bzr) myself. Do you have any advice on which one to choose? I want to keep my Subversion mainline repositories on the web, but do offline development on my notebook.

  4. Kyle Cordes says:

    I don’t have a recommendation, though I expect to have one eventually, after we have substantial experience on real projects.

    SVK appears to be the best suited to being used as a better SVN client, though of course it’s much more than that. I have personally tested cloning an SVN project and then committing changes back, with git and SVK. I am using bzr for another project, and it supports that kind of use with an svn plugin, but I haven’t tried that plugin yet.

  5. Todd Jordan says:

    Best quote:
    There are important differences between source control tools. I have heard it said that these are all “just tools” which don’t matter, you simply use whatever the local management felt like buying. That is wrong: making better tool choices will make your project better (cheaper, faster, more fun, etc.), making worse tool choices will make your project worse (more expensive, slower, painful, higher turnover, etc.)

    The inertia of our current shops will carry us even past the newness and best practices being found today, until they are replaced by even better tools. Many shops won’t know the joy of tools like Git until it’s old news.

    But I see hope against this inertia as the ‘old skoolrz’ age out and move up, and the next generation want to experiment more.
    Thanks for the excellent and well thought out post.

  6. Alex Miller says:

    This is an interesting topic and I don’t think you should miss the ruminations that have been going on in the Java world in the process of open-sourcing the JDK. Several key people have been blogging on the selection and conversion process for moving from Sun’s internal TeamWare system to Mercurial (a distributed SCM). Most recently, check out: Mark Reinhold’s recent entry and also the blogs from Kelly O’Hair and Martin Englund. Certainly, there has been some outcry from the Java public about not choosing Subversion but apparently Sun did not feel that it met the basic requirements. So, I would suspect that Mercurial (which I can’t type for anything…arrghg) is going to be receiving a lot of attention in the near future. Maybe another one to consider. The blogs linked above contain a bunch more interesting links. The authors have other blog entries on these topics as well.

  7. Alex Miller says:

    Also, I should mention that you will find links in Mark’s post to an evaluation that was done prior for Open Solaris, where Mercurial was chosen over bzr and git.

  8. Nathan Neff says:

    I’ve been checking out darcs, which is also a distributed source control tool, and it was easy to set up and work with.

    I had a bit of a problem with the overzealous default filter (it didn’t import .class files, and I wasn’t able to get it to allow .class files). There was talk of it being slow. I tested darcs on a smaller “project”; it works fine for smaller projects.

    Has anyone else experimented with darcs?

  9. Linux Poster Boy Talks about SCM…

    Códice Software: Linus Torvalds on GIT and SCM
    speech Linus Torvalds gave some days ago at Google, basically talking about GIT and Source Control Management
    Codice Software have a nice blog entry about Linus’ presentation on Source Control Mana…

  10. Mark says:

    Bzr (“bazaar”) is the revision control system we chose for the Ubuntu project, because of its perfect support for renames. By “perfect” I mean that you cannot break Bzr no matter how aggressively your community renames files and directories. One contributor can rename a directory, someone else can rename a subdirectory, and other people can rename files *and add* files in both directories, and then you can merge from all of them, and It Just Works.

    Bzr was designed for multiplatform work, so it has the best support for Windows too (not a big issue for Ubuntu, but definitely an issue for upstreams).

    Early versions of Bzr were slower than Git or Mercurial, but the performance is improving constantly, and Bzr now takes less than a second (“heartbeat time”) to do a status on a project of up to 5,000 files. I think the team is confident they will match Mercurial in performance without losing their advantages on renaming robustness. Commit is still slower on Bzr, but status is the key performance item from a usability perspective.

    So, please add Bzr to your list of things people should check out. bazaar-vcs.org.