Use SVK to remotely “svnadmin dump” an SVN repository

One of the nice things about SVN is how easy it is to carry the complete SVN history from one server to another: “svnadmin dump” produces a single (large) dump file with the complete history, then “svnadmin load” to recreate it on the new machine. However, for a handful of our projects we have an SVN repository hosted on a machine where we don’t have shell access to run svnadmin.

After considerable Google searching, I found that SVN itself offers no way to do this; svnadmin takes these parameters:

svnadmin dump path-to-repos

and only works on local repositories. I found some mailing list discussion about it, but unfortunately it was of this form:

user: “I’d like to do X”

developer: “You don’t need to do X, it works like Y”

Fortunately, this post from Thomas Mølhave explains how to use SVK to accomplish something pretty close (though not quite) to this. It was quite easy on my Ubuntu test machine:

apt-get install svk (if you don’t have SVK yet)

or

rm -rf ~/.svk (if you have SVK, but don’t care about your current stuff)

or

mv ~/.svk ~/old.svk (if you have SVK, get your current config out of the way)

then:

svk ls https://your.svn.URL/here/

svk will prompt you for all the bits of info needed to mirror the SVN repository to your local SVK storage, then list the top level files/directories. Next, take advantage of the fact that SVK uses SVN’s underlying storage mechanism, and dump that local SVK mirror, skipping revision 1 which contains SVK metadata:

svnadmin dump -r2:HEAD ~/.svk/local >something.dump

This dump can now be restored elsewhere with “svnadmin load”. Unfortunately the pathnames in it will be mangled, with the original SVN repository hostname and path prepended. For my purposes, this didn’t matter, as I only needed to make it available for occasional reference. You could of course use “svn mv” to clean it up.

On a broader note, SVK looks like a very fine distributed source control tool. On a single developer project I recently droppen SVN in favor of bzr, a distributed source control tool; one of these days I will choose such a tool for a larger project.

Update: I discuss another method for remote SVN backups, svnsync in a later post.

Google Tech Talks

Google, a mecca for top notch programmers, attracts many top speakers to give talks on (generally) technical topics. They graciously record these talks and upload them to Google Video. You can get a list of most of them by searching video.google.com for “engEDU”. Think of these as virtual user group talks, but usually with bigger “name” speakers than a typical local group offers.

Here are just a few that’s I’ve enjoyed recently, there are many more worth watching.

How Debian (Ubuntu) packages work

Seth Godin (marketing guru)

Mary Poppendieck (“Lean Software Development” author) – Competing on the basis of speed

A new Way to look at Networking – Fascinating

The Mercurial distributed source control system

Added later:

Fission is the New Fire

Jessica Livingston, talking about “Founders at Work”

Still later:

Subscribe to this feed to find out about each talk

DRBD on Ubuntu

This week I experimented with DRBD (Distributed Replicated Block Device) which is “a block device which is designed to build high availability clusters. This is done by mirroring a whole block device via (a dedicated) network. You could see it as a network raid-1.”

DRBD can be used for build a “poor man’s SAN”, a RAID running over Ethernet. In our case, we want a warm spare of an important database system, i.e. a second set of hardware which can take over for the primary within a few minutes of failure, and DRBD might be a great way accomplish that. DRBD runs on commodity hardware (ordinary servers, network adapters, etc.) so it is inexpensive. It is also, if benchmarks and comments are to be believed, surprisingly good. I suspect this is because CPUs and networks (Gigabit Ethernet, ideally) are so much faster than disks.

The systems at hand for testing here run the Ubuntu distribution, itself based on Debian. It took a lot of reading and learning, but very little actual typing, to get DRBD going on Ubuntu. My writeup here should help the next person along. As I write this, the released version of DRBD is 0.7, and the DRBD site encourages ignoring the newer unofficial release, advice that I followed.

First, read the HOWTO and the FAQ, but don’t follow the instruction in them yet; then you are ready to begin. The guessable part is to install the module with apt-get (you’ll need the “Community” APT sources in Ubuntu; I don’t know offhand what is equivalent in Debian):

apt-get install drbd0.7-module-source

This installs the module’s source code, not the module itself. After doing this, I spent a couple of hours trying to figure out how to compile it – untarring it and running “make” does not work. There is an abundance of confusing and misleading advice on the web about this, including advice to compile your own kernel. If you don’t need a custom-compiled kernel for other reasons, ignore all that and discover the joy of Debian’s “Module Assistant”, invoked by the command “m-a”.

m-a prepare

“prepare” will get your machine ready to compile modules that can link with the running kernel; it will prompt to download and install packages needed for this, including the correct kernel header packages. Now you are ready to build and install the module:

m-a a-i drbd0.7-module-source

Follow the prompts; if it asks to install more packages, agree to do so. On Ubuntu 6.10, it may fail with an error about “mv”. This is a known bug; the workaround is to expand the module source file (it lives in /usr/src/), add the line “SHELL=/bin/bash” at the top of each Makefile, then retar the module source.

By the way, this points out a stark and troubling aspect of Ubuntu “community” packages: not only are they not assured to work, they are not even casually tested to verify they compile on the Ubuntu version they are purportedly intended for use with… which I find very disappointing. I suppose the lesson is that Ubuntu is very slick for the set of officially supported packages, but if you’ll be using a lot of packages beyond that set, its appeal is much diminished.

Now you are ready to enable DRBD, following the HOWTO instructions, starting with setting up /etc/drbd.conf.

In my case I used LVM to make a 10 GB partition on each machine, put DRBD on top of that, put a filesystem (Reiserfs, for variety) on top of them, then moved/symlinked my PostgreSQL data there. It worked fine, including these tests: I started a long intensive DB operation (restoring a several-GB backup), then with that running, rebooted the secondary machine. This did not interrupt the DB operation, and when the secondary finished its reboot, the DRBD partition re-synced automatically. I then shut down the primary machine, and verified I could mount and read that same data on the secondary machine… all exactly as described.

One caveat though – in a few cases, machines with DRBD stopped during the boot sequence, waiting for user intervention (in this case, closing a shell) because they the DRBD device wouldn’t mount. I suspect this was because I don’t fully understand how to configure it.

Of course this was just a short initial look, but I am very impressed with DRBD so far.

Fresh Windows Install, Ouch

A few hours ago I started with a fresh Windows XP Home install for a computer for my family.

I’m still going.  I have lost count of how many times I have had to reboot the machine then get a new round of updates to install.  It is ludicrous.

Modern Linux distributions are far superior to this.  I recently installed Ubuntu on a several machines; the installer downloaded updates during the install process, so the machine was fully updated, on the first boot.

LVM (Logical Volume Management) – a very good thing

This Mailing list post from a user of “MythTV” reminded me of the wonders of LVM (Logical Volume Management) which is built in to Linux (HowTo, Resource page).  I first saw LVM years ago on a commercial Unix, and didn’t quite understand the point.  Now I see the point clearly, and set up most new machines with LVM.

If you’re not using LVM, and you have non-trivial storage hardware (more than one hard drive), now is the time to start.

On an OS without LVM built in (as I am, on my Windows machines..), it is often necessary to do things offline, with external tools (like Partition Magic) that could be done online in a running system with LVM.

Basement Data Center

It occured to me today that a section of my basement is starting to resemble a data center:

Data Center

The four machines here:

  • Are all test machines, or being configured and burned in for future deployment – the production hardware for our projects invariably ends up in customers’ data centers or in robust colo facilities (redundant power, massive air conditioning, redundant “tier one” network connectivity, etc.)
  • All use AMD CPUs – two of them have dual-core Athlon64 X2’s.
  • Are all black – black is still the default color, it appears.

Other tidbits:

  • The leftmost machine is a “Shuttle XPC”.
  • The shelves are homemade and very strong.
  • Three of the machine run only Linux; the other dual-boots to Windows XP.
  • Very little work happens on these machine at the physical console – they are accessed via SSH, VNC, Terminal Services, etc.
  • There is a dedicated 15 amp circuit to run these machines.
  • The reason they are here, in my basement, is so that the hardware and OS can be configured. When one becomes wedged it can easily be rebooted or diagnosed at the terminal; that happens all to often when, for example, trying a variety of Linux kernels to get virtualuzation working .