Java Documentation in a Windows HTMLHelp (CHM) file

The Java software development kit documentation can be downloaded from java.sun.com, in the form of a ZIP file with tens of thousands of linked HTML files. This seems like an ideal canonical form for this documentation, but it is inconvenient to use, and it is inconvenient to have so many files sitting around.

If your development machine runs Windows, a Windows HTML Help file is more convenient; it consists of a handful of files (mostly one large CHM file) and offers good search and browse features. It can be unpacked or moved around far more quickly. Franck Allimant processes the HTML from Sun for each new Java release, and makes the CHMs etc. available on his web site. Recommended.

Allimant has been doing this for quite a few years now; I suspect it was unauthorized at first, but some kind of peace was made with Sun, which now links to the site.

Linus Torvalds explains distributed source control

On several occasions over the last year, I’ve pointed out that distributed source control tools are dramatically better than centralized tools. It’s quite hard for me to explain why. This is probably because of sloppy and incomplete thinking on my part, but it doesn’t help that most of the people I’ve said this to have never used a distributed tool. (I’ve been trying out SVK, bzr, git, etc.) Fortunately, I no longer need to stumble through attempts at explaining this; instead, the answer is to watch:

Linus Torvalds explains distributed source control in general, and git in particular, at Google

Here are some of his points, paraphrased. They might not be clear without watching the video.

  • He hates CVS
  • He hates SVN too: its aim is “CVS done right”, and you can’t get anywhere good from there.
  • If you need a commercial tool, BitKeeper is the one you should use.
  • Distributed source control is much more important than which tool you choose
  • He looked at a lot of alternatives, and immediately tossed anything that was not distributed, was slow, or did not guarantee that what goes in, comes out [using secure hashes]
  • He liked Monotone, but it was too slow
  • With a distributed tool, no single place is vital to your data
  • Centralized does not scale to Linux-kernel sized projects
  • Distributed tools work offline, with full history
  • Branching is not an esoteric, rare event – everyone branches all the time every time they write a line of code – but most tools don’t understand this, they only understand branching as a Very Big Deal.
  • Distributed tools serve everyone; everyone has “commit” access to their branches. Everyone has all of the tool’s features, rather than only a handful of those with special access.
  • Of course no one else will necessarily adopt your changes; using a tool in which every developer has the full feature set does not imply anarchy.
  • Merging works like security: as a network of trust
  • Two systems worth looking at: git and Mercurial. The others with good features are too slow [for large projects].
  • Lots of people are using git for their own work (to handle merges, for example), inside companies where the main tool is SVN. git can pull changes from (and thus stay in sync with) SVN.
  • Git is now much easier to use than CVS or other common tools
  • Git makes it easier to merge than other tools. So the problem of doing more merging is not much of a problem, not something you need to fear.
  • Distributed tools are much faster because they don’t have to go over the network very often. Not talking to a server is tremendously faster than talking to even a high end server over a fast network.
  • SVN working directories and repositories are quite large; git equivalents are much smaller
  • The repository (=project) is the unit of checkout / commit / etc.; don’t put them all into one repository. The separate repositories can share underlying storage.
  • Performance is not secondary. It affects everything you do; it affects how you use an application. Faster operations = merging not a big deal = more, smaller changes.
  • SVN “makes branching really cheap”. Unfortunately, “merging in SVN is a complete disaster”. “It is incredible how stupid these people are”
  • Distributed source control is inherently safer in most respects: no single point of failure, and secure hashes that protect against even intentionally malicious users.
  • “I would never trust Google to maintain my source code for me” – with git (and other distributed systems) the whole history is in many places, so it is nearly impossible to lose.
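
The “what goes in, comes out” guarantee rests on content addressing: each stored object is named by a secure hash of its bytes, so corruption or tampering changes the name. As a rough sketch (not git's own implementation), the function below reproduces the ID git assigns to a file's contents: the SHA-1 of a short header ("blob", the size in bytes, a NUL byte) followed by the data. The name git_blob_hash is made up for illustration, and the sketch assumes the coreutils sha1sum command is available.

```shell
# Sketch: compute the object ID git would give a file's contents.
# Assumes sha1sum (GNU coreutils) is installed; git_blob_hash is a made-up name.
git_blob_hash() {
  file=$1
  # Size in bytes; tr strips the padding some wc implementations add.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  # Hash the header "blob <size>\0" followed by the file's bytes.
  { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}
```

If git is installed, "git hash-object FILE" should normally print the same value, and changing a single byte of the file yields a completely different ID.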

My own observations:

There are important differences between source control tools. I have heard it said that these are all “just tools” which don’t matter; you simply use whatever the local management felt like buying. That is wrong: making better tool choices will make your project better (cheaper, faster, more fun, etc.); making worse tool choices will make your project worse (more expensive, slower, painful, higher turnover, etc.)

Distributed tools make the “network of trust” more explicit, and thus easier to reason about.

I think there is an impression out there that distributed tools don’t accommodate the level of control that “enterprise” shops often specify in their processes, over how code is managed. This is a needless worry; it is still quite possible to set up any desired process and controls for the official repositories. The difference is that you can do that without the collateral damage of taking away features from individual developers.

Git sounds like the leading choice on Linux, but at the moment it is not well supported on Windows, and I don’t see any signs of that changing anytime soon. I will continue working with bzr, SVK, etc.

There is widespread but mostly invisible demand for better source control. Over the next few years, the hegemony of the legacy (centralized) design will be lessened greatly. One early adopter is Mozilla, which is switching (or has switched?) to Mercurial. Of course, many projects and companies (especially large companies) will hang on for years to come, because of inertia and widespread tool integration.

Several of the distributed tools (SVK, bzr, git, probably more) have the ability to pull changes from other systems, including traditional centralized systems like SVN. These make it possible to use a modern, powerful tool for your own work, even when you are working on a project where someone else has chosen such a traditional system for the master repository.

Update: Mark commented that he doesn’t feel like he has really committed a changeset until it is on a remote repository. I agree. However, this is trivial: with any of the tools you can push your change from your local repository to a remote one (perhaps one for you alone), with one command. With some of them (bzr, at least) you can configure this to happen automatically, so it is zero extra commands. This does not negate the benefits of a local repository, because you can still work offline, and all read operations still happen locally and far more quickly.
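
For git, one possible way to get the "zero extra commands" behavior is a post-commit hook: git runs an executable file named .git/hooks/post-commit after each commit. The sketch below installs such a hook. The function name install_push_hook and the default remote name "backup" are made up for illustration, and this is only one of several ways to arrange it.

```shell
# Sketch: make git push after every commit by installing a post-commit hook.
# install_push_hook and the default remote name "backup" are hypothetical.
install_push_hook() {
  repo_dir=$1
  remote=${2:-backup}
  if [ ! -d "$repo_dir/.git" ]; then
    echo "not a git working directory: $repo_dir" >&2
    return 1
  fi
  hook="$repo_dir/.git/hooks/post-commit"
  mkdir -p "$repo_dir/.git/hooks"
  # The hook runs after the commit already exists locally, so a failed
  # push (for example, when offline) cannot undo or block the commit.
  cat > "$hook" <<EOF
#!/bin/sh
git push "$remote" HEAD || echo "push to $remote failed; the commit is still in the local repository" >&2
EOF
  chmod +x "$hook"
}
```

Usage would look like "install_push_hook ~/src/myproject backup", assuming a remote named backup has already been added with "git remote add". Commits made offline stay local and go out with the next successful push.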

Update: I didn’t mention Mercurial initially, but I should have (thanks, Alex). It is another strong contender, and Mozilla recently chose it. Its feature set is similar to git’s, but with more eager support for Windows.

Update: There is another discussion of Linus’s talk over on Codice Software’s blog, which was linked on Slashdot. Since Codice sells a source control product, the Slashdot coverage is a great piece of free publicity.

Update: Mark Shuttleworth pointed out below that bzr is much faster than it used to be, and that it has top-notch renaming support, which he explains in more detail in a post. Bzr was already on my own short-list; I am using it for a small project here at Oasis Digital.

Update: I’ve started gathering notes about git, bzr, hg, etc., for several upcoming posts (and possibly a talk or two). If you’re interested in these, subscribe to the feed using the links in the upper right of my home page.

Use SVK to remotely “svnadmin dump” an SVN repository

One of the nice things about SVN is how easy it is to carry the complete SVN history from one server to another: “svnadmin dump” produces a single (large) dump file with the complete history, then “svnadmin load” recreates it on the new machine. However, for a handful of our projects we have an SVN repository hosted on a machine where we don’t have shell access to run svnadmin.

After considerable Google searching, I found that SVN itself offers no way to do this; svnadmin takes these parameters:

svnadmin dump path-to-repos

and only works on local repositories. I found some mailing list discussion about it, but unfortunately it was of this form:

user: “I’d like to do X”

developer: “You don’t need to do X, it works like Y”

Fortunately, this post from Thomas Mølhave explains how to use SVK to accomplish something pretty close (though not quite) to this. It was quite easy on my Ubuntu test machine:

apt-get install svk (if you don’t have SVK yet)

or

rm -rf ~/.svk (if you have SVK, but don’t care about your current stuff)

or

mv ~/.svk ~/old.svk (if you have SVK, get your current config out of the way)

then:

svk ls https://your.svn.URL/here/

svk will prompt you for all the bits of info needed to mirror the SVN repository to your local SVK storage, then list the top level files/directories. Next, take advantage of the fact that SVK uses SVN’s underlying storage mechanism, and dump that local SVK mirror, skipping revision 1 which contains SVK metadata:

svnadmin dump -r2:HEAD ~/.svk/local >something.dump

This dump can now be restored elsewhere with “svnadmin load”. Unfortunately the pathnames in it will be mangled, with the original SVN repository hostname and path prepended. For my purposes, this didn’t matter, as I only needed to make it available for occasional reference. You could of course use “svn mv” to clean it up.
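
The mirror, dump, and restore steps above can be sketched as a pair of shell functions. The function names, the DRY_RUN switch, and the example paths are placeholders of my own, not part of SVK or SVN; with DRY_RUN=1 the functions only print the commands they would run.

```shell
# Sketch of the steps above as functions. The URL, paths, and function names
# are placeholders. Set DRY_RUN=1 to print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Mirror a remote SVN repository into the local SVK depot, then dump it.
remote_svn_dump() {
  repo_url=$1
  dump_file=$2
  # svk prompts interactively for the mirror settings on first use.
  run svk ls "$repo_url"
  # Revision 1 holds SVK metadata, so the dump starts at revision 2.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "svnadmin dump -r2:HEAD $HOME/.svk/local > $dump_file"
  else
    svnadmin dump -r2:HEAD "$HOME/.svk/local" > "$dump_file"
  fi
}

# Recreate the repository elsewhere from the dump file.
restore_svn_dump() {
  dump_file=$1
  new_repo=$2
  run svnadmin create "$new_repo"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "svnadmin load $new_repo < $dump_file"
  else
    svnadmin load "$new_repo" < "$dump_file"
  fi
}
```

Running with DRY_RUN=1 first is a cheap way to check the paths before touching a real repository.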

On a broader note, SVK looks like a very fine distributed source control tool. On a single developer project I recently dropped SVN in favor of bzr, a distributed source control tool; one of these days I will choose such a tool for a larger project.

Update: I discuss another method for remote SVN backups, svnsync, in a later post.

Selling your Software as a Service: Notes and Audio

At the St. Louis Code Camp on May 5, 2007, I gave a talk on Selling Your Software as a Service, in which I discussed our experiences selling a complex (Java) “enterprise” application in that manner. The room was much more crowded than I expected; it was exciting to have an eager group. As with all my recent talks, I used a handout instead of slides. You can download a PDF of the handout (one page, one side), or read the contents below.

The 1 hour audio recording (Olympus WS-100 digital voice recorder, Audacity cleanup) is available here: SAASTalk.mp3

A transcript of the talk is available. In the talk I mentioned Paul Graham’s The Other Road Ahead, which is shorter and easier to read than my talk transcript.

A couple of people at Code Camp asked if I could come give a similar talk in-house at their firms. Yes – please contact me with the contact form to arrange a date.

The handout contents follow.


Excellent JavaScript talk from Yahoo

Over at Yahoo Video you can watch an excellent talk by Doug Crockford on JavaScript (part 1). (part 2, part 3, part 4) This is likely the best introduction to JavaScript I have seen, and worthwhile even if you’ve been using JS for years.

Why does JavaScript matter?

1) It is ubiquitous now (in nearly every browser, in Flash as ActionScript, etc.)

2) It is likely to be the default choice for building scriptable Java applications, due to the Rhino JS interpreter “in the box” in Java 1.6

Update: These videos are more conveniently all on one page here.

High Quality Screen Recordings

At Oasis Digital we’ve found that we can communicate effectively with each other and with customers, across time and space, using screen + audio recording (also called screencasts or screen videos). We use these to demonstrate a new feature, to explain how code works, to describe how a new feature should work, etc. The communication is not as good as a live, in-person meeting/demo, but the advantages often outweigh that factor:

  1. No travel.
  2. No need to synchronize schedules.
  3. The receiving person can view the recording repeatedly, at their convenience.
  4. Customers and developers who join the project team later can look at old recordings to catch up.

It turns out that I am unusually picky about the quality of such recordings; I’ve written up some technical notes on how to get good results, and posted them: HighQualityScreenRecordings.pdf.

A few highlights:

  • A reasonably fast computer can both run an application and record screen video at the same time; but if you will be recording the use of an application that generates a lot of disk activity, you must save the video to a separate hard drive (internal, external, network server, etc.) from the hard drive you are running your OS and applications from. (For applications that generate little disk activity, a single system hard drive works fine.)
  • Use a headset-style microphone, and record in a quiet place: close the door, turn off the music, etc.
  • Adjust your audio levels well. Please. This is the most common and most annoying problem I find with screencast and podcast recordings.
  • Bytes are cheap; use a sufficiently large window and sufficiently high bitrate.

Many more details are in the PDF linked above.