Great Developers, Projects That Sound Boring

I’ve been a fan of Joel Spolsky for years, though I haven’t agreed with everything he’s written, and even mocked him a bit. Joel has written at length on his web site and in print about attracting the best developers, and one aspect of that has bothered me:

How do you attract top developers to work on something that sounds rather dull, like a bug tracking application? It mostly shuffles data back and forth between screens and reports and database tables – far too boring for top developers.

Of course that’s an exaggeration, but a relevant one: at Oasis Digital much of our work is on enterprise business process automation, database-centric applications, and could likewise be described casually (though not accurately) as “just” shuffling data between screens and tables. I worry that our work will not sound interesting to prospective hires.

This week at the Business of Software conference I got a chance to ask (confront?) Joel about this. He offered a great, four-part answer, which I present here with my own additions mixed in. I don’t have careful notes about which bits came from Joel, so you are welcome to give him all the credit and me all the blame.

1) There is Interesting Technology Inside

Even in an application which, at first glance, just shuffles data around, there can turn out to be a lot of very interesting work inside. This is true of FogBugz and it is true of our work at Oasis Digital as well. Here are some examples of interesting work here, all of it inside dull-sounding applications:

  • Process metadata and generate code and GUI elements. Top developers certainly are those who solve a family of problems with generic code and metadata, rather than tediously one at a time.
  • Process large hierarchies efficiently using Celko’s nested sets representation technique. Top developers are all about using better data structures.
  • Custom GUI components to provide a drag-drop, direct manipulation approach to visualize and modifying data. The results has both a high “wow” factor, and is genuinely useful – a willing combination for top developers.
  • Integrate a Prolog-based rules mechanism to provide a vital algorithm in one page of code, that would have required countless pages of code and hundreds-to-thousands of hours of work to do otherwise. Using a radically different language to solve a problem with a small fraction of the effort… exactly the sort of thing a top developer wants to do.
  • Generic data replication mechanism: building our own was certainly more interesting work than adoptions one off the shelf.
  • Learn how OLAP works, implement an OLAP ETL process.
  • … I could list many more examples

2) One Level Down the Stack

Fog Creek aims to make compellingly good software because that is how you outcompete established competitors in a commercial shrinkwrap market. Oasis Digital likewise has this aim for a different reason: it is our intended niche. We aim to differentiate ourselves from “yet another outsourced dev company” by building unexpectedly good software. We don’t want customers who will be happy with the results available from the typical development firm; we want customers who are playing to win.

To meet either goal, it is sometimes necessary to work at one level of abstraction lower than would otherwise be necessarily. Joel’s examples were their own data grid and their own AJAX library. Some of our examples are listed above.

This kind of work, further in to the details, is generally more compelling to top developers.

Caveat: Don’t do this very often. If you want to ship software anytime soon, you need to mostly use off the shelf libraries that already work. Don’t build a data-grid, for example, unless you really need something that you can’t find in any off the shelf products. We don’t have our own data-grid; we use (among other things) the excellent grid products from Developer Express.

3) Problem Domain is less important than other factors

Joel has observed that developers aren’t as picky about the problem domain of their project as one might think; rather, other factors are more important: great co-workers, nice working environment, working for a boss who is a developers, etc. Top developers want to work on a high quality end product worthy of taking pride in.

4) All Projects need a lot of Grunt Work

To build a high quality product, in any problem domain, will require spending a great amount of your time on grunt work: tracking down bugs and fixing them, filling in feature “holes”, cleaning up design problems, improving GUI layouts, and the like. This is true for any problem domain. Top developers know this, and just get on with doing that work when needed.

Conclusion

My worry was unjustified. Our work at Oasis Digital is interesting and worthwhile, both for our customers and for our developers. To grow our team, we must focus on making it an increasingly good place to work.

A final anecdote: Joel mentioned that FogBugz 6 was feature-complete in the summer of 2006 – that means they spent around a year polishing it, fixing bugs, filling in holes, etc. That shows a phenomenal amount of dedication and discipline to create quality software.

Distributed Version Control for the Other 80%

Ben Collins-Sussman, one of the key developers behind Subversion, argues in Version Control and the 80% that distributed version control will remain a niche interest, and will not move in to the mainstream (as his favorite tool certainly has). He has a number of good reasons to back up this thesis.

I think he’s wrong. The “other 80%” are not profoundly stupid imbeciles who could never grasp the point of DVCS. Rather they are, generally, working developers with important projects underway, for which they need tools to work well out of the box when used in the default way. DVCS tools can certainly do that. More specifically, the list of reasons he gives why DVCS won’t become broadly popular, should be read more as a to-do list of how to improve DVCS so they can become broadly popular. What the DVCS community needs is at least one DVCS which:

  • Installs easily on Windows, with a single installer, including diff/merge tool and GUI
  • Includes a very good standalone GUI
  • Secures client/server (peer-to-peer) communicate by default, without user setup of SSH, HTTPS, etc.
  • Integrates well with Eclipse
  • Integrates well with Visual Studio
  • Integrates well with Explorer (i.e. TortoiseBlah)
  • Integrates, begrudgingly, with Microsoft’s SCC API so as to support the many tools which can use an SCC API plugin
  • Includes permission controls for server repositories, including good tools for configuration thereof
  • Automates sharing of branches trivially (some already do this, some less so)Automates the common ways of using a DVCS, most importantly the usage model in which the DVCS is used as a better SVN with full offline capabilities
  • Guides users, if so configured, gently back toward a small number (one, in some cases) of main central branches, which is what most projects want
  • Communicates clearly what kind of project it can support well (most of them) and what kind it won’t support well (those with an enormous pile of huge files, of which most users only need a few)

(SVN itself is not without flaws. Ben lists some of them as areas in which improvements are coming, while others (such as, in my opinion, using the file namespace for branches and tags) are likely here to stay.)

In the next few years we will probably see one or more DVCS tools gain most or all of the features above. With addresses, an important truth will be more obvious: distributed source control is, in most ways, a superset of centralized source control, and the latter can be thought of as a special case of the former.

That said, though, I think the DVCS movement will lose a bit of steam when SVN ships better merge support, if that merge support is sufficiently good. The merge “features” are certainly the biggest issue we have here with SVN.

Growing a Language, by Guy Steele

This is an oldie-but-goodie: Guy Steele’s “Growing a Language” talk from OOPSLA 1998.

It is amazing to me that Guy, whose is something of a legend in language design, and who thinks so clearly about what makes a good language, was also key in designing Java. Java has been extremely slow to grow in the sense described in this talk, because for many years Sun resisted such growth. Only the rise of C# and the growing popularity of dynamic languages generated enough pressure to get Java unstuck… and in the last few year Java has become somewhat growable in the sense Guy describes.

A Brief Introduction to Distributed Version Control

Last night at SLUUG, I have a talk on distributed source control tools. It was quite introductory, but the notes (below) may still be helpful. These notes were on a handout at the talk, as usual I didn’t use slides.

Unfortunately I didn’t get an audio recording of this talk, so no transcript either.

About 30 people were in attendance. Nearly 100% were familiar with CVS and SVN, and perhaps 20% with other tools (ClearCase, SourceSafe, and others). Only 4 had ever used branch/merge in any project or tool!

A Brief Introduction to Distributed Version Control

You Branch and Merge.

A fundamental truth: every time you edit a file you are branching, and every time you reconcile with another developer you are merging. In most tools you get one easy branch and merge locus: your local working directory. All other branching is a Big Deal.

It does not have to be this way.

A Short Tour of 3 tools

bzr: http://bazaar-vcs.org/

hg: http://www.selenic.com/mercurial/

git: http://git.or.cz/

Also, check your distro’s package system.

(At this point in the talk I demonstrated the basic features of each)

What is a DVCS? Why Use a DVCS?

Peer-to-peer design

Everyone gets all the features, rather than the interesting features limited to a high priest class.

Make use of the massive CPU and disk capacity on dev machines.

No central server needed, though many projects nominate a machine for this purpose.

Use a “dumb” storage location for a repository, if desired. Or a “smart server” for performance and security.

Work offline, with full history, branches, merges.

No central administrator is needed, potentially a cost savings.

Very cheap branching, in some cases immediate, even for large projects.

Very good merging, because you merge all the time.

Commit-then-merge, not merge-then-commit.

Repeated merge without havoc.

Merge keeps both sides of history, which is important because you merge a lot. This varies by tools, for example apparently bzr keeps this history less effectively than some others.

Depending on what you are coming from and which tool you choose, the speed gain can be so remarkable that it help every developer every day.

Some SVN Nitpicks

It is easy and tempting to pick on SVN. Linus does so vigorously in his online talk. I don’t hate it as much as he does but, these things bother me:

SVN is slow for large projects.

Branching in SVN sounds clever as you read its cheap copy story. It’s a great story. But actually using it is ridiculous; both in the URL-crazy syntax and utter lack of merge history.

We don’t need a better CVS, we need something much better than CVS.

.svn directories scattered all over are a pointless pain.

.svn directories are enormous, sometimes larger than the entire project history in git or hg!

Security

Security is a weak area in terms of out-of-the-box features and tutorials, because these tools come from the open source world where the default is for everyone to be able to see all code. However, with a little effort you can set up whatever security you like:

If you’re serving over HTTP, you can use Apache mod_whatever to control access.

Tunnel over SSH (in the box, in most cases) to avoid ever sending code in the clear.

Even a “dumb” storage location can be secured with SFTP.

Scripts can be used for per-branch and other fine grained access control, akin to what you can do with svn-access.conf. There are examples online.

I Can’t Use a DVCS Because:

SOX/HIPPA/CMM/etc. requires a centralized tool.

SOX/HIPPA/CMM/etc, is the standard reason why anyone can’t do anything. Some of these tools facilitate much stronger guarantees about the provenance of the source code than you get from a centralized tool, because they have credible and straightforward ways to verify that centralized store has not been compromized.

We are all in one place, therefore a DVCS makes no sense.

Actually many of the features in these tools are as useful in the same building as in a worldwide team.

Tool ZZZ is our corporate standard.

Then you should use it, don’t get fired. However, many people are using a DVCS in front of their corporate standard tool.

DVCS vs DVCS:

Many DVCS tools treat each repo/workspace as a branch and vice versa, so if you use many branches you will have many workspaces.

bzr can use shared files to reduce the bloat from this.

git handles many branches much better, with arbitrarily many per repo/workspace.

My feeling is that hg and git are more mature than bzr.

git does not work well on windows yet.

Other DVCS to Consider

Monotone – has some slick features, but does not appear to scale well to large projects. It stores information in a SQLite DB. Monotone, unlike any others I’ve seen, replicates all branches by default, which is nice.

An aside — there are fascinating thoughts on how to a data synchronization system can work, in the Monotone docs / presentations.

arch / tla — one of the early DVCSs that started all this. I have heard it is mystifying to use.

darcs — everyone loves its “cherry picking”, but I have not triedit.

SVK — optimized for being a better svn client, with offline history and merging that works. Stores merge history in SVN attribute.

More git Merging

Depending on time, I will show more of the branch / merge facilities in git, as well as its gitk GUI.

Other Resources

http://wiki.sourcemage.org/Git_Guide

http://www.adeal.eu/

Update: the following appeared a few days after my talk, it is very good, aside from slightly bogus listings in the “disadvantages” section:
http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/

Fix It So It Stays Fixed: An Example

A recurring theme in our projects is a desire to “fix things so they stay fixed”. I have in mind writing about that idea in detail later, but for now I’ll start with an example of how to do so.

A common and useful thing to do with disk storage space is to keep old copies of important data around. For example, we might keep the last 15 days of nightly backups of a database. This is easy to set up and helpful to have around. Unfortunately, sooner or later we discover that the process of copying a new backup to a disk managed this way, fails because the disk is full: the ongoing growth of the backup files reached a point where 15 old ones plus a new one does not fit.

How will we fix this?

Idea #1: Reduce 15 days to 10 days. Great, now it doesn’t fail for a while… but eventually it fills up with 10 of the now-larger files. It didn’t stay fixed.

Idea #2: Buy a bigger disk (maybe a huge disk, if money is abundant). A while later, it fills up. It didn’t stay fixed.

Idea #3: Set up an automated monitoring system, so that someone is informed when the disk is getting close to full. This is a big improvement, because hopefully someone will notice the monitor message and adjust it before it fails. But to me, it is not “fixed to stay fixed” because I will have to pay someone to adjust it repeatedly over time.

Idea #4: Sign up for Amazon S3, so we can store an unlimited number of files, of an unlimited size. Thus will probably stay fixed from a technical point of view, but it is highly broken in the sense that you get a larger and larger S3 invoice, growing without limit. To me, this means it didn’t stay fixed.

Idea #5: Dynamically decide how many old backups to keep.

The core problem with the common design I described above is the fixed N of old files to keep. The solution is to make that number dynamic; here is one way to do that:

  • Make the old-file-deletion process  look at the size of the most recent few files, and estimate the “max” of those plus some percentage as the likely maximum size of a new file.
  • Compare that to the free space.
  • If there is not enough free space, delete the oldest backup
  • Loop back and try again.
  • Be careful with error checking, and put in some lower limit of how many files to preserve (perhaps 2 or 3).

Like all mechanisms, this one has limits. Eventually the daily file size may grow so large that it’s no longer possible to keep 1 or more copies on the disk; so in this sense it does not stay fixed; but it does stay fixed all the way up to the limit of the hardware, with no human intervention.

Upcoming talk: Intro to Distributed Source Control

Where: SLUUG (though my talk is not listed on the site yet)

When: October 10th, meeting starts at 6:30 PM

I’ll introduce distributed source control tools:

  • A short tour of the basic use of git, bzr, and hg (Mercurial)
  • Thoughts on why you’d want to use a distributed source control tool at all, vs. a centralized system like SVN or CVS.
  • Some differences between these tools (and a few others), with thoughts on how to choose

In response to a question below about slides… most likely there will be no slides. Rather there will be a handout (which will be posted on my site).

Update: The handout notes are here. Sorry, no audio/video/transcript this time.