October 2007 – Kyle Cordes

Great Developers, Projects That Sound Boring

I’ve been a fan of Joel Spolsky for years, though I haven’t agreed with everything he’s written, and even mocked him a bit. Joel has written at length on his web site and in print about attracting the best developers, and one aspect of that has bothered me:

How do you attract top developers to work on something that sounds rather dull, like a bug tracking application? It mostly shuffles data back and forth between screens and reports and database tables – far too boring for top developers.

Of course that’s an exaggeration, but a relevant one: at Oasis Digital much of our work is on enterprise business process automation and database-centric applications, which could likewise be described casually (though not accurately) as “just” shuffling data between screens and tables. I worry that our work will not sound interesting to prospective hires.

This week at the Business of Software conference I got a chance to ask (confront?) Joel about this. He offered a great, four-part answer, which I present here with my own additions mixed in. I don’t have careful notes about which bits came from Joel, so you are welcome to give him all the credit and me all the blame.

1) There is Interesting Technology Inside

Even in an application which, at first glance, just shuffles data around, there can turn out to be a lot of very interesting work inside. This is true of FogBugz and it is true of our work at Oasis Digital as well. Here are some examples of interesting work here, all of it inside dull-sounding applications:

Process metadata and generate code and GUI elements. Top developers certainly are those who solve a family of problems with generic code and metadata, rather than tediously one at a time.
Process large hierarchies efficiently using Celko’s nested sets representation technique. Top developers are all about using better data structures.
Custom GUI components to provide a drag-drop, direct manipulation approach to visualizing and modifying data. The result has a high “wow” factor and is genuinely useful – a winning combination for top developers.
Integrate a Prolog-based rules mechanism to provide a vital algorithm in one page of code, which would otherwise have required countless pages of code and hundreds to thousands of hours of work. Using a radically different language to solve a problem with a small fraction of the effort… exactly the sort of thing a top developer wants to do.
Generic data replication mechanism: building our own was certainly more interesting work than adopting one off the shelf.
Learn how OLAP works, implement an OLAP ETL process.
… I could list many more examples
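
For readers who have not met the nested sets technique mentioned above, here is a minimal sketch in Python. It is not our production code; the function names and the sample tree are made up for illustration. Each node gets a (left, right) pair of numbers from a depth-first walk, and a node's descendants are exactly the nodes whose pair falls strictly inside its own. In a database the same idea turns a subtree lookup into a single range comparison.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]          # parent name -> list of child names
Bounds = Dict[str, Tuple[int, int]]  # node name -> (left, right)


def number_nodes(tree: Tree, root: str) -> Bounds:
    """Assign nested-set (left, right) numbers with a depth-first walk."""
    bounds: Bounds = {}
    counter = 1

    def visit(node: str) -> None:
        nonlocal counter
        left = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        # The right number is taken after all children have been numbered.
        bounds[node] = (left, counter)
        counter += 1

    visit(root)
    return bounds


def descendants(bounds: Bounds, node: str) -> List[str]:
    """Every node whose interval lies strictly inside this node's interval."""
    left, right = bounds[node]
    inside = [n for n, (l, r) in bounds.items() if left < l and r < right]
    return sorted(inside, key=lambda n: bounds[n][0])


def subtree_size(bounds: Bounds, node: str) -> int:
    """Number of descendants, computed from the two numbers alone."""
    left, right = bounds[node]
    return (right - left - 1) // 2


# Hypothetical sample hierarchy.
tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
bounds = number_nodes(tree, "root")
print(descendants(bounds, "a"))
print(subtree_size(bounds, "root"))
```

The recursive walk is fine for a sketch; a very deep hierarchy would want an explicit stack instead.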

2) One Level Down the Stack

Fog Creek aims to make compellingly good software because that is how you outcompete established competitors in a commercial shrinkwrap market. Oasis Digital likewise has this aim for a different reason: it is our intended niche. We aim to differentiate ourselves from “yet another outsourced dev company” by building unexpectedly good software. We don’t want customers who will be happy with the results available from the typical development firm; we want customers who are playing to win.

To meet either goal, it is sometimes necessary to work at one level of abstraction lower than would otherwise be necessary. Joel’s examples were their own data grid and their own AJAX library. Some of our examples are listed above.

This kind of work, further into the details, is generally more compelling to top developers.

Caveat: Don’t do this very often. If you want to ship software anytime soon, you need to mostly use off the shelf libraries that already work. Don’t build a data-grid, for example, unless you really need something that you can’t find in any off the shelf products. We don’t have our own data-grid; we use (among other things) the excellent grid products from Developer Express.

3) Problem Domain is less important than other factors

Joel has observed that developers aren’t as picky about the problem domain of their project as one might think; rather, other factors are more important: great co-workers, a nice working environment, working for a boss who is a developer, etc. Top developers want to work on a high quality end product worthy of taking pride in.

4) All Projects need a lot of Grunt Work

To build a high quality product, in any problem domain, will require spending a great amount of your time on grunt work: tracking down bugs and fixing them, filling in feature “holes”, cleaning up design problems, improving GUI layouts, and the like. This is true for any problem domain. Top developers know this, and just get on with doing that work when needed.

Conclusion

My worry was unjustified. Our work at Oasis Digital is interesting and worthwhile, both for our customers and for our developers. To grow our team, we must focus on making it an increasingly good place to work.

A final anecdote: Joel mentioned that FogBugz 6 was feature-complete in the summer of 2006 – that means they spent around a year polishing it, fixing bugs, filling in holes, etc. That shows a phenomenal amount of dedication and discipline to create quality software.

Distributed Version Control for the Other 80%

Ben Collins-Sussman, one of the key developers behind Subversion, argues in Version Control and the 80% that distributed version control will remain a niche interest, and will not move into the mainstream (as his favorite tool certainly has). He has a number of good reasons to back up this thesis.

I think he’s wrong. The “other 80%” are not profoundly stupid imbeciles who could never grasp the point of DVCS. Rather they are, generally, working developers with important projects underway, for which they need tools that work well out of the box when used in the default way. DVCS tools can certainly do that. More specifically, the list of reasons he gives why DVCS won’t become broadly popular should be read more as a to-do list of how to improve DVCS tools so they can become broadly popular. What the DVCS community needs is at least one DVCS which:

Installs easily on Windows, with a single installer, including diff/merge tool and GUI
Includes a very good standalone GUI
Secures client/server (peer-to-peer) communication by default, without user setup of SSH, HTTPS, etc.
Integrates well with Eclipse
Integrates well with Visual Studio
Integrates well with Explorer (i.e. TortoiseBlah)
Integrates, begrudgingly, with Microsoft’s SCC API so as to support the many tools which can use an SCC API plugin
Includes permission controls for server repositories, including good tools for configuration thereof
Automates sharing of branches trivially (some already do this, some less so)
Automates the common ways of using a DVCS, most importantly the usage model in which the DVCS is used as a better SVN with full offline capabilities
Guides users, if so configured, gently back toward a small number (one, in some cases) of main central branches, which is what most projects want
Communicates clearly what kind of project it can support well (most of them) and what kind it won’t support well (those with an enormous pile of huge files, of which most users only need a few)

(SVN itself is not without flaws. Ben lists some of them as areas in which improvements are coming, while others (such as, in my opinion, using the file namespace for branches and tags) are likely here to stay.)

In the next few years we will probably see one or more DVCS tools gain most or all of the features above. With these addressed, an important truth will be more obvious: distributed source control is, in most ways, a superset of centralized source control, and the latter can be thought of as a special case of the former.

That said, though, I think the DVCS movement will lose a bit of steam when SVN ships better merge support, if that merge support is sufficiently good. The merge “features” are certainly the biggest issue we have here with SVN.

Growing a Language, by Guy Steele

This is an oldie-but-goodie: Guy Steele’s “Growing a Language” talk from OOPSLA 1998.

It is amazing to me that Guy, who is something of a legend in language design, and who thinks so clearly about what makes a good language, was also key in designing Java. Java has been extremely slow to grow in the sense described in this talk, because for many years Sun resisted such growth. Only the rise of C# and the growing popularity of dynamic languages generated enough pressure to get Java unstuck… and in the last few years Java has become somewhat growable in the sense Guy describes.

A Brief Introduction to Distributed Version Control

Last night at SLUUG, I gave a talk on distributed source control tools. It was quite introductory, but the notes (below) may still be helpful. These notes were on a handout at the talk; as usual, I didn’t use slides.

Unfortunately I didn’t get an audio recording of this talk, so no transcript either.

About 30 people were in attendance. Nearly 100% were familiar with CVS and SVN, and perhaps 20% with other tools (ClearCase, SourceSafe, and others). Only 4 had ever used branch/merge in any project or tool!

Fix It So It Stays Fixed: An Example

A recurring theme in our projects is a desire to “fix things so they stay fixed”. I have in mind writing about that idea in detail later, but for now I’ll start with an example of how to do so.

A common and useful thing to do with disk storage space is to keep old copies of important data around. For example, we might keep the last 15 days of nightly backups of a database. This is easy to set up and helpful to have around. Unfortunately, sooner or later we discover that copying a new backup to a disk managed this way fails because the disk is full: the ongoing growth of the backup files has reached a point where 15 old ones plus a new one do not fit.

How will we fix this?

Idea #1: Reduce 15 days to 10 days. Great, now it doesn’t fail for a while… but eventually it fills up with 10 of the now-larger files. It didn’t stay fixed.

Idea #2: Buy a bigger disk (maybe a huge disk, if money is abundant). A while later, it fills up. It didn’t stay fixed.

Idea #3: Set up an automated monitoring system, so that someone is informed when the disk is getting close to full. This is a big improvement, because hopefully someone will notice the monitor message and adjust it before it fails. But to me, it is not “fixed to stay fixed” because I will have to pay someone to adjust it repeatedly over time.

Idea #4: Sign up for Amazon S3, so we can store an unlimited number of files, of an unlimited size. This will probably stay fixed from a technical point of view, but it is highly broken in the sense that you get a larger and larger S3 invoice, growing without limit. To me, this means it didn’t stay fixed.

Idea #5: Dynamically decide how many old backups to keep.

The core problem with the common design I described above is the fixed N of old files to keep. The solution is to make that number dynamic; here is one way to do that:

Make the old-file-deletion process look at the size of the most recent few files, and estimate the “max” of those plus some percentage as the likely maximum size of a new file.
Compare that to the free space.
If there is not enough free space, delete the oldest backup
Loop back and try again.
Be careful with error checking, and put in some lower limit of how many files to preserve (perhaps 2 or 3).

Like all mechanisms, this one has limits. Eventually the daily file size may grow so large that it’s no longer possible to keep 1 or more copies on the disk; so in this sense it does not stay fixed; but it does stay fixed all the way up to the limit of the hardware, with no human intervention.
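
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script we actually run: the names (Backup, estimate_next_size, prune_for_next_backup), the 20% headroom, and the choice of the three most recent files are all made up for the example. The free-space check and the delete action are passed in as functions so the logic can be tried without touching a real disk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Backup:
    name: str
    size: int  # bytes


def estimate_next_size(backups: List[Backup], recent: int = 3,
                       headroom_percent: int = 20) -> int:
    """Largest of the most recent few backups, plus a safety percentage."""
    latest = backups[-recent:]
    if not latest:
        return 0
    return max(b.size for b in latest) * (100 + headroom_percent) // 100


def prune_for_next_backup(backups: List[Backup],
                          get_free: Callable[[], int],
                          delete: Callable[[Backup], None],
                          min_keep: int = 2,
                          recent: int = 3,
                          headroom_percent: int = 20) -> List[Backup]:
    """Delete the oldest backups until the next one is likely to fit.

    `backups` must be ordered oldest first. Returns the backups that remain.
    Raises RuntimeError rather than going below `min_keep` files.
    """
    remaining = list(backups)
    # Estimate once, up front, so that deleting files cannot shrink the estimate.
    needed = estimate_next_size(remaining, recent, headroom_percent)
    while get_free() < needed:  # re-measure free space on every pass
        if len(remaining) <= min_keep:
            raise RuntimeError("not enough space even at the minimum retention")
        delete(remaining.pop(0))  # oldest first
    return remaining


# Simulated disk, for illustration only.
disk = {"free": 100}

def fake_delete(backup: Backup) -> None:
    disk["free"] += backup.size

history = [Backup("day1", 80), Backup("day2", 90),
           Backup("day3", 100), Backup("day4", 110)]
kept = prune_for_next_backup(history, lambda: disk["free"], fake_delete)
print([b.name for b in kept], disk["free"])
```

On a real system, get_free could wrap shutil.disk_usage(path).free and delete could call os.remove on the backup file. Raising an error at the lower limit is one choice; the point is that the failure is loud instead of silently throwing away the last few backups.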

Upcoming talk: Intro to Distributed Source Control

Where: SLUUG (though my talk is not listed on the site yet)

When: October 10th, meeting starts at 6:30 PM

I’ll introduce distributed source control tools:

A short tour of the basic use of git, bzr, and hg (Mercurial)
Thoughts on why you’d want to use a distributed source control tool at all, vs. a centralized system like SVN or CVS.
Some differences between these tools (and a few others), with thoughts on how to choose

In response to a question below about slides… most likely there will be no slides. Rather there will be a handout (which will be posted on my site).

Update: The handout notes are here. Sorry, no audio/video/transcript this time.