Code Reviews – Use a gate, not a drive-by

Does your team / project use code reviews? If not, I suggest starting today. There are countless resources online about how and why to do code reviews; there are numerous code review tools, open source and commercial.

Who should review code? A great question, perhaps for other blog posts.

When should code be reviewed? Hopefully also a great question, and the topic of this blog post.

Context

When making a recommendation, I’m trying to get in the habit of always mentioning the context, the situation in my work from which my recommendation arises. This is in response to seeing frustration when people try to follow advice, not realizing that their situation is completely unlike that of the writer.

Here at Oasis Digital our context is mostly complex line-of-business applications, which are expected to live for a long time and are important to the business operations of the companies that use them. We are almost always replacing an existing system (which a company relies on) or expanding an existing system (which a company relies on).

Another critical part of our context: we have flexible and fast source control, and a group of developers who have learned how to use it competently. We have decoupled source control from build automation, so that we can perform builds of branches in progress without merging them to a mainline. We can achieve a sufficient degree of continuous integration without actually putting each change directly into the same code line.

Review Code Early

How long after code has been written should it be reviewed? Hours or days, not weeks. For this reason, we commit and push code frequently (to a disposable branch of course) where others can see it for early review. It can be a challenge to get past the ego-driven desire to make code perfect before exposing it to the scrutiny of others, but it is worthwhile. By exposing code to review and comment early in its life, a developer can get useful feedback early in a unit of work.
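
With Git, for instance, that early exposure can be as lightweight as a short-lived branch pushed to the shared repository; the branch name below is only an illustration:

# create a disposable branch for the work in progress
git checkout -b wip/report-export

# commit early and often, then push so others can look at any time
git add .
git commit -m "WIP: first cut of report export"
git push -u origin wip/report-export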

Review Code Often

Another anti-pattern of code review is to look at a given proposed change only once or twice. That is sensible for a very small change, but for a complex change to a complex system it is best to perform code review repeatedly as work proceeds. For example, consider a feature that takes four weeks to implement. Our typical approach, which I recommend to others (if their context is reasonably similar to ours, see above) is something like this:

  • Review the code a few days in.
  • Review the code progress once a week or so.
  • Review the code when it is nearly ready to be called “done” and to go in the mainline.
  • If there are fixes needed, review those promptly so as not to cause delay.

Review Code Now

Code waiting for review is a drag on the speed of development; the right time to review code is now. A quick review now is probably more valuable than a deep review deferred.

Review Code as a Gate

In some organizations, code review happens asynchronously and sporadically, after code is already part of a project. This risks a collapse in software quality. Once code is in the mainline of a product, it is arguably too late to review. If a reviewer finds a problem with code already in a system, there will be pressure to sweep that problem under the rug and move on, or to convince you that it is not really a problem, or that quality doesn’t really matter, just this once.

The right answer is simple: use code review as a gate through which code must pass before it can become part of the mainline of the project. Use your source control system, and stack up proposed code changes that are nearly ready to go into the mainline. Then periodically perform your review process (whether it is in person, remote, synchronous, asynchronous, etc.). Once the code passes a final review process, then it goes in the mainline. If there are serious problems, the code sits in a branch, a developer improves it, and it tries again next time.
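
As a rough sketch of the gate, using Git (the branch names are only examples, and the review itself can happen however your team prefers):

# a reviewer fetches the proposed change and examines all of it
git fetch origin
git diff master...origin/feature/invoicing

# only after the review passes does the change reach the mainline
git checkout master
git merge --no-ff origin/feature/invoicing
git push origin master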

We have had excellent results with this approach, maintaining software quality for years on end even through ongoing development team turnover.

 

Opaque Binary Formats are Terrible

I’m looking at you, Crystal Reports!

Opaque binary file formats for development assets are a scourge on the software development community.

Here’s some context. At Oasis Digital we typically work on complex enterprise data systems. These systems have a long life, with multiple (or many) developers working concurrently and over time. These systems are important to the companies which use them, which is to say, we must not accidentally break things.

In that context, we’ve encountered the following problem on numerous occasions, using numerous tools, on multiple platforms, over many years. Today I’m going to pick on one specific and especially painful instance of it: report definitions.

We believe strongly in code review, where code is defined broadly to include things like the definition of a report. That definition can include SQL, layout information, styling, parameter definitions, and so on. Whenever a developer makes a change to a software system, another developer (often a more senior one) reviews the change to see exactly what changed, verify that it appears correct, and confirm that no accidental changes are included. In this way, we greatly reduce the incidence of accidentally changing or breaking something that was not supposed to be changed.

Code / change review is very common in most or all mature software development organizations.

For source code, almost universally represented in plain text, the mechanics of code change review are straightforward. The diff command, or any of 1000 tools for comparing files, can readily show the differences between version N and version M. (As an aside, whenever someone presents a mechanism for representing source code as something other than plain text, I dismiss it with a chuckle unless there is also a solid solution provided for the question of how to review changes.)
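
For example, the file names and version tags below are only placeholders, but the mechanics are no more complicated than this:

# compare two saved copies of a report query
diff -u report_v1.sql report_v2.sql

# or, with Git, compare the same path across two committed versions
git diff v1.4 v1.5 -- reports/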

Happily, many reporting tools represent the layout and query information for a report in a text format. Sometimes it is a proprietary text format, sometimes it is XML; regardless of the ugliness, these all have the merit that it is possible to compare them as text and see what changed between two versions. Offhand, I would like to call out:

  1. Jasper Reports, in the Java world
  2. ReportBuilder, in the Delphi world

for doing the right thing: offering a text-based, diffable representation of a report definition.

Crystal Reports, though, is both a popular and problematic tool. It represents reports in an opaque binary format, for which a “diff” produces nothing readable. With only this in hand, “code review” of a report change consists of:

  1. Run the old report
  2. Run the new report
  3. Hold them up next to each other and compare the output
  4. Hope that no accidental changes were made that would break the report for some parameter that wasn’t used for testing

I have heard that “Hope is not a strategy” and that is certainly the case here. Yet with this tool (and others like it), an opaque binary representation of the report definition leaves us with only hope.

A Way Out

I have read that Crystal Reports offers an API (object model) by which software can inspect and manipulate the report definition. Using this, it should be possible to write a tool which takes a report definition as input, and emits a text file as output, such that the text file contains a human-readable rendering of every important piece of information about the report definition. This simply adds a minor step to a review/diff process: run both the before and after representations through this conversion, and diff the outputs of the conversion.
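
To make that concrete, the review step might then look something like this; report2text below stands in for the hypothetical conversion tool described above, not an existing program:

# convert both versions of the report definition to text (report2text is hypothetical)
report2text old_version.rpt > old_version.txt
report2text new_version.rpt > new_version.txt

# then review the change as ordinary text
diff -u old_version.txt new_version.txt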

To be effective, such a tool would need to produce a text representation which is both complete (which is to say, it includes essentially everything needed to re-create the report definition) and stable (which is to say, minor changes to the report definition, produced by clicking around, result in minor changes to the text output).

Unfortunately, I have not been able to find such a tool. If anybody knows of one, I would love to hear about it.

What is the Best Git GUI (Client) for Windows?

I adopted Git as my primary source control tool a couple of years ago, when I was using Windows as my primary (90%) desktop OS. Since then I’ve switched to 75% Mac OSX, but I still use Git on Windows for a few projects, and I get a lot of questions about Git on Windows.

I use msysgit (and its included GUI) most often myself, but I don’t have a clear answer as to which is the “best” Git GUI for Windows. I can offer this list of choices, though, along with some thoughts about them.

There is also a very long list of Git tools on the main Git wiki; but that page is just a list, without any other information.

msysgit

msysgit is the main project which ships a Windows port of Git. It is based on MSYS, so it fits in the Windows ecosystem a bit better than the cygwin Git port.

msysgit includes the same Tk-based GUI tools as Git on Linux: a commit tool and a repo-browse tool, plus a bit of shell integration to activate the GUI by right-clicking in Windows Explorer, plus a new thing called git-cheetah, which appears to be heading toward Tortoise-style integration. These tools are a bit ugly, but have good and useful functionality. I don’t mind the ugly (I get my fix of stylish software over on my Mac…), and I find the features ample for most work.
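
If you have msysgit installed, those tools launch from the Git Bash prompt (or the Explorer context menu) like so:

# the Tk-based commit tool
git gui

# the repository history browser, showing all branches
gitk --all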

If you don’t know where to start, or if you want a Linux-like Git experience, start with msysgit and learn to use its tools.

msysgit is free open source software. It is under active development, and keeps up with the upstream Git versions reasonably well. There is even a portable (zero-install) version available.

My biggest gripe with msysgit (and its GUI) is that I had to figure out how to use it effectively myself. I could have really used a video walkthrough of how to be productive with it, back when I was starting out. That was a long time ago for me, but might be Right Now for people reading this post. Mike Rowe (a reader) helpfully suggested this msysgit tour, which is very helpful though a bit dated.

TortoiseGit

This is an attempt to port TortoiseSVN to git, yielding TortoiseGit. If you like and use TortoiseSVN, you’ll probably find this worth a try. I haven’t tried it yet myself.

TortoiseGit is free open source software, and is under active development.

Git Extensions

This Git GUI has a shell extension (like the Tortoise family) and also a plugin for Visual Studio. From the screen shots, it appears to be feature-rich and complete.

Git Extensions is free open source software, and is under active development.

SmartGit

Unlike the other tools listed here, SmartGit is a commercial product (from a German company), starting at around $70. It appears to be more polished than the others, as is often the case with commercial products. It also appears to be quite feature-rich.

I don’t know how SmartGit fits in with the Git licensing; Git is licensed GPL (v2), so I assume (hope?) SmartGit has found some way to use it under the hood without linking to it in a way that would cause license trouble.

SmartGit requires a Java runtime, implying that it is written in Java. Five years ago I thought of that as a caveat; but today, Java-based GUIs can be extremely attractive and fast, so I don’t see it as a problem at all.

Is IDE Integration Vital?

I know people who swear by their IDE experience, and are aghast at the thought of any daily-use dev tool that is not integrated with their IDE. It is almost as though for this group, multitasking does not exist, and any need to run more than one piece of software at the same time is a defect.

Now I love a good IDE as much as anyone (I’ve urged and coached many developers to move from an editor to an IDE), but I don’t agree with the notion that source control must always be in the IDE. IDE-integrated source control can be very useful, but there are sometimes cases where non-integrated source control wins.

The most common example for me is when using Eclipse on a large, complex system. There are two annoyances I see regularly:

  • Eclipse assumes that one Eclipse project is one source control project, an assumption that is sometimes helpful and sometimes painful. In the latter case, simply ditch the Eclipse integration, and use a whole workspace (N projects) as a single source-control project, outside of Eclipse.
  • Sometimes Eclipse source control integration bogs down performance. Turn it off, and things speed up.

Therefore, when I use Eclipse, I sometimes manage the files from outside, using msysgit, command line, etc. When I have a complex “real-life project” comprised of many Eclipse “projects”, I set up a separate Eclipse workspace for it, apart from other unrelated Eclipse projects.
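
As a sketch, with Git that outside-the-IDE setup amounts to making the workspace directory itself the repository (the path and project names here are only examples, and Eclipse’s .metadata directory is best left untracked):

# treat the whole Eclipse workspace as one source-control project
cd ~/workspaces/big-system
git init
echo ".metadata/" >> .gitignore
git add .gitignore project-a project-b project-c
git commit -m "Import whole workspace as a single repository"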

Feedback Wanted

I’d love to hear about:

  • More Windows Git GUIs to list here
  • Anything else I’ve missed

... via the contact page (link at the top of the page). I try to reply to all email within a few days.

Missing your .svn\tmp directories? One line fix.

You may find that “svn cleanup” (or its TortoiseSVN equivalent) fails with an error message about “system cannot find the path specified”. If you research this error, you may find that the SVN dev team knows that svn cleanup does not clean up this particular problem, and as of SVN version 1.6.5, considers that OK.

There is an easy fix, though. The tools are already present on nearly any Linux system, and are available in Cygwin or MSYS on Windows. Navigate to the top of your SVN working directory, and run this:

find . -iname '.svn' -exec mkdir {}/tmp \;

If all you were missing was some empty tmp directories, svn cleanup will now work, as will svn update and friends. Of course you may have other, additional problems with your .svn directories.

A mystery, for me and others, is how the missing .svn\tmp directory situation comes about. The best guess I’ve seen, but not yet reproduced here, is that a helpful piece of software (perhaps a backup tool?) deletes empty directories.

The great majority of software I’ve used does not depend on empty directories, and I likewise heartily recommend not designing software in such a way that it requires empty directories to be preserved. If you need a directory, please keep something in it. If you don’t need anything in it, be willing to recreate it when you have something to put in it. Make it Just Work.
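
When a directory really must exist ahead of time, the usual workaround is a small placeholder file (the path below is just an example), so that tools which ignore empty directories still preserve it:

# keep an otherwise-empty directory alive with a placeholder file
mkdir -p build/output
touch build/output/.keep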

gitosis on Ubuntu 9.04 Jaunty

As of April 2009, the gitosis package in Ubuntu 9.04 Jaunty is broken; it fails with an error like so:

pkg_resources.DistributionNotFound: gitosis==0.2

There are quite a few pages and mailing list messages that mention this. I only found one with a good hint toward a solution, which was that it is also a known issue on Debian. Following that lead, I got it working by grabbing newer packages from Debian Unstable:

wget http://ftp.us.debian.org/debian/pool/main/p/python-support/python-support_1.0.2_all.deb
wget http://ftp.us.debian.org/debian/pool/main/g/gitosis/gitosis_0.2+20080825-14_all.deb
sudo dpkg -i python-support_1.0.2_all.deb
sudo dpkg -i gitosis_0.2+20080825-14_all.deb

Use this at your own risk; your mileage may vary.

Webby – Client-side, static content management system

This afternoon I rebuilt OasisDigital.com using Webby, stripping out hand-coded HTML and replacing it with much more maintainable Markdown. The site looks about the same as before (which is to say, mediocre), but under the hood it is much easier to update. We intend to use this new ease to move forward in improving it. There is a general principle here, which applies broadly in software development also:

If you need to make a change, but that change is difficult / tedious / risky to make, first improve the underlying system that makes it so.
(OasisDigital.com is a static web site; we have dynamic content (issue trackers, etc.) to automate our work together and with our customers, but that content is on another domain.)

Webby is a client-side, simple CMS for generating static web sites, written in Ruby. Why serve a static site (plain old files on a web server) in 2008?

  • It minimizes the moving parts; there is almost nothing to break or maintain.
  • It is very unlikely that any hosting issue will break a static site.
  • It is easy to serve a static site fast (though our current host, TextDrive, sometimes is not all that fast).
  • Security vulnerabilities are very unlikely, in the absence of any executable content.
  • The canonical content (in this case, mostly Markdown) is stored in plain text files, which we track, diff, and merge in git.

In an earlier foray into Drupal, we found that Drupal has extensive and useful capabilities, as well as a vibrant community, but it also has many moving parts; too many, in my judgment, to make it a good solution for building an essentially static web site.