Opaque Binary Formats are Terrible

I’m looking at you, Crystal Reports!

Opaque binary file formats for development assets are a scourge on the software development community.

Here’s some context. At Oasis Digital we typically work on complex enterprise data systems. These systems have a long life, with multiple (or many) developers working concurrently and over time. These systems are important to the companies which use them, which is to say, we must not accidentally break things.

In that context, we’ve encountered the following problem on numerous occasions, using numerous tools, on multiple platforms, over many years. Today I’m going to pick on one specific and especially painful instance of it: report definitions.

We believe strongly in code review, where code is defined broadly to include things like the definition of a report. That definition can include SQL, layout information, styling, parameter definitions, and so on. Whenever a developer makes a change to a software system, another developer (often a more senior one) reviews the change to see exactly what is changed, and verify it appears correct, and that no accidental changes are included. In this way, we greatly reduce the incidence of accidentally changing or breaking something that was not supposed to be changed.

Code / change review is very common in most or all mature software development organizations.

For source code, almost universally represented in plain text, the mechanics of code change review are straightforward. The diff command, or any of 1000 tools for comparing files, can readily show the differences between version and an version M. (As an aside, whenever someone presents a mechanism for representing source code is something other than plain text, I dismiss it with a chuckle unless there is also a solid solution provided for the question of how to review changes.)

Happily, many reporting tools represent the layout and query information for a report in a text format. Sometimes it is a proprietary text format, sometimes it is XML; regardless of the ugliness, these all have the merit that it is possible to compare them as text and see what changed between two versions. Offhand, I would like to call out:

  1. Jasper Reports, in the Java world
  2. ReportBuilder, in the Delphi world

for doing the right thing: offering a text-based, diffable representation of a report definition.

Crystal Reports, though, is both a popular and problematic tool. It represents reports in an opaque binary format, for which a “diff” produces nothing readble. With only this in hand, “code review” of report change consists of:

  1. Run the old report
  2. Run the new report
  3. Hold them up next to each other and compare the output
  4. Hope that no accidental changes were made that would break the report for some parameter that wasn’t used for testing

I have heard that “Hope is not a strategy” and that is certainly the case here. Yet this tool (and others like it), an opaque binary representation of report definition leaves us with only hope.

A Way Out

I have read that Crystal Reports offers an API (object model) by which software can inspect and manipulate the report definition. Using this, it should be possible to write a tool which takes a report definition is input, and emits a text file as output, such that the text file contains a human readable definition of every important piece of information about the report definition. this simply adds a minor step to a review/diff process: run both the before and after representations through this conversion, and diff the outputs of the conversion.

To be effective, such a tool would need to produce a text representation which is both complete (which is to say, includes essentially everything that will be needed to re-create a report definition) and stable (which is to say, minor changes to the report definition produced by clicking around, result in minor changes to the text output)

Unfortunately, I have not been able to find such a tool. If anybody knows of one, I would love to hear about it.