Opaque Binary Formats are Terrible

I’m looking at you, Crystal Reports!

Opaque binary file formats for development assets are a scourge on the software development community.

Here’s some context. At Oasis Digital we typically work on complex enterprise data systems. These systems have a long life, with multiple (or many) developers working concurrently and over time. These systems are important to the companies which use them, which is to say, we must not accidentally break things.

In that context, we’ve encountered the following problem on numerous occasions, using numerous tools, on multiple platforms, over many years. Today I’m going to pick on one specific and especially painful instance of it: report definitions.

We believe strongly in code review, where code is defined broadly to include things like the definition of a report. That definition can include SQL, layout information, styling, parameter definitions, and so on. Whenever a developer makes a change to a software system, another developer (often a more senior one) reviews the change to see exactly what changed, verify that it appears correct, and confirm that no accidental changes are included. In this way, we greatly reduce the incidence of accidentally changing or breaking something that was not supposed to change.

Code / change review is common practice in most, if not all, mature software development organizations.

For source code, almost universally represented in plain text, the mechanics of code change review are straightforward. The diff command, or any of 1000 tools for comparing files, can readily show the differences between version N and version M. (As an aside, whenever someone presents a mechanism for representing source code as something other than plain text, I dismiss it with a chuckle unless a solid solution is also provided for the question of how to review changes.)

Happily, many reporting tools represent the layout and query information for a report in a text format. Sometimes it is a proprietary text format, sometimes it is XML; regardless of the ugliness, these all have the merit that it is possible to compare them as text and see what changed between two versions. Offhand, I would like to call out:

  1. Jasper Reports, in the Java world
  2. ReportBuilder, in the Delphi world

for doing the right thing: offering a text-based, diffable representation of a report definition.

Crystal Reports, though, is both a popular and problematic tool. It represents reports in an opaque binary format, for which a “diff” produces nothing readable. With only this in hand, “code review” of a report change consists of:

  1. Run the old report
  2. Run the new report
  3. Hold them up next to each other and compare the output
  4. Hope that no accidental changes were made that would break the report for some parameter that wasn’t used for testing

I have heard that “Hope is not a strategy”, and that is certainly the case here. Yet with this tool (and others like it), an opaque binary representation of the report definition leaves us with only hope.

A Way Out

I have read that Crystal Reports offers an API (object model) by which software can inspect and manipulate the report definition. Using this, it should be possible to write a tool which takes a report definition as input, and emits a text file as output, such that the text file contains a human-readable rendition of every important piece of information about the report definition. This simply adds a minor step to the review/diff process: run both the before and after versions through this conversion, and diff the outputs.

To be effective, such a tool would need to produce a text representation which is both complete (which is to say, it includes essentially everything needed to re-create the report definition) and stable (which is to say, minor changes to the report definition, produced by clicking around, result in minor changes to the text output).
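
To make the mechanics concrete: with such a tool in hand (call it rpt2text; both the name and the tool are hypothetical), the review workflow would look something like this:

$ rpt2text invoice-old.rpt > invoice-old.txt
$ rpt2text invoice-new.rpt > invoice-new.txt
$ diff -u invoice-old.txt invoice-new.txt

The reviewer then reads the unified diff exactly as they would for any source code change.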

Unfortunately, I have not been able to find such a tool. If anybody knows of one, I would love to hear about it.

Micro-review: Bose QuietComfort 15 noise-cancelling headphones

I recently bought these Bose headphones.

Yes, they cost $300. Ouch.

If you travel by airplane more than 1x per year, buy these headphones.

Slightly longer review:

These headphones use active noise cancellation to dramatically slash the volume of loud environments; they work best on continuous, white-noise-like sound (for example, riding in an airplane). The experience of 4+ hours in the air is completely different when you cut down the noise level. Wearing these, vs. not, makes a more dramatic difference in overall unpleasantness than the difference between first class and coach.

Each AAA battery lasts a few flights, sometimes more. You can plug in your media player, computer, etc., or just use the noise cancellation alone.

Orbitz.com Considered Harmful

(Offtopic warning: my site is mostly about technical matters, not about consumer affairs.)

Well, that wasn’t fun.

We had reserved, or so I thought, a hotel stay of a few days, using Orbitz.com. Life intervened, and it became necessary to cancel. When we attempted to, it turned out that we hadn’t reserved a hotel stay at all. We had paid in advance for a hotel stay, which was 100% non-refundable. Don’t stay, still pay the whole amount anyway. (With considerable effort, including intervention by the management of the hotel in question, at which we’ve stayed a number of times before, we were finally able to get it resolved.)

While in a free world one should be able to sell such a toxic product, it generally does not make sense to buy one, certainly not as the default. One lesson to learn: read the terms carefully; there are dragons in there.

But I think that is the wrong lesson. The right lesson is much simpler: do not do business with a vendor (Orbitz) which offers such foolishness. Rather, use them (or any similar site) to find a hotel / flight / whatever, then leave their site and make the purchase by other means, some means by which the more traditional (and sane) terms of sale apply.

Palm Pre First Impressions (vs BlackBerry Pearl)

Today I set aside my BlackBerry Pearl for a shiny new Palm Pre. There are various detailed, photo-rich reviews out there, and many more on the way. I’ll skip that, and pass a few first impressions of the Pre, particularly compared to the BlackBerry (Pearl, in my case).

  • The hardware is quite nice; the size is only a bit larger than the Pearl, with a much larger screen. It fits well in the hand. The keyboard is easier to use than the Pearl’s, having one letter per key. The screen is bright and sharp. The Pre camera is enormously better than the Pearl camera.
  • The Pre is a bit sluggish, even with the 1.0.2 OS update which is said to improve things.
  • The Pre’s software is vastly more advanced than the old BlackBerry Pearl I’ve been using; so much so that it makes the great hardware less responsive than the Pearl’s much older, weaker hardware.
  • The Pre’s gesture recognition seems rather rough to me; compared to an iPhone I found I had to work harder to get it to do the right thing.
  • The Pre’s browser, while reasonably fast and very pretty, has poor usability compared to Opera Mini (which I used as my primary browser on the Pearl), or even compared to the primitive BlackBerry built-in browser. Both of the latter reformat a web page to fit well on a small device, such that I can read most pages without zooming and without horizontal scrolling. On the Pre, reading a typical web page is an exercise in scroll/zoom tedium.
  • The Pre’s email client appears to use IMAP for Gmail access. This works, but not nearly as well as the native Gmail client available for the BlackBerry. It lacks the most common Gmail actions (“Archive” and “Spam”). I don’t know if WebOS makes it possible for Google to create a native Gmail client; if so I hope that happens soon.
  • The most obvious feature in common between the Pre and the BlackBerry is that both support multitasking, unlike the current (as I write this) iPhone. With a couple of button presses on the BlackBerry, I can flip over to read email while a web page is loading; the same is possible on the Pre (turn on “Advanced” gestures to make it easy).
  • With the Pre, I’ve made several accidental calls so far. I’m not sure it’s a good idea to use softkeys for placing a call; this is the first phone I’ve used (since my first analog cellular telephone in the early 1990s) to not have a physical button to initiate a call.
  • So far, most of the Pre applications leave me wanting more features, more options, more ability to adjust the device to be more functional perhaps at the expense of being less obvious. I expect the situation will improve as WebOS advances, and I hope very much that new versions run on this existing hardware.
  • The main list-of-apps screen on the Pre is almost like that of the iPhone… except that it manages to get the layout not-quite-right in an absurd way. It arranges the icons 4+ rows high, while allowing room for only 3 rows to be fully visible; thus navigating the list requires both vertical and horizontal scrolling.
  • The App Catalog displeases me greatly: when it shows apps available as a trial, it does not show the price of the full app. This is perhaps good marketing, but it is also profoundly disrespectful. There are no prices to be seen, and no affirmative indication of free-ness; yet according to the Palm support page on the topic, you nonetheless need to “know whether the app is free, must be bought, or can be downloaded in a trial version before you buy it.” Many users have been posting “reviews” which consist of asking whether the app costs money, and how much.
  • At the moment there are a whopping 2 (yes, two) games in the App Catalog, one of which is a trialware Connect Four from EA.
  • The Data Transfer Assistant, used to copy data from the old Palm world, essentially does not work for me. It runs and reports success, but my contacts from Palm Desktop do not appear on the Pre. It did sync my Google contacts “down”, though.
  • The sync mechanism essentially does not work for me. It claims to be linked to my Google Calendar, but events do not sync in either direction. I suspect it is failing under the hood and, to preserve the beauty of the GUI, hiding the errors.

As you can see I am not entirely happy with the Pre. Perhaps it will grow on me over the next couple of weeks, though that seems possible only if a very substantial bug fix software release appears in that time.

Updates, over the next week:

  • I manually cleaned up my old Palm Desktop data, then manually renamed a file in the Palm Desktop data store, and was able to get the Data Transfer Assistant to work for Contacts. Next, I purged all old events and some new events in my old Palm Desktop data, then got that data in place (with considerable “tapping” on each event, one at a time!) into the Pre, and from there to Google Calendar. After considerable gyration, the over-the-air sync works well.
  • Prominent multitasking is a very good thing. BlackBerry has had multitasking for years, but I suspect many BlackBerry users never use it.
  • After many tries, the Pre WebOS 1.0.2 update installed, and is indeed a bit less sluggish.
  • I can appreciate more with time how nice the browser looks; but it is still a big step backward for effective use on a small screen, compared to Opera Mini. Browsing on the Pre is so tedious that I find myself doing it less than before, even though I just got a slick new device.
  • While it’s tedious to read many sites with the main browser, the site-specific Apps for the New York Times and AP wrap text correctly and are easy to read.
  • The Pre’s interface for making phone calls is disappointing.
  • I greatly miss the Gmail email client that I used on the BlackBerry. The Pre’s email lacks very basic capabilities; for example, there is no way to delete/archive/file a message without opening it. With Gmail on a 2-year-old BlackBerry it is feasible to handle dozens of messages (read some, don’t read others, archive some off, mark a couple as spam, etc.) in a minute or two, with one hand and typically one keypress per message. With the Pre the same work takes two hands and many minutes, mostly because of the load time to open each message, followed by multiple taps to process it.
  • The Alarm Clock puts a notification icon at the bottom of the screen permanently, for no apparent reason. Hopefully an upcoming update will make it possible to use the alarm without permanently losing a strip of screen space.
  • Battery life on the Pre is short. Starting in the morning at 100%, after a long day of sporadic use it is down to 20%.
  • Battery life on the Pre is really short: fully charged at midnight; by 7 AM, down to 82%. Less than an hour of casual use, down to 52%.
  • I am not at all convinced that touch screens are well suited for cell phones. I’ve found it much more difficult to place calls, answer calls, and avoid accidental operations.

My comments above are surely slanted toward the negative; but as a lifelong “early adopter” my patience is considerable. Perhaps I’ll grab the SDK and work on an App or two myself. At least a portion of the Pre’s weaknesses could be effectively addressed by high quality third party apps.

Update, after a few more days:

  • I gave up, and returned the Pre. I’ve carried a cell phone since ~1991, upgrading every couple of years; this is the first time I’ve ever returned one.
  • It’s not you, it’s me. The Pre is a great piece of equipment, for its target audience.
  • I got a BlackBerry Curve 8900 instead; it is a much more advanced successor of the Pearl.
  • Compared to the Pre, the Curve has perhaps 100x more adjustments possible, to make it do what I want. It multitasks. It has a physical keyboard of high quality. Its battery lasts a long time. It runs its own (weakish) browser, and also runs Opera Mini. Thanks to Google’s sync tools, it offers a synced calendar and contact list.
  • The Curve GUI is not as pretty as the Pre’s; but it is of very high usability, even one-handed.
  • T-Mobile pricing is similar to Sprint, but T-Mobile includes tethering at no extra cost.

Network / System Monitoring Smorgasbord

At one of my firms (a Software as a Service provider), we have a Zabbix installation in place to monitor our piles of mostly-Linux servers. Recently we took a closer look at it and found ample opportunity to monitor more aspects of more machines and devices, more thoroughly. The prospect of increased investment in monitoring led me to look around at the various tools available.

The striking thing about network monitoring tools is that there are so many from which to choose. Wikipedia offers a good list, and the comments on a Rich Lafferty blog post include a short introduction from several of the players. (Update – Jane Curry offers a long and detailed analysis of network / system monitoring and some of these tools (PDF).)

For OS-level monitoring (CPU load, disk wait time, number of processes waiting for disk, etc.), Linux exposes extensive information with “top”, “vmstat”, “iostat”, etc. I was disappointed not to find any of these monitoring tools conveniently presenting / aggregating / graphing that data. From my short look, some of the tools offer small subsets of it; for the rest, they offer me the ability to go in and figure out for myself what data I want and how to get it. Thanks.
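
The raw data is only a command away; a tool merely needs to run, parse, store, and graph output like the following (each command reports a fresh sample every 5 seconds):

$ vmstat 5     # processes runnable (r) and blocked on disk (b), memory, swap, CPU
$ iostat -x 5  # extended per-device I/O statistics: await time, queue size, utilization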

Network monitoring is a strange marketplace; many of the players have a very similar open source business model, something close to this:

  • core app is open source
  • low tier commercial offering with just a few closed source addons, and support
  • high tier commercial offering with more closed source addons, and more support

I wonder if any of them are making any money.

Some of these tools are agent-based, others are agent-less. I have not worked with network monitoring in enough depth to offer an informed opinion on which design is better; however, I have worked with network equipment enough to know that it’s silly not to leverage SNMP.

I spent yesterday looking around at some of the products on the Wikipedia list, in varying levels of depth. Here I offer first impressions and comments; please don’t expect this to be comprehensive, nor in any particular order.

Zabbix

Our old installation is Zabbix 1.4; I test-drove Zabbix 1.6 (advertised on the Zabbix site as “New look, New touch, New features”). The look seemed very similar to 1.4, but the new feature list is nice.

We mostly run Ubuntu 8.04, which offers a package for Zabbix 1.4. Happily, 8.04 packages for Zabbix 1.6 are available at http://oss.travelping.com/trac.

The Zabbix agent is delightfully small and lightweight, easily installed with an Ubuntu package. In its one configuration file, you can tell it how to retrieve additional kinds of data. It also offers a “sender”, a very small executable that transmits a piece of application-provided data to your Zabbix server.
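
For example, a custom item in the agent’s configuration file, and a sender invocation, look roughly like this (the key names, script path, and hostnames are my own placeholders, not standard Zabbix items):

# In zabbix_agentd.conf: a custom item the server can poll.
UserParameter=myapp.queue_depth,/usr/local/bin/queue-depth.sh

# From cron or application code: push a value to the Zabbix server.
$ zabbix_sender -z zabbix.example.com -s app-server-1 -k myapp.jobs_processed -o 42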

I am reasonably happy with Zabbix’s capabilities, but I find the GUI design pretty weak, with lots of clicking to get through each bit of configuration. I built far better GUIs in the mid-90s with far inferior tools to what we have today. Don’t take this as an attack on Zabbix in particular, though; I have the same complaint about most of the other tools here.

We run PostgreSQL; Zabbix doesn’t offer any PG monitoring in the box, but I was able to follow the tips at http://www.zabbix.com/wiki/doku.php?id=howto:postgresql and get it running. The monitoring described there is quite high-level and unimpressive, though.
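
The recipe there boils down to agent items that shell out to psql; a minimal sketch (the key name, user, and database are placeholders, and it assumes the agent can connect to PG without a password prompt):

# In zabbix_agentd.conf: current PostgreSQL connection count.
UserParameter=pgsql.connections,psql -At -U zabbix -d mydb -c "select count(*) from pg_stat_activity"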

Hyperic

I was favorably impressed by the Hyperic server installation, which got two very important things right:

  1. It included its own PostgreSQL 8.2, in its own directory, which it used in a way that did not interfere with my existing PG on the machine.
  2. It needed a setting changed (shmmax), which can only be adjusted by root. Most companies faced with this need would simply insist the installer run as root. Hyperic instead emitted a short script file to make the change, and asked me to run that script as root (a sketch of such a script appears below). This greatly increased my inclination to trust Hyperic.
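
Such a script amounts to little more than the following (the value shown is illustrative; Hyperic’s actual script may differ):

#!/bin/sh
# Raise the kernel's maximum shared memory segment size, effective immediately.
sysctl -w kernel.shmmax=268435456
# Persist the setting across reboots.
echo "kernel.shmmax=268435456" >> /etc/sysctl.conf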

Compared to Zabbix, the Hyperic agent is very large: a 50 MB tar file, which expands out to 100 MB and includes a JRE. Hyperic’s web site says “The agent’s implementation is designed to have a compact memory and CPU utilization footprint”, a description so silly that it undoes the trust built up above. It would be more honest and useful of them to describe their agent as very featureful and therefore relatively large, while providing some statistics to (hopefully) show that even its largish footprint is not significant on most modern servers.

Setting all that aside, I found Hyperic effective out of the box, with useful auto-discovery of services (such as specific disk volumes and software packages) worth monitoring; it is far ahead of Zabbix in this regard.

For PostgreSQL, Hyperic shows limited data. It offers table and index level data for PG up through 8.3, though I was unable to get this to work, and had to rely on the documentation instead for evaluation. This is more impressive at first glance than what Zabbix offers, but is still nowhere near sufficiently good for a substantial production database system.

Ganglia

Unlike the other tools here, Ganglia comes from the world of high-performance cluster computing. It is nonetheless apparently quite suitable nowadays for a typical pile of servers. Ganglia aims to efficiently gather extensive, high-rate data from many machines, using an efficient on-the-wire data representation (XDR) and networking (UDP, including multicast). While the other tools typically gather data at increments of once per minute, per 5 minutes, or per 10 minutes, Ganglia is comfortable gathering many data points, for many servers, every second.
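
Ganglia also ships gmetric, a small command-line tool for injecting arbitrary application metrics into that same stream; for example (the metric name and value here are mine):

$ gmetric --name=app_queue_depth --value=17 --type=int32 --units=jobs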

The Ganglia packages available in Ubuntu 8.04 are quite obsolete, but there are useful instructions here to help with a manual install.

Nagios

I used Nagios briefly a long time ago, but I wasn’t involved in the configuration. As I read about all these tools, I see many comments about the complexity of configuring Nagios, and I get the general impression that it is drifting into history. However, I also get the impression that its community is vast, with Nagios-compatible data-gathering tools for any imaginable purpose.

Others

  • Zenoss
  • Groundwork
  • Munin
  • Cacti

How Many Monitoring Systems Does One Company Need?

It is tempting to use more than one monitoring system, to quickly get the logical union of their good features. I don’t recommend this, though; it takes a lot of work and discipline to set up and operate a monitoring system well, and dividing your energy across more than one system will likely lead to poor use of all of them.

On the contrary, there is enormous benefit to integrated, comprehensive monitoring, so much so that it makes sense to me to replace application-specific monitors with data feeds into an integrated system. For example, in our project we might discard some code that populates RRD files with history information and publishes graphs, and instead feed this data into a central monitoring system, using its off-the-shelf features for storage and graphing.

A flip side of the above is that as far as I can tell, none of these systems offers detailed DBA-grade database performance monitoring. For our PostgreSQL systems, something like pgFouine is worth a look.

Conclusion

I plan to keep looking and learning, especially about Zenoss and Ganglia. For the moment though, our existing Zabbix, upgraded to the current version, seems like a reasonable choice.

Comments are welcome, in particular from anyone who can offer comparative information based on substantial experience with more than one of these tools.

Dreamhost Out, TextDrive In

You might have noticed that this site is much faster than it used to be. The reason? I moved it from DreamHost to TextDrive.

TextDrive costs more, its “control panel” is not as good as DreamHost’s, and its bandwidth/storage limits are lower. But my site is far faster, hasn’t had any downtime or email downtime since the switch (during which DreamHost had an email outage), and TextDrive support responds much sooner.

I have a few TextDrive nitpicks, though: there is no built-in web-stats system (I’ll need to install one), and they apparently don’t have a backup system working at the moment (!). I’ve set up a nightly rsync to a machine here for backup purposes, but I sure hope they don’t intend this as a long-term situation.
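
That nightly rsync is nothing fancy; a crontab entry along these lines (the host and paths are placeholders):

# Pull the whole site down to a local machine every night at 3 AM.
0 3 * * * rsync -az --delete user@example.textdrive.com:web/ /backups/site/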

Update: Jason at Joyent/Textdrive noticed this post, and added a comment that the backup problem is long fixed.

Update: A complaint without data risks sounding like a whine, so I’ll add some data. Today I noticed that sites I still have on DreamHost are slow. Why? Let’s look:

$ date
Fri Sep 8 15:56:14 PDT 2006
$ uptime
15:56:18 up 5:31, 3 users, load average: 103.41, 95.54, 181.86

Update: Some months later, TextDrive has turned out to have approximately as much downtime as DreamHost, or more. It’s still fast when it’s up, and the TextDrive guys are helpful, friendly, and responsive. But the shared hosting they offer has frequent downtime.