Sometimes, Establishing Expertise Doesn’t Pay Off

Recently I analyzed the relative payoff from different types of work I’ve done in my career to date. Some of the work has paid off reasonably well. But one particular bit of it stands out as a counter-example to common wisdom:

Between 1997 and 2000, I spent countless hours on the BDE Alternatives Guide, a section of this web site devoted to listing and analyzing the dozens of third-party database access libraries available for Delphi in that era. Delphi shipped with the BDE a not-great mechanism for database access. BDE was Borland’s answer to Microsoft’s ODBC, but unlike the latter, BDE didn’t get industry-wide support.

Working on the BDE Alternatives Guide had many positive payoffs:

  • It created a much-needed resource, greatly appreciated by thousands of developers.
  • I learned enormously in the process.
  • It put me in touch with dozens of library vendors, and many hundreds of developers.
  • It generated many incoming links and much traffic, around a million page views over a 5-year period.
  • Banner advertisements brought in a few hundred dollars, before I scrapped them to avoid the appearance of bias.
  • It made me reasonably well-known in the Delphi world, which was growing rapidly at that time. (Our team at Oasis Digital still does some Delphi work, by the way.)

You might think, though, that establishing expertise as a Delphi database integration expert, would result in lots of consulting leads, new business, and job offers. Let’s look at the stats:

  • Total number of leads generated: 0
  • Total consulting work generated: $0
  • Total job inquiries and offers: 0

Yes, that’s right. Not a single firm ever contacted me to inquire about consulting, development, etc., as a result of the BAG. I’m still glad I did the work, for all the reasons above. But it is a counter-example to the notion of showing expertise and building a technical following, as a way to generate business interest. Sometimes the work pays off in new business, and sometimes it doesn’t.

The situation is not as dire as it sounds though; concurrently with that technical reputation, I was building a development-results reputation, and the latter was vital to launching Oasis Digital in 2001.

 

Unusual virus which attacks Delphi SysConst.pas / .dcu

There is big news today in the Delphi world: a virus that specifically attacks Delphi installations, inserting a bit of code to embed itself in any application compiled by that Delphi installation. (more info here)

The virus adds code to SysConst.pas, which is then compiled in to SysConst.dcu. (Those are analogous to .c and .o, for those of you outside the Delphi world).

I immediately checked all of my machines (and all Delphi installations on them) and found them all to be clean; but I am obsessively careful about what software I run, and use antivirus software as a second layer of protection on top of that.  All Delphi shops should check for this ASAP.

Is Delphi Dead? No.

A few months ago Alex Miller pointed me to this Delphi doom article (the site appears to be down at the moment), which reminded me to post about the same topic. Here goes.

Delphi shipped in 1995, and its demise has been declared frequently since 1997 or so. In a sense this demise is true, yet also false. Delphi’s current popularity is very different in form (not only in magnitude) from that of Java, C#, etc. Delphi is used substantially by commercial software vendors, and only rarely by enterprises. An ugly reality of the software industry is that the bulk of software developers nationwide work inside large non-software companies, so this usage pattern most likely does not produce the level of unit sales that Codegear (Borland’s dev-tools subsidiary) would like to see. It does, however, produce an enormous number of Delphi application instances running “in the field”, used by real paying end users, who don’t care (or know) what development tools were used to build the software they buy. Many commercial software products, both those in shrinkwrap at retail stores and those for vertical markets, are written in Delphi and will continue to be, because there are very few other good choices for high quality (polished) native Win32 GUI software. In these markets, shipping a Java or .NET app can be a competitive disadvantage (though to a lesser extent over time), and old-style VB is a sad joke.

I don’t think Delphi is eligible for demise until the dominant desktop operating system ships with a dominant runtime platform “in the box”. For example, if all of this happens at the same time:

  • Microsoft ships Windows with the .NET runtime already installed
  • That version Windows is the commonly deployed version
  • That version of the .NET runtime is the commonly targeted version

At that time, the .NET platform (with the language of your choice) could be a compelling replacement for Delphi in its niche. There is a lot to like about .NET (and Java, and I use them both), but I’m not holding my breath for the above conjunction.

Over at Oasis Digital we have several ongoing Delphi projects in which we develop and extend in-house, enterprise applications. These projects feel notably lonely (very few developers here in the midwest use Delphi), and the Delphi language leaves a lot to be desired (such as garbage collection) – but the resulting software works very well for our customers, especially when we add in a bit of Lua or Prolog (story coming someday…).

Delphi is not dead. It’s not at the top of the popularity charts, and won’t be. It probably shouldn’t be your first choice for a new in-house enterprise application starting today, because of the network effects of Java and .NET popularity. But Delphi is not going away anytime soon, and is a great choice for certain classes of projects.

Python or Python+Delphi Developer Wanted

Speaking of Python, over at oasis Digital we’re looking for a Python (subcontract) or Python+Delphi (full time) developer. For the right person this could be a great opportunity to use your preferred tools.

Plus, a tip to anyone applying for this work or any other work: when you email a resume, don’t name it “my resume.doc” or “resume.pdf”. Rather name it with your name, perhaps “Kyle Cordes resume.doc” or “Resume for Smith, John.pdf”. (I fear that someone will read this and miss the point entirely, naming their resume file with my name…)

Update: We have hired for this work – thank you to everyone who applied.

Yet Another Python Success Story

Is it OK to use programming language X in a production enterprise application? Or are fear, uncertainly, and doubt holding you back? Public “success stories” might make it more acceptable for you to do so in your environment.

In that spirit I offer our story of a production Python deployment at an Oasis Digital customer (without names or details, to protect their privacy). There are many other success stories at Python.org. In this project, the client and application server (the bulk of the system) are written in Delphi (which was much more popular when the project started, than it is today). A major subsystem (roughly 1/3rd of the overall system) is written in Python. It consists of a set modules that parse textual data from a large number of varied formats, into a common schema, another set of modules to apply (frequently adjusted) business rules, and a third set of miscellaneous modules. These are all used in background data processing, not part of a client application or a server handling requests from a client application. These modules interact with the rest of the system primarily through state stored in a database. I generally recommend against the database integration style between separate applications, but it works well in this scenario within modules of the same application, built and maintained concurrently by the same team.

The Good

We chose Python for this subsystem for a variety of reasons. First, its built-in features are well suited to the text processing task at hand. Python’s “batteries included” have generally avoided the need to find or implement add-on text processing tools (which would have been necessary in Delphi); thus a programmer needs to know and use “just” what’s in the Python box, with few external libraries to consider.

Second, Python’s built in features and compact syntax have shortened the programming time considerably, in our estimation, than would have been otherwise. It takes relatively few lines of Python to get the job done. We have many lines of Python, and would have had many more lines had we used a lower-level language. (Of course lines of code is not everything; it’s possible to come up with dense, bad code. As a general rule, though, a language in which you can express what you need to express more succinctly, is better.)

Third, Python’s interpreted nature keeps the edit-test cycles short, further speeding development. This development speed issue is especially important given the niche this project occupies, in which data format changes and rule changes sometimes arrive with no notice: new data arrives, and part of the application does not work until it is enhanced to handle the new data. Extensive use of automated unit and integration tests (many hundreds of test cases) effectively prevents the interpreted operation from causing trivial runtime errors (type errors, syntax errors).

Of course, some other languages with similar compact syntax, included libraries, and high level features, would have worked equally well. At the time(5+ years ago), though, Python appeared to have the most momentum, other than Perl. Perl would have been a good choice also from a technical point of view, but it had an (unwarranted) reputation of being hard to maintain, which I didn’t want to have to overcome.

The Bad

There are downsides to our two-language approach. The first is somewhat Python specific, and really not a big deal: Python is slow. Its primitives are fast, but when you write considerable Python code to do something, it does that something at a rather leisurely pace compared to Delphi or Java or C++, using a lot more CPU along the way. The practical impact of this has been limited, because the bottleneck on this system is not the Python code, it is the database; but still, this has been inconvenient, and has required our customer to deploy multiple machines for this subsystem where one would have been sufficient with a more efficient language/runtime. Doing so is not particularly expensive, though, and adds a measure of reliability, so we haven’t had a need to speed things up with Psyco, C modules, etc.

The other issue is more serious. It is created by the large gap in language style and features between Delphi vs. Python in particular, and low-level vs. scripting languages in general. (Those of you unfamiliar with Delphi may be thinking this is because Delphi is some hideous VB-like toy. Wrong. Delphi is a somewhat C++-like or Java-like language, statically compiled, fast, and sadly burdened with a Pascal syntax.) Personally, this gap bothers me not at all. I’ve written production code in assembly, C, C++, Delphi, Java, Python, Ruby, Javascript, Lua, a bit of Prolog, and others I forget right now; I am happy to use one language in the morning and another radically different one in the afternoon.

I have discovered that most developers are not like me, though. Most Delphi developers are notably uninterested in Python and vice versa. As a result, our project team has ended up divided along the same lines as the software, with some cross training but relatively little production development crossover between the developers working in each languages. This is an obstacle to any developer taking end-to-end responsibility for features or issues that span the languages, and also an obstacle to hiring.

Python itself is not much of an obstacle to hiring: while there are far fewer Python programmers than Java (for example) programmers, there are also far fewer Python jobs than Java jobs.

The Verdict

In spite of the downsides discussed above, overall it has been a “win”, technically, to use two languages (each well suited to part of the application) in this project. More importantly, I am also confident this choice has been a win for our customer: they got a system delivered faster, and at lower cost, than they otherwise would have. They used every bit of speed we could deliver, to win business from their competitors.

However, the world has improved a lot since this decision was made; today I could probably choose a single language / toolset which meets all the needs sufficiently well, and thus avoid the downsides of the two-language solution. Alternatively, if starting today I might build the infrastructure for all subsystems in the same base language, with hooks to use Lua or Javascript scripting to accommodate the need for rapid runtime logic changes. It’s even possible that we would port the existing code to another language in the future – which would not make the original decision a mistake.

Help! My Hierarchy is Slow – Faster Hierarchies with Nested Sets

A great many applications, including many that I’ve worked on, have a hierarchy of things: of parts, of people, of organizations, etc. The way most of us represent such hierarchy is with the first thing that generally comes to mind: make each Widget have a parent Widget, with a table like so:

This representation is called an “adjacency list”, and is simple and easy. You can readily build a tool to manipulate a hierarchy stored this way. Many off the shelf visual components, for both client-side and web applications, know how to manipulate hierarchies represented this way. Some reporting tools know how to report on hierarchies represented this way.

However, for answering common questions like “who all is under person X in the hierarchy”, the adjacency list approach is unwieldy and slow.

There are various other approaches to representing a hierarchy, most of them discussed in detail in Joe Celko’s articles and books, prominently in the book Trees and Hierarchies in SQL for Smarties. If you work with SQL and hierarchies, buy this book now.

One approach Celko is especially fond of is the “nested set” representation. You can read about it online here and here.

Of course, changing an entire application to use nested sets might be a very big deal in a mature application. That’s OK; in most cases we can get much of the benefit by building a nested-sets “cache” of the share of the hierarchy, with a table like so:

Each time the hierarchy has changed, or before each time we need to run complex queries, delete the rows in this cache table and repopulate them based on the current canonical adjacency data. Celko offers SQL code in his book to do that, which could be translated to work in the stored procedure language of the DBMS at hand. But what about DBMSs that don’t offer stored procedures, such as lightweight local databases (SQLite), MySQL, etc.? The translation must be done in application code instead.

I wrote such code in Delphi a while back, in the process of getting a full understanding of this problem; I’ve cleaned it up and now offer code for download here (DelphiAdjacencyNestedSets.zip), under an open source license (MIT license – use it all you want). I tested this today with Delphi 2007 Win32, but it should work fine at least back to Delphi 7. As far as I can tell with some searching, this is the only Delphi code for translating adjacency to nested sets available on the internet. This code doesn’t know about databases – it is a module to which you feed adjacency data, and from which you get back nested set data. It includes DUnit test cases.

I’ve also put the code on github, for easy browsing and forking.

(Update in August 2007: a new version (DelphiAdjacencyNestedSets2.zip) optionally tolerates “orphan” nodes and forests. Update in January 2008: a newer version (DelphiAdjacencyNestedSets3.zip) propagates an integer value down the hierarchy; it is named Tag as a nod to the .Tag property on VCL components. Both of these newer versions were tested with Delphi 2006/2007 also.)

The essence of the translation is a depth-first traversal of the hierarchy, and of course this can be easily implemented in other languages; the Delphi code is easy to understand, so don’t be afraid to take a look even if you need some Java or C# or PHP etc. I also stumbled across this PHP nested sets implementation, which offers a set of functions to maintain (insert, update, etc.) a hierarchy stored as nested sets, rather than only translate from adjacency to nested sets.

Another useful way to represent a hierarchy for fast querying is with a transitive closure table. I’ll write this up in a future post; it turns out to be especially useful (and necessary) to make arbitrary hierarchies work in the Mondrian OLAP server.