Kyle Cordes – Page 30

Growing a Language, by Guy Steele

This is an oldie-but-goodie: Guy Steele’s “Growing a Language” talk from OOPSLA 1998.

It is amazing to me that Guy, whose is something of a legend in language design, and who thinks so clearly about what makes a good language, was also key in designing Java. Java has been extremely slow to grow in the sense described in this talk, because for many years Sun resisted such growth. Only the rise of C# and the growing popularity of dynamic languages generated enough pressure to get Java unstuck… and in the last few year Java has become somewhat growable in the sense Guy describes.

A Brief Introduction to Distributed Version Control

Last night at SLUUG, I have a talk on distributed source control tools. It was quite introductory, but the notes (below) may still be helpful. These notes were on a handout at the talk, as usual I didn’t use slides.

Unfortunately I didn’t get an audio recording of this talk, so no transcript either.

About 30 people were in attendance. Nearly 100% were familiar with CVS and SVN, and perhaps 20% with other tools (ClearCase, SourceSafe, and others). Only 4 had ever used branch/merge in any project or tool! Continue reading “A Brief Introduction to Distributed Version Control”

Fix It So It Stays Fixed: An Example

A recurring theme in our projects is a desire to “fix things so they stay fixed”. I have in mind writing about that idea in detail later, but for now I’ll start with an example of how to do so.

A common and useful thing to do with disk storage space is to keep old copies of important data around. For example, we might keep the last 15 days of nightly backups of a database. This is easy to set up and helpful to have around. Unfortunately, sooner or later we discover that the process of copying a new backup to a disk managed this way, fails because the disk is full: the ongoing growth of the backup files reached a point where 15 old ones plus a new one does not fit.

How will we fix this?

Idea #1: Reduce 15 days to 10 days. Great, now it doesn’t fail for a while… but eventually it fills up with 10 of the now-larger files. It didn’t stay fixed.

Idea #2: Buy a bigger disk (maybe a huge disk, if money is abundant). A while later, it fills up. It didn’t stay fixed.

Idea #3: Set up an automated monitoring system, so that someone is informed when the disk is getting close to full. This is a big improvement, because hopefully someone will notice the monitor message and adjust it before it fails. But to me, it is not “fixed to stay fixed” because I will have to pay someone to adjust it repeatedly over time.

Idea #4: Sign up for Amazon S3, so we can store an unlimited number of files, of an unlimited size. Thus will probably stay fixed from a technical point of view, but it is highly broken in the sense that you get a larger and larger S3 invoice, growing without limit. To me, this means it didn’t stay fixed.

Idea #5: Dynamically decide how many old backups to keep.

The core problem with the common design I described above is the fixed N of old files to keep. The solution is to make that number dynamic; here is one way to do that:

Make the old-file-deletion process look at the size of the most recent few files, and estimate the “max” of those plus some percentage as the likely maximum size of a new file.
Compare that to the free space.
If there is not enough free space, delete the oldest backup
Loop back and try again.
Be careful with error checking, and put in some lower limit of how many files to preserve (perhaps 2 or 3).

Like all mechanisms, this one has limits. Eventually the daily file size may grow so large that it’s no longer possible to keep 1 or more copies on the disk; so in this sense it does not stay fixed; but it does stay fixed all the way up to the limit of the hardware, with no human intervention.

Upcoming talk: Intro to Distributed Source Control

Where: SLUUG (though my talk is not listed on the site yet)

When: October 10th, meeting starts at 6:30 PM

I’ll introduce distributed source control tools:

A short tour of the basic use of git, bzr, and hg (Mercurial)
Thoughts on why you’d want to use a distributed source control tool at all, vs. a centralized system like SVN or CVS.
Some differences between these tools (and a few others), with thoughts on how to choose

In response to a question below about slides… most likely there will be no slides. Rather there will be a handout (which will be posted on my site).

Update: The handout notes are here. Sorry, no audio/video/transcript this time.

Python or Python+Delphi Developer Wanted

Speaking of Python, over at oasis Digital we’re looking for a Python (subcontract) or Python+Delphi (full time) developer. For the right person this could be a great opportunity to use your preferred tools.

Plus, a tip to anyone applying for this work or any other work: when you email a resume, don’t name it “my resume.doc” or “resume.pdf”. Rather name it with your name, perhaps “Kyle Cordes resume.doc” or “Resume for Smith, John.pdf”. (I fear that someone will read this and miss the point entirely, naming their resume file with my name…)

Update: We have hired for this work – thank you to everyone who applied.

Yet Another Python Success Story

Is it OK to use programming language X in a production enterprise application? Or are fear, uncertainly, and doubt holding you back? Public “success stories” might make it more acceptable for you to do so in your environment.

In that spirit I offer our story of a production Python deployment at an Oasis Digital customer (without names or details, to protect their privacy). There are many other success stories at Python.org. In this project, the client and application server (the bulk of the system) are written in Delphi (which was much more popular when the project started, than it is today). A major subsystem (roughly 1/3rd of the overall system) is written in Python. It consists of a set modules that parse textual data from a large number of varied formats, into a common schema, another set of modules to apply (frequently adjusted) business rules, and a third set of miscellaneous modules. These are all used in background data processing, not part of a client application or a server handling requests from a client application. These modules interact with the rest of the system primarily through state stored in a database. I generally recommend against the database integration style between separate applications, but it works well in this scenario within modules of the same application, built and maintained concurrently by the same team.

The Good

We chose Python for this subsystem for a variety of reasons. First, its built-in features are well suited to the text processing task at hand. Python’s “batteries included” have generally avoided the need to find or implement add-on text processing tools (which would have been necessary in Delphi); thus a programmer needs to know and use “just” what’s in the Python box, with few external libraries to consider.

Second, Python’s built in features and compact syntax have shortened the programming time considerably, in our estimation, than would have been otherwise. It takes relatively few lines of Python to get the job done. We have many lines of Python, and would have had many more lines had we used a lower-level language. (Of course lines of code is not everything; it’s possible to come up with dense, bad code. As a general rule, though, a language in which you can express what you need to express more succinctly, is better.)

Third, Python’s interpreted nature keeps the edit-test cycles short, further speeding development. This development speed issue is especially important given the niche this project occupies, in which data format changes and rule changes sometimes arrive with no notice: new data arrives, and part of the application does not work until it is enhanced to handle the new data. Extensive use of automated unit and integration tests (many hundreds of test cases) effectively prevents the interpreted operation from causing trivial runtime errors (type errors, syntax errors).

Of course, some other languages with similar compact syntax, included libraries, and high level features, would have worked equally well. At the time(5+ years ago), though, Python appeared to have the most momentum, other than Perl. Perl would have been a good choice also from a technical point of view, but it had an (unwarranted) reputation of being hard to maintain, which I didn’t want to have to overcome.

The Bad

There are downsides to our two-language approach. The first is somewhat Python specific, and really not a big deal: Python is slow. Its primitives are fast, but when you write considerable Python code to do something, it does that something at a rather leisurely pace compared to Delphi or Java or C++, using a lot more CPU along the way. The practical impact of this has been limited, because the bottleneck on this system is not the Python code, it is the database; but still, this has been inconvenient, and has required our customer to deploy multiple machines for this subsystem where one would have been sufficient with a more efficient language/runtime. Doing so is not particularly expensive, though, and adds a measure of reliability, so we haven’t had a need to speed things up with Psyco, C modules, etc.

The other issue is more serious. It is created by the large gap in language style and features between Delphi vs. Python in particular, and low-level vs. scripting languages in general. (Those of you unfamiliar with Delphi may be thinking this is because Delphi is some hideous VB-like toy. Wrong. Delphi is a somewhat C++-like or Java-like language, statically compiled, fast, and sadly burdened with a Pascal syntax.) Personally, this gap bothers me not at all. I’ve written production code in assembly, C, C++, Delphi, Java, Python, Ruby, Javascript, Lua, a bit of Prolog, and others I forget right now; I am happy to use one language in the morning and another radically different one in the afternoon.

I have discovered that most developers are not like me, though. Most Delphi developers are notably uninterested in Python and vice versa. As a result, our project team has ended up divided along the same lines as the software, with some cross training but relatively little production development crossover between the developers working in each languages. This is an obstacle to any developer taking end-to-end responsibility for features or issues that span the languages, and also an obstacle to hiring.

Python itself is not much of an obstacle to hiring: while there are far fewer Python programmers than Java (for example) programmers, there are also far fewer Python jobs than Java jobs.

The Verdict

In spite of the downsides discussed above, overall it has been a “win”, technically, to use two languages (each well suited to part of the application) in this project. More importantly, I am also confident this choice has been a win for our customer: they got a system delivered faster, and at lower cost, than they otherwise would have. They used every bit of speed we could deliver, to win business from their competitors.

However, the world has improved a lot since this decision was made; today I could probably choose a single language / toolset which meets all the needs sufficiently well, and thus avoid the downsides of the two-language solution. Alternatively, if starting today I might build the infrastructure for all subsystems in the same base language, with hooks to use Lua or Javascript scripting to accommodate the need for rapid runtime logic changes. It’s even possible that we would port the existing code to another language in the future – which would not make the original decision a mistake.