Why I do not use RAR

We recently adopted a policy (ooh, so official sounding…) at work that we do not use the RAR file format. Oasis Digital being a small firm, the “we” to make this particular decision was just me. Someone quite reasonably asked, why not?

Here is why I don’t use RAR:

  1. The RAR archiver GUI tool is Windows-only, though there are command line tools available for other platforms. The other major choices are fully multi-platform, with command line, GUI, etc. all available. Most of our work is on Windows, but I don’t see a reason to choose a Windows-only tool when others are available.
  2. I get the impression that the vast majority of people who use RAR use the archiver “trialware” permanently without ever paying for it. I earn a living by writing software, so I don’t want to support the notion of using commercial software without paying for it. The same could be said of WinZIP – but I don’t use WinZIP and there are plenty of other ZIP tools.
  3. RAR is a proprietary format, while the other choices (like plain old ZIP) are open and supported out of the box by countless tools, built into OSes, etc.
  4. RAR’s compression is sometimes a lot better than ZIP; but for the cases where the extra compression matters, something like 7zip’s 7z format offers similar compression but with an open format and free (not trialware) tool. (7zip can also unarchive RAR files, for those cases when a RAR comes in with something I need.)
  5. The world simply does not need another proprietary archive format, so I will do my part by not supporting the creation or distribution thereof.

Next Big Language = JavaScript

There’s a lot of buzz about Steve Yegge’s “port” of Rails to JavaScript, and Steve has now provided (in his funny, self-deprecating style) the background of how it came to be. He doesn’t quite say it explicitly in this post, but I think it reveals that the “Next Big Language” he has been hinting at is JavaScript.

I (mostly) agree:

JavaScript is in nearly every browser, including tiny ones (like the one in my BlackBerry Pearl). It may be the single most widely available language today.

Because of the above, an enormous population of JavaScript programmers (though sometimes of dubious skill) has emerged.

Starting with Java 6 it’s “in the box” there also. To me, this makes it the likely winner, by a wide margin, for a dynamic language to be used at Java shops or inside Java projects. Being “in the box” is a powerful advantage, one which the many other contenders will have a hard time overcoming.
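
To make “in the box” concrete, here is a minimal sketch of embedding JavaScript from Java via the javax.script API included in Java 6; the engine name "JavaScript" resolves to the bundled Rhino engine, and the billing-rule snippet is just a made-up illustration:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class EmbeddedRuleDemo {
        public static void main(String[] args) throws ScriptException {
            // Java 6 bundles a JavaScript engine (Rhino) behind javax.script
            ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");

            // Expose a value from the host application to the script
            js.put("amount", 1200.0);

            // Evaluate a user-supplied rule (hypothetical example text)
            Object billed = js.eval("amount > 1000 ? amount * 0.95 : amount");
            System.out.println("Billed: " + billed);
        }
    }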

Adobe’s new JavaScript virtual machine implementation, which they handed over to Mozilla as “Tamarin”, sounds like it will boost JavaScript performance greatly, making it good enough for a very wide variety of projects.

JavaScript uses curly braces, like the last few Big Languages.

Like Java, C, C++, etc., JavaScript has specs and multiple competing, complete, current, high quality implementations. This, to me, is a big advantage over Ruby, Python, and other currently popular dynamic languages. Of course there is plenty of room in the industry for these languages to thrive also; I am not saying any of them will go away. We use Python with great results and expect to keep doing so.

Mark Volkmann initially thought I was nuts to predict JavaScript as a winner, but came around a few months later (and said so in a user group talk).

In a project at work, we adopted JavaScript as our plugin extension language for user-customizable rules (billing rules, etc.). I’d have chosen Lua (as I did for another project), but there are at least 1000x as many JavaScript programmers out there. So far it has worked very well. If we had it to do over, we might implement far more of the project in JavaScript.

However, there are a few reasons why I only “mostly” agree:

First, with JavaScript there isn’t a good way to avoid shipping source code. Sure, you can obfuscate JavaScript with various tools, but the result remains far more amenable to readable-source recovery than the output of a more traditionally compiled language. For open source projects this is no big deal, but there are also many worthwhile businesses and projects which depend on proprietary, not open, software (including most of our projects), and it’s not yet clear that obfuscation is sufficient protection. (Update in reply to a comment below: This matters even for server-side software, because some of us create and sell software products for other people to run on their servers.)

Second, at the moment JavaScript appears to lack a module system, without which it’s painful to build large systems. I expect an upcoming language version will address this.

BaseJumpr: Basecamp -> ActiveCollab

BaseJumpr has a fascinating service offering: they export your data from your Basecamp account, producing a set of files ready to import into ActiveCollab, the open source Basecamp-sorta-clone-like program. Then, if you wish to buy their hosting service, they create an instance of ActiveCollab for you and import your data there. (They host your file storage on Amazon S3, so they can easily offer ample storage.)

I find this very appealing, yet also a bit impolite: 37Signals has built a good business on Basecamp, and the ActiveCollab team has created (well, is creating) an open source clone, while BaseJumpr did neither of these things yet stands to gain (at 37s’s expense). However, I doubt BaseJumpr is a significant threat or bother to 37Signals, because most users interested in the open source ActiveCollab would likely not be using the Basecamp service in the first place.

Speaking of Basecamp, I am fascinated by 37Signals’ business success with such a simple (but well executed) application. I tried out Basecamp myself, and found it far too feature-anemic for my taste; but I could readily see its appeal and simplicity, and it has me thinking about the merit of building a business in a focussed niche, intentionally and happily excluding the potential customers outside it.

Update in 2009: BaseJumpr doesn’t appear to exist any more. I am curious how it worked out.

Linus Torvalds explains distributed source control

On several occasions over the last year, I’ve pointed out that distributed source control tools are dramatically better than centralized tools. It’s quite hard for me to explain why. This is probably because of sloppy and incomplete thinking on my part, but it doesn’t help that most of the audiences and people I’ve said this to have never used a distributed tool. (I’ve been trying out SVK, bzr, git, etc.) Fortunately, I no longer need to stumble through attempts at explaining this; instead, the answer is to watch as:

Linus Torvalds explains distributed source control in general, and git in particular, at Google

Here are some of his points, paraphrased. They might not be clear without watching the video.

  • He hates CVS
  • He hates SVN too, precisely because it set out to be “CVS done right”: there is no way to do CVS right, so you can’t get anywhere good from there.
  • If you need a commercial tool, BitKeeper is the one you should use.
  • Distributed source control is much more important than which tool you choose
  • He looked at a lot of alternatives, and immediately tossed anything that was not distributed, anything slow, and anything that could not guarantee that what goes in, comes out [using secure hashes; a rough sketch of that hashing follows this list]
  • He liked Monotone, but it was too slow
  • With a distributed tool, no single place is vital to your data
  • Centralized tools do not scale to Linux-kernel-sized projects
  • Distributed tools work offline, with full history
  • Branching is not an esoteric, rare event – everyone branches all the time every time they write a line of code – but most tools don’t understand this, they only understand branching as a Very Big Deal.
  • Distributed tools serve everyone; everyone has “commit” access to their branches. Everyone has all of the tool’s features, rather than only a handful of those with special access.
  • Of course no one else will necessarily adopt your changes; using a tool in which every developer has the full feature set does not imply anarchy.
  • Merging works like security: as a network of trust
  • Two systems worth looking at: git and Mercurial. The others with good features are too slow [for large projects].
  • Lots of people are using git for their own work (to handle merges, for example), inside companies where the main tool is SVN. git can pull changes from (and thus stay in sync with) SVN.
  • Git is now much easier to use than CVS or other common tools
  • Git makes merging easier than other tools do. So the prospect of doing more merging is not much of a problem, not something you need to fear.
  • Distributed tools are much faster because they don’t have to go over the network very often. Not talking to a server at all is tremendously faster than talking to even a high-end server over a fast network.
  • SVN working directories and repositories are quite large, git equivalents are much smaller
  • The repository (=project) is the unit of checkout / commit / etc.; don’t put them all into one repository. The separate repositories can share underlying storage.
  • Performance is not secondary. It affects everything you do, and it affects how you use an application. Faster operations = merging not a big deal = more, smaller changes.
  • SVN “makes branching really cheap”. Unfortunately, “merging in SVN is a complete disaster”. “It is incredible how stupid these people are”
  • Distributed source control is mostly inherently safer: no single point of failure, and secure hashes protect against even intentionally malicious users.
  • “I would never trust Google to maintain my source code for me” – with git (and other distributed systems) the whole history is in many places, nearly impossible to lose it.
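
To illustrate the “what goes in, comes out” point: git names every object by a cryptographic hash of its contents, so corruption or tampering changes the name and gets noticed. The following is my own rough sketch (not from the talk) of how git computes the id of a blob, assuming git's usual "blob <length>\0<content>" layout:

    import java.security.MessageDigest;

    public class GitStyleHash {
        public static void main(String[] args) throws Exception {
            String content = "hello world\n";
            byte[] body = content.getBytes("UTF-8");

            // Hash a small header ("blob <size>\0") followed by the raw contents
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(("blob " + body.length + "\0").getBytes("UTF-8"));
            sha1.update(body);

            // Print the 40-character hex id; `git hash-object` prints the
            // same value for a file with identical contents
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            System.out.println(hex);
        }
    }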

My own observations:

There are important differences between source control tools. I have heard it said that these are all “just tools” which don’t matter; you simply use whatever the local management felt like buying. That is wrong: making better tool choices will make your project better (cheaper, faster, more fun, etc.), while making worse tool choices will make your project worse (more expensive, slower, painful, higher turnover, etc.).

Distributed tools make the “network of trust” more explicit, and thus easier to reason about.

I think there is an impression out there that distributed tools don’t accommodate the level of control over how code is managed that “enterprise” shops often specify in their processes. This is a needless worry; it is still quite possible to set up any desired process and controls for the official repositories. The difference is that you can do that without the collateral damage of taking away features from individual developers.

Git sounds like the leading choice on Linux, but at the moment it is not well supported on Windows, and I don’t see any signs of that changing anytime soon. I will continue working with bzr, SVK, etc.

There is widespread, but mostly invisible, demand for better source control. Over the next few years, the hegemony of the legacy (centralized) design will be lessened greatly. One early adopter is Mozilla, which is switching (or has switched?) to Mercurial. Of course, many projects and companies (especially large companies) will hang on for years to come, because of inertia and widespread tool integration.

Several of the distributed tools (SVK, bzr, git, probably more) have the ability to pull changes from other systems, including traditional centralized systems like SVN. These make it possible to use a modern, powerful tool for your own work, even when you are working on a project where someone else has chosen such a traditional system for the master repository.

Update: Mark commented that he doesn’t feel like he has really committed a changeset until it is on a remote repository. I agree. However, this is trivial to address: with any of the tools you can push your change from your local repository to a remote one (perhaps one for you alone) with one command. With some of them (bzr, at least) you can configure this to happen automatically, so it is zero extra commands. This does not negate the benefits of a local repository, because you can still work offline, and all read operations still happen locally and far more quickly.

Update: I didn’t mention Mercurial initially, but I should have (thanks, Alex). It is another strong contender, and Mozilla recently chose it. Its feature set is similar to git’s, but with more eager support for Windows.

Update: There is another discussion of Linus’s talk over on Codice Software’s blog, which was linked on Slashdot. Since Codice sells a source control product, the Slashdot coverage is a great piece of free publicity.

Update: Mark Shuttleworth pointed out below that bzr is much faster than it used to be, and that it has top-notch renaming support, which he explains in more detail in a post. Bzr was already on my own short-list; I am using it for a small project here at Oasis Digital.

Update: I’ve started gathering notes about git, bzr, hg, etc., for several upcoming posts (and possibly a talk or two). If you’re interested in these, subscribe to the feed using the links in the upper right of my home page.

A utility bill worth looking at?

Here in Dardenne Prairie, MO (a suburb of St. Louis; see what WikiPedia has to say about it) our electricity is supplied by the Cuivre River Electric Cooperative. CREC surprised me this month with a genuinely informative addition to the data on the bill: a graph of usage over the last year.

My expectations of utility companies, most of which operate as monopolies, have not been high. CREC, though, in addition to doing a good job supplying electrical power (without the long outages that have plagued the next county over), supplies data which may help me figure out how to consume less of their product!

Of course I already knew we use far more power in the summer than in the winter; but to quantify the amounts I would have needed to sift through a pile of old bills. Seeing this chart, I wonder why our baseline (winter) usage is so high; I’m inspired to get out my Kill-A-WATT and investigate.

Dear PayPal, My Thoughts Do Not Fit In 40 Characters

I recently closed a PayPal account. During the closing process, and again thereafter, I was surveyed as to why I closed the account. Predictably, these surveys offered a few choices for why I didn’t want the account, with only a tiny field if I wanted to explain in more detail why I closed it. I picked the choices that were closest to my actual reasons; but they really were not close at all.

(Kyle’s Business Tip of The Day: You really aren’t helping yourself, or your customers, if you set up feedback mechanisms that only allow you to hear messages you want to hear.)

So here is my reason to be wary of PayPal: I can start with a credit card, but I won’t get far that way; to do anything useful with PayPal, I need to grant them full access to my bank account. At every bank I’ve ever used, there is no mechanism for partial access. Once a vendor is in, there is nothing I can do to stop them from withdrawing all my money. For example, there is no way to say “this vendor can pull out up to $N in a month, and no more”; at most I can get, in some cases, notification of transactions. This is unsafe: fraudulent access to my PayPal account, or a bug at PayPal, could empty my bank account.

There is no “push” way to get money into PayPal. An example of Push would be that I mail them a check, they deposit it, they wait N days for it to clear, then they put the money in my PayPal account. Push is safe; it avoids granting them “Pull” access to my accounts.

These issues are not solely PayPal’s, rather they are caused by the broken bank payment system here in the U.S.; but it is clearly PayPal’s choice to not provide a workaround in the form of a “push” deposit mechanism.

(European readers will probably find all this silly; apparently in that part of the world there are “push” mechanisms to transfer money electronically, and for all I know PayPal may work like that for European customers.)