BlackBerry tether with Ubuntu 9.10

There are many pages out there about how to use a tethered BlackBerry internet connection with Ubuntu 9.10. Here is one that actually works. It uses Barry, BlackBerry support software generously provided by Net Direct Inc.

I’ve found this quite useful with an Ubuntu based netbook. There is Wifi a lot of places, but not even close to “everywhere”.

My BlackBerry is on the T-Mobile network, which (nicely) includes tethering at no extra cost, but (not so nicely) offers only EDGE (not 3G) in most of the US. Still, in a pinch an EDGE connection is far better than no connection, and is quite suitable for occasional use at zero incremental cost. For heavy mobile wireless tethering users, I suggest Verizon or Sprint service with their respective USB dongles.

Massive Parallelism and Microslices

I just read James Hamilton’s comments on “Microslice” servers, which are very low-power, but high CPU-to-wattage ratio servers. As he explains in detail, at scale the economics of this design are compelling. In some ways, of course, this is the opposite of another big trend going on, which is consolidation through virtualization. I reconcile these forces like so:

  1. For enterprises with a high ratio of emloyees-per-server-CPU, the cost factors tend to drive cost as a function of the number of boxes / racks /etc. This makes virtualization on to a few big servers a win.
  2. But for enterprises with a low ratio (lots of computing work, small team), the pure economics of the microserver approach makes it the winner.

The microserver approach demands:

  • better automated system adminstration, you must get to essentially zero marginal sysadmin hours per box.
  • better decompisition of the computing work in to parallelizable chunks
  • very low software cost per server (you’re going to run a lot of them), favoring zero-incremental-cost operating systems (Linux)

My advice to companies who make software to harness a cloud of tiny machines: find a way to price it so your customer pays you a similar amount to divide their work among 1000 microservers, as they would amount 250 heavier servers; otherwise if they move to microservers they may find a reason to leave you behind.

On a personal note, I find this broadening trend toward parallelization to be a very good thing – because my firm offers services to help companies through these challenges!

Missing your .svn\tmp directories? One line fix.

You may find with “svn cleanup” (or its TortoiseSVN equivalent) fails with an error message about “system cannot find the path specified”. If you research this error, you may find that the SVN dev team knows that svn-cleanup does not clean up this particular problem, and as of SVN version 1.6.5, considers that OK.

There is an easy fix, though. The tools are already present on nearly any Linux system, and are available in Cygwin or MSYS on Windows. Navigate to the top of your SVN working directory, and run this:

find . -iname '.svn' -exec mkdir {}/tmp \;

If all you were missing was some empty tmp directories, svn cleanup will now work, as will svn update and friends. Of course you may have other, additional problems with your .svn directories.

A mystery, for me and others, is how the missing .svn\tmp directory situation comes about. The best guess I’ve seen, but not yet reproduced here, is that a helpful piece of software (perhaps a backup tool?) deletes empty directories.

The great majority of all software I’ve used, does not depend on empty directories, and I likewise heartily recommend not designing software in such a way that it requires that empty directories are preserved. If you need a directory, please keep something in it. If you don’t need anythign in it, be willing to rereate it when you have something to put in it. Make it Just Work.

Finally, massive storage done right

Last year, I wrote about my efforts to find a storage server with lots of storage at a low cost-per-byte. What was obvious to me at the time, but apparently not obvious to many vendors, is that the key to cost effective storage is to buy mostly hard drives and as little else as possible. I built on Linux and commodity hardware, but the principle applies regardless of OS or hardware vendor.

The team at BackBlaze went much farther down the same path. They ended up with a custom made 4U case (a bit expensive) while the rest of the parts are few in number, inexpensive, and off the shelf. Their cost overhead is stunningly low, as seen in this chart (which I copied from their article):

cost-of-a-petabyte-chart

Is this right for everyone? Of course not. Enterprise buyers, for example, may need the extra functionality offered by the enterprise class solutions (at many times the cost). Cloud providers and web-scale data storage users, though, simply cannot beat BackBlaze’s approach. What about performance? Clearly this low-overhead approach is optimized for size and cost, not performance. Yet the effective performance can be very high, because this approach makes it possible to use a very large number of disk spindles, and thus has a very high aggregate IO capacity.

Predictably, the response to BackBlaze’s design has been notably mixed, with numerous complaint about performance and reliability. For a very thoughtful (though unavoidably biased) response, read this Sun engineer’s thoughts.

The key thing to keep in mind is the problem being solved. BackBlaze’s design is ideal for use as backup, bulk storage. That is a very common need; the solution I set up (described at the link above) had a typical use case of a given file being written once, then never read again, i.e. kept “just in case”. Reliability, likewise, is obtained as the system level, by having multiple independent servers, preferably spread across multiple physical sites. Once you’re paying the complexity cost to achieve this, there isn’t much additional benefit to paying the cost a second time in the form of more expensive storage.

The Right Way to do Monitoring and Mass Administration

Over the weekend I flipped through these slides about Nanite (code), and it got me thinking about system monitoring (again), as well as mass administration tools (Puppet and its younger competitor Chef). The key bit from the talk is the idea of using a proven, off the shelf messaging server (RabbitMQ) as the communication bus among a set of processes running on many servers.

I would like very much to see a piece of software that puts these pieces together:

  1. Monitoring features, like those in Zabbix or other similar tools
  2. Mass administration features, like those in Puppet
  3. Run it over a messaging bus rather than a homegrown communication mechanism

Such a system would allow some very nice improvements:

  • The messaging bus could provide real time “presence” information.
  • Urgent events could be sent immediately, rather than polled.
  • Urgent administration changes could be sent over the same communication channel as normal operations, unlike (for example) the puppetrun mechanism is puppet.
  • The specification for how a server is configured could be integrated in to the specification for how it should be monitored. This would be an enormous improvement over the current state of the art (in open source tools anywhere) where these two concerns are separated in to tools that don’t talk to each other.

In addition to the feature improvements, I suspect that both kinds of tools (monitoring and administration) would find they can get by with a smaller codebase by outsourcing the communication bus to a messaging server.

gitosis on Ubuntu 9.04 Jaunty

As of April 2009, the gitosis package in Ubuntu 9.04 Jaunty is broken; it fails with an error like so:

pkg_resources.DistributionNotFound: gitosis==0.2

There are quite a few pages and mailing list messages that mention this. I only found one with a good hint toward a solution, which was that it is also a known issue on Debian. Following that lead, I got it working by grabbing newer packages from Debian Unstable:

Use this at your own risk; your mileage may vary.