YouTube’s 45 Terabytes… no big deal?

Over at the Wall Street Journal and Micro Persuasion and and a bunch of other places, a big deal is being made of the YouTube’s estimated 45 Terabytes worth of video. It is “about 5,000 home computers’ worth”. Ouch, 45 Terabytes! Wow!

Or maybe not… consider the mathematics.

45 TB really isn’t all that much data. I’ll assume that each video is stored on 6 hard drives across their systems, for reliability and greater bandwidth, for a total of ~300 TB of hard drives. A 300 GB hard drive costs under $200, and ~1000 will be needed, so this is about $200,000 worth of hard drives, which is not a big deal for a major venture-funded firm like YouTube. (Side note – if you need a lot of read-mostly disk bandwidth, you are much better off buying several cheap hard drives with the same data on them, than one expensive hard drive. It’s not even close.)

The 1000 hard drives might be spread across 250 servers. If their systems is build in the economically smart way (the Google way – lots of commodity servers), each server could cost as little as $3000. Those servers could likely serve the traffic also, as well as (at a lower priority) do any needed video transcoding of newly uploaded video. After all, it is mostly static content, and it’s likely that a small fraction of the videos are a large fraction of the views, so the popular ones stay in RAM cache. Adding other machines for various purposes, network hardware, etc., a YouTube-scale app/storage cluster might cost as little as $2 million, of which a relative small portion (above) is hard drives.

Of course I’ve totally skipped the question of paying for the bandwidth (and power), which must be staggeringly expensive.

Keep Your Development Focus Sharp

Brian Button recently suggested, for XPSTL, “a series of presentations where we discuss the biggest challenges we, as team members and developers for the most part, face in our day-to-day jobs with respect to being agile. The challenges can be in technical areas, organizational change issues, or whatever else people think is hard.”

I’ll bite.

First, some background: the context here is an agile but not XP project, with a widely distributed team. The software is sold in an “application service provider” model; we usually deploy updated software weekly, though our planning is not particularly crisp around iteration boundaries. We use source control, a build server, an issue tracking system (paper cards aren’t of much use in a distributed team), unit tests, a few acceptance tests, a mailing list, Campfire, Java, and Eclipse. We build and sell an “enterprise” software product, so our product management is driven both by meeting current customer-specific needs and general target market needs.

We have listed a great number of desired features (and tasks, and bug fixes), and grouped them in to a few large buckets, which are roughly “releases” in the XP sense of the word. A small subset of those work items are considered “To Do” items, i.e. items that should be worked on now, to be deployed it the next week or two.

The ongoing challenge we face is in disciplining our own desires – in limiting the list of what we will do next, to what we can reasonably expect to accomplish soon, and then to work vigorously on getting those things out the door. Thus our goal is to, at all times, keep the team focused on a relative handful (at most a couple dozen) of key things to work on now, to bring those items to end-to-end closure.

Our challenge is in the perpetual tendency to derail the process, in any number of ways:

  • As new issues come up, there is a tendency to throw these in the “do now” list, without regard to their importance in the backlog.
  • Items get stuck waiting for requirements feedback. Either get the feedback quickly, or kick the items off your “to do” list in to the backlog, and go work on something you can finish.
  • Items get stuck waiting for integration in to core code.Items get stuck waiting for a production / deployment task, because your team is hesitant to touch a production system. The solution is to test, test, and test again, then deploy (using whatever the local process is, for your team / project / organization).
  • Items grow in scope, as they are discussed and implemented. The fix for this is to split off the additional scope in to another item (XP card) and put it in the backlog to be prioritized; trim the work item in front of you to a reasonable size and get it shipped.

All of these cause an overall derailment not because they add “bad” items, but because they reduce the focus, and thus reduce the tendency to get items finished and out the door. My advice is to keep your (team’s) focus sharp.

Aiming for Mainstream

Over on defmacro today, a new article appeared: defmacro – Why Exotic Languages Are Not Mainstream in which the author laments that while there appear to be various choices to use Haskell on Windows, it turns out that all of them are, in some way, not ready for prime time… or even for effective hobbiest use.

I’ve noticed this myself, in my last few forays in to esoteric languages: the illusion of plenty of choices, runs in the the reality of no good choices.  This is not a universal problem; I’ve had great results with Ruby, Python, and Lua, all of which are to some extent esoteric.  The thing that those languages have in common is that there is at least one (and generally, only one) robust, production grade implementation with a community actively supporting it.

If you want to see your favorite language gain acceptance, spend your time creating / maintaining / vigorously supporting a production-ready implementation.

DreamHost Disasters

You might have noticed that and my other sites have had several long downtimes recently.  I certainly have. The cause, in every case, has been DreamHost, the formerly excellent hostiing firm I use.  DreamHost has excellent features and configurability, but two large problems:

  1. Lots of downtime incidents
  2. Slow performance of DB-based web apps, including WordPress

I’ve started the process of moving a site off of DreamHost, so that at worst I won’t have everything down at once.  After some live evaluation, I’ll decide whether to move the rest.

I am much impressed, though, by a long explatation of the recent problems over on the Dreamhost Blog:
DreamHost Blog » Anatomy of a(n ongoing) Disaster..

This contains far more detail and honesty than I am accustomed to receiving from vendors, and I appreciate it very much.  I still recommend DreamHost, if you are looking for a bargain, rich features, and can tolerate (until they demonstrate otherwise) some “issues”.