YouTube’s 45 Terabytes… no big deal?

Over at the Wall Street Journal and Micro Persuasion and Computers.net and a bunch of other places, a big deal is being made of YouTube’s estimated 45 Terabytes’ worth of video. It is “about 5,000 home computers’ worth”. Ouch, 45 Terabytes! Wow!

Or maybe not… consider the mathematics.

45 TB really isn’t all that much data. I’ll assume that each video is stored on 6 hard drives across their systems, for reliability and greater bandwidth, for a total of ~300 TB of hard drives. A 300 GB hard drive costs under $200, and ~1000 will be needed, so this is about $200,000 worth of hard drives, which is not a big deal for a major venture-funded firm like YouTube. (Side note – if you need a lot of read-mostly disk bandwidth, you are much better off buying several cheap hard drives with the same data on them, than one expensive hard drive. It’s not even close.)
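The arithmetic above is easy to check. Here is a quick back-of-envelope sketch; the numbers are the post’s own estimates (45 TB, 6 copies per video, 300 GB drives at under $200), not real YouTube figures:

```python
# Back-of-envelope check of the drive estimate above. Assumptions taken
# from the post: 45 TB of video, each video stored on 6 drives, 300 GB
# drives at ~$200 each; the ~1000-drive figure includes some slack.
video_tb = 45
copies = 6
raw_tb = video_tb * copies              # 270 TB of raw storage needed
drive_gb = 300
drives_min = raw_tb * 1000 // drive_gb  # 900 drives at a bare minimum
budget = 1000 * 200                     # ~1000 drives x $200 = $200,000
print(raw_tb, drives_min, budget)       # 270 900 200000
```

So 900 drives is the bare minimum, and ~1000 drives leaves room for spares and uneven packing, which is where the ~300 TB / $200,000 figures come from.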

The 1000 hard drives might be spread across 250 servers. If their system is built in the economically smart way (the Google way – lots of commodity servers), each server could cost as little as $3000. Those servers could likely serve the traffic as well, and also (at a lower priority) do any needed video transcoding of newly uploaded video. After all, it is mostly static content, and it’s likely that a small fraction of the videos account for a large fraction of the views, so the popular ones stay in RAM cache. Adding other machines for various purposes, network hardware, etc., a YouTube-scale app/storage cluster might cost as little as $2 million, of which a relatively small portion (above) is hard drives.
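Continuing the same rough math (again, these are the post’s estimates, not real figures), the server side of the budget works out like this:

```python
# Rough cluster cost, under the post's assumptions: 1000 drives across
# 250 commodity servers at ~$3000 each, plus ~$200,000 in drives.
drives = 1000
servers = 250
drives_per_server = drives // servers   # 4 drives per box
server_cost = servers * 3000            # $750,000 in servers
drive_cost = 1000 * 200                 # $200,000 in drives
# The remainder of the ~$2 million estimate covers network hardware,
# other machines, spares, and so on.
print(drives_per_server, server_cost + drive_cost)  # 4 950000
```

Under $1 million in core hardware, with the rest of the $2 million as headroom for everything else.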

Of course I’ve totally skipped the question of paying for the bandwidth (and power), which must be staggeringly expensive.

Keep Your Development Focus Sharp

Brian Button recently suggested, for XPSTL, “a series of presentations where we discuss the biggest challenges we, as team members and developers for the most part, face in our day-to-day jobs with respect to being agile. The challenges can be in technical areas, organizational change issues, or whatever else people think is hard.”

I’ll bite.

First, some background: the context here is an agile but not XP project, with a widely distributed team. The software is sold in an “application service provider” model; we usually deploy updated software weekly, though our planning is not particularly crisp around iteration boundaries. We use source control, a build server, an issue tracking system (paper cards aren’t of much use in a distributed team), unit tests, a few acceptance tests, a mailing list, Campfire, Java, and Eclipse. We build and sell an “enterprise” software product, so our product management is driven both by meeting current customer-specific needs and general target market needs.

We have listed a great number of desired features (and tasks, and bug fixes), and grouped them into a few large buckets, which are roughly “releases” in the XP sense of the word. A small subset of those work items are considered “To Do” items, i.e. items that should be worked on now, to be deployed in the next week or two.

The ongoing challenge we face is in disciplining our own desires – in limiting the list of what we will do next, to what we can reasonably expect to accomplish soon, and then to work vigorously on getting those things out the door. Thus our goal is to, at all times, keep the team focused on a relative handful (at most a couple dozen) of key things to work on now, to bring those items to end-to-end closure.

Our challenge is in the perpetual tendency to derail the process, in any number of ways:

  • As new issues come up, there is a tendency to throw them into the “do now” list, without regard to their importance relative to the rest of the backlog.
  • Items get stuck waiting for requirements feedback. Either get the feedback quickly, or kick the items off your “to do” list in to the backlog, and go work on something you can finish.
  • Items get stuck waiting for integration into core code.
  • Items get stuck waiting for a production / deployment task, because your team is hesitant to touch a production system. The solution is to test, test, and test again, then deploy (using whatever the local process is, for your team / project / organization).
  • Items grow in scope, as they are discussed and implemented. The fix for this is to split off the additional scope into another item (XP card) and put it in the backlog to be prioritized; trim the work item in front of you to a reasonable size and get it shipped.

All of these cause an overall derailment not because they add “bad” items, but because they reduce the focus, and thus reduce the tendency to get items finished and out the door. My advice is to keep your (team’s) focus sharp.

DreamHost Disasters

You might have noticed that kylecordes.com and my other sites have had several long downtimes recently.  I certainly have. The cause, in every case, has been DreamHost, the formerly excellent hosting firm I use.  DreamHost has excellent features and configurability, but two large problems:

  1. Lots of downtime incidents
  2. Slow performance of DB-based web apps, including WordPress

I’ve started the process of moving a site off of DreamHost, so that at worst I won’t have everything down at once.  After some live evaluation, I’ll decide whether to move the rest.

I am much impressed, though, by a long explanation of the recent problems over on the DreamHost Blog:
DreamHost Blog » Anatomy of a(n ongoing) Disaster..

This contains far more detail and honesty than I am accustomed to receiving from vendors, and I appreciate it very much.  I still recommend DreamHost, if you are looking for a bargain, rich features, and can tolerate (until they demonstrate otherwise) some “issues”.

Walk a Mile in Their Shoes

Armin Vit, a graphic designer, described his experiences hiring out work on his home, and how he had opportunities to offer the same kinds of objection that his clients have offered: “We, of course, hated these clients too. Mocking us with their DIY, cheapskate attitude. Hated them. All of them. … Fast-forward to September 2005 and Bryony and I have become these clients. All of them.”

I can echo the experience of becoming a client. Over the last few years, we at Oasis Digital have hired out various chunks of work to subcontractors: local subcontractors, those around the country, and in a few cases, those around the world (including a few on rentacoder.com). This has been an enormously educational and enlightening experience – I personally have a much greater understanding and appreciation for what it’s like to be a client. It’s not as easy as it looks, so to speak. I believe this experience will greatly help me understand how to deliver better for our clients.

It is an experience I recommend heartily – if your main work is as a provider of services (technical or otherwise), get some experience hiring the same kinds of services you normally provide, even if you have to make up a “fake” project. This experience will be worth far more than what it costs.

Second Life – Wow

At ETech I saw a talk from the Linden Labs guys… and didn’t really understand.  Today I came across this video of a talk they gave at Google…  and now I understand.  I don’t have any particular desire to play myself; but the community and marketplace they have is remarkable.  Unlike the various MMORPGs, these guys don’t charge a subscription fee; they charge for virtual real estate instead.  Moreover, in this marketplace of people selling things to each other, there are $5 million of transactions every month (!) between Second Life users, with thousands of such users building and selling things as their full-time, pay-the-rent work.

I don’t know what this means.  If someone had proposed this idea to me, I’d have said it would go nowhere… obviously my understanding of what people will buy needs considerable adjustment.