Over at the Wall Street Journal and Micro Persuasion and Computers.net and a bunch of other places, a big deal is being made of YouTube’s estimated 45 terabytes of video. It is “about 5,000 home computers’ worth”. Ouch, 45 terabytes! Wow!
Or maybe not… consider the mathematics.
45 TB really isn’t all that much data. I’ll assume that each video is stored on 6 hard drives across their systems, for reliability and greater bandwidth, for a total of ~300 TB of hard drives. A 300 GB hard drive costs under $200, and ~1000 will be needed, so this is about $200,000 worth of hard drives, which is not a big deal for a major venture-funded firm like YouTube. (Side note – if you need a lot of read-mostly disk bandwidth, you are much better off buying several cheap hard drives with the same data on them, than one expensive hard drive. It’s not even close.)
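The arithmetic above can be checked with a few lines of Python. This is only a sketch: the figures are the ones quoted in this post, the function name `drive_estimate` is made up for illustration, and decimal units (1 TB = 1000 GB) are an assumption.

```python
import math

# Figures quoted in the post. Decimal units (1 TB = 1000 GB) are an assumption.
RAW_DATA_TB = 45        # estimated size of the video library
COPIES = 6              # assumed number of drives holding each video
DRIVE_SIZE_GB = 300     # capacity of one commodity drive
DRIVE_COST_USD = 200    # "under $200", so treat this as an upper bound


def drive_estimate(raw_tb, copies, drive_gb, drive_cost):
    """Return (total TB stored, drives needed, total drive cost in USD)."""
    total_gb = raw_tb * copies * 1000
    drives = math.ceil(total_gb / drive_gb)
    return total_gb / 1000, drives, drives * drive_cost


total_tb, drives, cost = drive_estimate(
    RAW_DATA_TB, COPIES, DRIVE_SIZE_GB, DRIVE_COST_USD
)
print(f"{total_tb:.0f} TB on {drives} drives, about ${cost:,}")
```

Without rounding, this comes to 270 TB on 900 drives for $180,000. The post rounds those up to ~300 TB, ~1000 drives and $200,000, which leaves some headroom.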
The 1000 hard drives might be spread across 250 servers. If their system is built in the economically smart way (the Google way – lots of commodity servers), each server could cost as little as $3000. Those servers could likely serve the traffic also, as well as (at a lower priority) do any needed video transcoding of newly uploaded video. After all, it is mostly static content, and it’s likely that a small fraction of the videos are a large fraction of the views, so the popular ones stay in RAM cache. Adding other machines for various purposes, network hardware, etc., a YouTube-scale app/storage cluster might cost as little as $2 million, of which a relatively small portion (above) is hard drives.
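As a rough sanity check on those cluster numbers, here is a short Python sketch. All inputs are the guesses from this post, and the variable names are only illustrative. Whether the $3000 per server includes its drives is not stated, so the two costs are kept separate here.

```python
# Guesses taken from the post; none of these are measured figures.
DRIVES = 1000
SERVERS = 250
SERVER_COST_USD = 3000          # low-end commodity server
DRIVE_SPEND_USD = 200_000       # from the drive estimate earlier in the post
CLUSTER_TOTAL_USD = 2_000_000   # rough all-in guess for the cluster

drives_per_server = DRIVES / SERVERS
server_spend = SERVERS * SERVER_COST_USD
drive_share = DRIVE_SPEND_USD / CLUSTER_TOTAL_USD

print(f"{drives_per_server:.0f} drives per server")
print(f"${server_spend:,} for servers")
print(f"drives are {drive_share:.0%} of the cluster estimate")
```

That works out to 4 drives per server, $750,000 for the servers, and hard drives at about 10% of the $2 million total.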
Of course I’ve totally skipped the question of paying for the bandwidth (and power), which must be staggeringly expensive.
5 thoughts on “YouTube’s 45 Terabytes… no big deal?”
Not necessarily true. Remember that YouTube hosts a service, so you can be pretty damn sure that those drives are AT LEAST 10k rpm, so toss that $200 / 300 GB drive figure out the window; it’s more likely to be 3x that or more, per drive.
Aurash commented that they would need high-end drives, with a much higher cost per drive… but I disagree. As I pointed out above, they are much better off per dollar with more “normal” drives than with faster drives, for an application like this. High-end (fast) drives are for busy DBMSs and the like, not for highly parallelizable static data serving. Of course, the math works out roughly the same way if you assume a smaller number of high-end drives, so even if the architect chooses a solution with high-end hard drives, 45 TB is still not all that much data.
I agree with you that 45 TB is not that much data, and I agree with you that you don’t need high-end drives.
YouTube’s videos are low resolution so that they can stream them. One streaming video uses at most 100 kilobytes/sec (~ 768k DSL). I’d guess they use a lot less to save bandwidth. A modern SATA hard drive can probably stream at least 35 megabytes/sec. Why would you need 10k rpm drives for this? Just read the entire 30 second video (3 MB?) into cache in 90ms, then spend 10ms seeking to the next video. You should be able to read 300 30-second clips each 30 seconds, serving 300 users from each drive. If they have 1000 drives, that’s 300,000 simultaneous viewers. It’s likely that videos aren’t evenly distributed though, so popular videos can be cached in RAM and served to lots of users without more disk seeks, and they can probably serve even more users with those disks.
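The per-drive estimate above can be sketched in Python. The inputs are the guesses from this comment (100 KB/s per stream, 35 MB/s sequential reads, 10 ms seeks), the function name `viewers_per_drive` is made up for illustration, and the model ignores RAM caching and filesystem overhead.

```python
# Guesses from the comment above, not measured values.
STREAM_KB_PER_SEC = 100     # upper bound for one viewer's stream
CLIP_SECONDS = 30           # chunk of video read into cache at a time
DISK_MB_PER_SEC = 35        # sequential read speed of a SATA drive
SEEK_SECONDS = 0.010        # time to seek to the next video
DRIVES = 1000


def viewers_per_drive(stream_kb_s, clip_s, disk_mb_s, seek_s):
    """How many clips one drive can read in the time it takes to play one."""
    clip_mb = stream_kb_s * clip_s / 1000   # 3 MB per 30-second clip
    read_s = clip_mb / disk_mb_s            # roughly 0.086 s to read it
    per_clip_s = read_s + seek_s            # read one clip, then seek
    return int(clip_s / per_clip_s)


per_drive = viewers_per_drive(
    STREAM_KB_PER_SEC, CLIP_SECONDS, DISK_MB_PER_SEC, SEEK_SECONDS
)
total_viewers = per_drive * DRIVES
print(f"{per_drive} viewers per drive, {total_viewers:,} across {DRIVES} drives")
```

Unrounded, the read takes about 86 ms, which gives 313 viewers per drive and 313,000 in total. Rounding the read up to 90 ms gives the 300 and 300,000 quoted above.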
Kyle, just a quick question… don’t you think that the 45 TB figure already takes the replicated servers into account? I’ve seen quotes stating that they host about 6.1 million videos; if you divide the 45 TB by that, it comes to about 7.4 MB per video. The average video on YouTube, according to Chad Hurley, is 2 1/2 minutes, so 7.4 MB seems a bit too much.
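To make that division concrete, here is a small Python sketch. It uses the figures quoted in this comment, assumes decimal units (1 TB = 10^12 bytes), and the variable names are only illustrative.

```python
# Figures quoted in the comment. Decimal units are an assumption.
TOTAL_BYTES = 45e12          # 45 TB
VIDEO_COUNT = 6.1e6          # about 6.1 million videos
AVG_LENGTH_SECONDS = 150     # 2 1/2 minutes

bytes_per_video = TOTAL_BYTES / VIDEO_COUNT
mb_per_video = bytes_per_video / 1e6
kb_per_sec = bytes_per_video / AVG_LENGTH_SECONDS / 1000
kbit_per_sec = kb_per_sec * 8

print(f"{mb_per_video:.1f} MB per video")
print(f"{kb_per_sec:.0f} KB/s, about {kbit_per_sec:.0f} kbit/s average")
```

If the 45 TB were a single copy of each video, that would imply an average of roughly 49 KB/s (about 393 kbit/s) over a 2 1/2 minute video.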
That story has moved, here’s the updated location: