I recently needed a file server with ample capacity, on which to store data backups. The typical data access pattern for backups is that data is written once and read rarely, so bulk backup storage has much lower performance requirements than, for example, disks used for database files.
I needed to store a great number of files, and I had an old server to recycle, so I ended up with:
- 4U ASUS case with room for many internal drives
- Qty 1, leftover 320 GB drive (to boot from)
- Qty 10, 1 TB drives for data storage: WD Caviar Green
- An extra SATA controller (plus the 8 SATA ports on the motherboard)
- Ubuntu Linux 8.04.1 LTS
- Software RAID6
The separate boot drive is for simplicity; it contains a trivial, vanilla Ubuntu install. If availability mattered more, I could replace it with a RAID1 pair or with flash storage. Even a cheap USB “key drive” would be sufficient, if I went to the trouble of setting up /var and /tmp so they don't write to it (thus avoiding premature wear-out).
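If I did go the USB-flash route, the usual trick is to point the most write-heavy directories at tmpfs. A minimal sketch of the /etc/fstab entries involved (sizes are arbitrary examples, and anything that must survive a reboot, such as /var/log, would be lost):

tmpfs  /tmp      tmpfs  defaults,noatime,size=512m  0  0
tmpfs  /var/tmp  tmpfs  defaults,noatime,size=128m  0  0
tmpfs  /var/log  tmpfs  defaults,noatime,size=128m  0  0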
The terabyte drives each have one large RAID container partition (quick work with sfdisk). The ten of them in a RAID6 yield eight drives' worth of capacity. Adjusting also for the difference between marketing TB/GB and the real thing (8 × 10^12 bytes is roughly 7.28 TiB), plus a bit of filesystem overhead, I ended up with 7.3 TB of available storage. Here it is, with some data already loaded:
Filesystem            Size  Used Avail Use% Mounted on
/dev/sde1             285G  1.3G  269G   1% /
varrun                2.0G  164K  2.0G   1% /var/run
varlock               2.0G     0  2.0G   0% /var/lock
udev                  2.0G  112K  2.0G   1% /dev
devshm                2.0G     0  2.0G   0% /dev/shm
/dev/md0              7.3T  3.1T  4.3T  42% /raid
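For reference, the one-big-RAID-partition layout mentioned above can be scripted with sfdisk. A sketch of the idea (device names are examples; the exact commands I ran aren't reproduced here):

# one partition spanning the whole disk, type fd (Linux raid autodetect)
echo ',,fd' | sfdisk /dev/sdb
# clone that layout onto the rest of the data drives
sfdisk -d /dev/sdb | sfdisk /dev/sdc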
I went with software RAID for simplicity, low cost, and easy management:
# cat /proc/mdstat
[....]
md0 : active raid6 sda1[0] sdk1[9] sdj1[8] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sdd1[3] sdc1[2] sdb1[1]
      7814079488 blocks level 6, 64k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
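The original creation command isn't shown above, but an array with this layout comes from a single mdadm invocation, roughly as follows (device names taken from the mdstat output; recording the array in mdadm.conf is the usual follow-up so it assembles at boot):

mdadm --create /dev/md0 --level=6 --raid-devices=10 --chunk=64 \
    /dev/sd[abcd]1 /dev/sd[fghijk]1
mdadm --detail --scan >> /etc/mdadm/mdadm.conf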
I chose RAID6 over RAID5 because:
- This array is so large, and a rebuild takes so long, that the risk of a second drive failing before the first failed drive is replaced and rebuilt seems high.
- 8 drives' worth of capacity out of 10 is a decent value; 5 out of 10 (with RAID10) is not.
It turns out that certain default settings in Linux software RAID can yield instability under RAM pressure with very large arrays; after some online research I made the adjustments below, and the machine appears solid. The sync_speed_max setting throttles the RAID rebuild, which was helpful because it let me start populating the storage during the very long initial rebuild.
vi /etc/rc.local
echo 30000 >/proc/sys/vm/min_free_kbytes
echo 8192 >/sys/block/md0/md/stripe_cache_size
echo 10000 >/sys/block/md0/md/sync_speed_max

vi /etc/sysctl.conf
vm.vfs_cache_pressure=200
vm.dirty_expire_centisecs = 1000
vm.dirty_writeback_centisecs = 100
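These take effect at the next boot; to apply them immediately, the sysctl entries can be reloaded and the echo lines simply run by hand once:

sysctl -p
cat /sys/block/md0/md/stripe_cache_size   # confirm the new value took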
7.3T is beyond the range where ext2/3 is comfortable, so I went with XFS. XFS appears to deal well with the large size without any particular tuning, but increasing the read-ahead helps with my particular access pattern (mostly sequential); this also goes in rc.local:
blockdev --setra 8192 /dev/sda
blockdev --setra 8192 /dev/sdb
blockdev --setra 8192 /dev/sdc
blockdev --setra 8192 /dev/sdd
blockdev --setra 8192 /dev/sde
blockdev --setra 8192 /dev/sdf
blockdev --setra 8192 /dev/sdg
blockdev --setra 8192 /dev/sdh
blockdev --setra 8192 /dev/sdi
blockdev --setra 8192 /dev/sdj
blockdev --setra 8192 /dev/sdk
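For completeness, creating and mounting the filesystem itself was unremarkable; something along these lines (the mount options here are an example, and mkfs.xfs generally picks up the stripe geometry from an md device on its own):

mkfs.xfs /dev/md0
mkdir /raid
mount /dev/md0 /raid
# and in /etc/fstab:
/dev/md0  /raid  xfs  defaults,noatime  0  0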
I was happy to find that XFS has a good “defrag” capability; simply install the xfsdump toolset (apt-get install xfsdump) then schedule xfs_fsr to run daily in cron.
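A sketch of the daily cron hook (the path and the two-hour runtime limit are just the defaults spelled out; remember to make the script executable):

#!/bin/sh
# /etc/cron.daily/xfs_fsr -- reorganize files on the big array
/usr/sbin/xfs_fsr -t 7200 /raid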
Power consumption seems reasonable at 3.0 amps under load.
Marvell: Not Marvellous
In this machine I happen to have an Intel D975XBX2 motherboard (in retrospect an awful choice, but it was already installed), which includes a Marvell 88SE61xx SATA controller. This controller does not get along well with Linux. Again, after some online research, the fix turned out to be just a few commands:
vi /etc/modprobe.d/blacklist
# add at the end:
blacklist pata_marvell

vi /etc/initramfs-tools/modules
# add at the end:
sata_mv

# then regen the initrd:
update-initramfs -u
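After a reboot, a quick way to check that the drives came up under the intended driver (a sketch):

lsmod | grep -e sata_mv -e pata_marvell   # sata_mv should be listed, pata_marvell should not
dmesg | grep -i sata_mv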
This works. But if I had it to do over, I'd rip out and throw away that motherboard and replace it with any of the 99% of other motherboards that work well with Linux out of the box, or else disable the Marvell controller and add another SATA controller card.
Is this as big as it gets?
Not by a long shot; this is a secondary, backup storage machine, worth a blog post because of the technical details. It has barely over $1000 worth of hard drives, and a total cost of under $3000 (even less for me, since I reused some old hardware). You can readily order off-the-shelf machines with much more storage (12, 16, 24, even 48 drives). The per-byte pricing is appealing up to the 16- or 24-drive level, then escalates to the stratosphere.
It would be interesting to compare my numbers above to the hardware costs discussed here:
http://perspectives.mvdirona.com/2008/12/22/TheCostOfBulkColdStorage.aspx
Of course, there are non-hardware costs discussed there also.