Enterpriseyness

Q: How do you know when you’re talking to an excessively enterprisey software vendor?

A: When they require an NDA before they tell you anything about what their product costs.

I’ve been down the enterprise software path; I’ve worked in companies producing it, I’ve worked in companies buying it. I know the drill. I’ve fought the battles to produce quality, polished software in spite of the forces leading toward (ummm….) overly postmodern results.

Be proud of your offerings and your value proposition. Of course we all have situations where a lot of discussion is needed to figure out what a project will cost; but if you have an off-the-shelf product, it’s just basic block-and-tackle execution to get to a point where you can state a few prices for a few packages as part of your sales presentation.

Incompetence -> Progress

From http://www.theodoregray.com/BrainRot/index.html

“The most profound engine of civilization is the inability of a larger and larger fraction of the population to do the basic things needed to survive.  Many people fail to realize this.”

“Technology’s greatest contribution is to permit people to be incompetent at a larger and larger range of things.  Only by embracing such incompetence is the human race able to progress.”

It’s a good read, with important thoughts on civilization and education. It is part of the introduction to a book about Mathematica, a truly amazing piece of software. (Mathematica was amazing when I used it in the 1990s; I can scarcely imagine what it does now.)

DRBD on Ubuntu 8.04

A while back I wrote about setting up DRBD on Ubuntu.

It is considerably simpler now: you don’t need to build the module anymore; a reasonably recent version is already in the box. You should still read the full directions at the DRBD web site, but to get it going, you only need to configure it and start it; don’t download anything, don’t compile any modules or kernels, etc.

Rather, the module is already there; you could load it manually like this:

modprobe drbd

But you really want the utils package:

apt-get install drbd8-utils

… which sets things up the rest of the way. Simply proceed with the configuration file, /etc/drbd.conf. If you want a nice clean config file, remove all the existing resource sections in the drbd.conf installed by the Ubuntu package, and put in something like so:

resource kylesdrbd {
  protocol      C;

  startup { wfc-timeout 60; degr-wfc-timeout  120; }
  disk { on-io-error detach; }
  syncer {
  }
  on x1.kylecordes.com {
    device      /dev/drbd4;
    disk        /dev/sdb5;
    address     192.168.17.61:7791;
    meta-disk   /dev/sdb1[0];
  }
  on x2.kylecordes.com {
    device      /dev/drbd4;
    disk        /dev/sda5;
    address     192.168.17.62:7791;
    meta-disk   /dev/sda3[0];
  }
}

The hostnames here need to match the full, actual hostnames. I show my example disk configuration; you’ll need to do something locally appropriate, based on understanding the DRBD docs.
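
DRBD matches these names against each machine’s node name (what uname -n reports), so a quick way to check what to put here is to run, on each node:

uname -n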

Also adjust the syncer section:

syncer {
  rate 40M;   # 40M = 40 MByte/sec (the rate is in bytes, not bits)
}

If there was already a filesystem on the metadata partitions you’re trying to put under DRBD, you may need to clear it out:

dd if=/dev/zero of=/dev/sdb1 bs=1M count=1

Now you’re ready to fire it up; on both machines:

drbdadm create-md kylesdrbd
drbdadm attach kylesdrbd
drbdadm connect kylesdrbd

# see status:
cat /proc/drbd

You are now ready to make one node primary; do this on only one machine, of course:

drbdadm -- --overwrite-data-of-peer primary all

or

drbdadm -- --overwrite-data-of-peer primary kylesdrbd

In my case, I now see a resync starting in the /proc/drbd output.
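
If you want to watch the resync progress as it runs:

watch -n1 cat /proc/drbd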

You don’t need to wait for that to finish; go ahead and create a filesystem on /dev/drbd4 (the device named in the config above).
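
For example (the filesystem type and mount point here are just placeholders; use whatever suits your setup), on the primary node:

mkfs.ext3 /dev/drbd4
mkdir -p /mnt/kylesdrbd
mount /dev/drbd4 /mnt/kylesdrbd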

It’s best to have a dedicated Gigabit Ethernet connection between the nodes, or for busy systems a pair of bonded GigE links. For testing, though, running over your existing network is fine.

I found that adjusting the startup timeout settings (wfc-timeout and degr-wfc-timeout, in the startup section shown above) helps to avoid long boot delays if one of the two systems is down.

Of course I only covered DRBD here; typically in production you’d use it in conjunction with a failover/heartbeat mechanism, so that whatever resource you serve on the DRBD volume (database, NFS, etc.) cuts over to the other machine without intervention; there is plenty to read online about Linux high availability.

Move files into an existing directory structure, on Linux

I recently needed to move a large number of files (millions) in a deep directory structure, into another similar directory structure, “merging” the contents of each directory and creating any missing directories. This task is easily (though slowly) performed on Windows with Control-C Control-V in Explorer, but I could find no obvious way to do it on Linux.

There is quite a bit of discussion about this on the web, including:

  • Suggestions to do it with tar; this is a poor idea because it copies all the file data, taking an enormously long time.
  • Suggestions to do it with “mv -r”… but as far as I can tell, mv does not have a -r option.

After a little thought I came up with the script below. I’d love to have a Linux/Bash guru out there point out how awful it is and send me something better!

A critical feature for me is that it does not overwrite files; if a source file name/path overlaps a destination file, the source file is left alone, untouched. This can be changed easily to overwrite instead: remove the [[ -f ]] test.

$ cat ~/bin/move_files_merge.sh
#!/bin/bash

# Move files from dir $1 to dir $2,
# merging in to existing dirs
# Call it like so:
# move_files_merge.sh /FROM/directory /TO/directory

# Lousy error handling:
# Exit if called with missing params.
[ "A" == "A${1}" ] && exit 1
[ "A" == "A${2}" ] && exit 1

echo finding all the source directories
cd "$1"
find . -type d -not -empty | sort | uniq >"$2/dirlist.txt"

echo making all the destination directories
cd "$2"
wc -l dirlist.txt
xargs -d '\n' --no-run-if-empty -a dirlist.txt mkdir -p
rm dirlist.txt

echo Moving the files
cd "$1"
# There is surely a better way to do this:
find . -type f -printf "[[ -f '$2/%p' ]] || mv '%p' '$2/%h'\n" | bash

echo removing empty source dirs
find "$1" -depth -type d -empty -delete

echo done.
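
Then call it with the source and destination trees, for example (these paths are made up):

~/bin/move_files_merge.sh /data/incoming /data/main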

Large, economical RAID: 10 1 TB drives

I recently needed a file server with ample capacity, on which to store data backups. The typical data access pattern for backups is that data is written once and read rarely, so bulk backup storage has much lower performance requirements than, for example, disks used for database files.

I need to store a great number of files, and I had an old server to recycle, so I ended up with:

  • 4U ASUS case with room for many internal drives
  • Qty 1, leftover 320 GB drive (to boot from)
  • Qty 10, 1 TB drives for data storage: WD Caviar Green
  • An extra SATA controller (plus the 8 SATA ports on the motherboard)
  • Ubuntu Linux 8.04.1 LTS
  • Software RAID6

The separate boot drive is for simplicity; it contains a trivial, vanilla Ubuntu install. If availability mattered more, I could replace it with a RAID1 pair or flash storage – even a cheap USB “key drive” would be sufficient, if I went to the trouble of setting up /var and /tmp not to write to it (thus avoiding premature wear-out).

The terabyte drives have one large RAID container partition each (quick work with sfdisk). The 10 of them in a RAID6 yield 8 drives worth of capacity. Adjusting also for the difference between marketing TB/GB and the real thing, plus a bit of filesystem overhead, I ended up with 7.3 TB of available storage. Here it is, with some data already loaded:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sde1             285G  1.3G  269G   1% /
varrun                2.0G  164K  2.0G   1% /var/run
varlock               2.0G     0  2.0G   0% /var/lock
udev                  2.0G  112K  2.0G   1% /dev
devshm                2.0G     0  2.0G   0% /dev/shm
/dev/md0              7.3T  3.1T  4.3T  42% /raid
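
Partitioning the ten data drives was quick; each got a single full-size partition of type fd (Linux RAID autodetect), with an sfdisk one-liner along these lines (device name is just an example):

echo ',,fd' | sfdisk /dev/sdb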

I went with software RAID for simplicity, low cost, and easy management:

# cat /proc/mdstat
[....]
md0 : active raid6 sda1[0] sdk1[9] sdj1[8] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sdd1[3] sdc1[2] sdb1[1]
      7814079488 blocks level 6, 64k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

I chose RAID6 over RAID5 because:

  • This array is so large, and a rebuild takes so long, that the risk of a second drive failing before a first failure is replaced and rebuilt seems high.
  • 8 drives of capacity for 10 drives is a decent value. 5/10 (with RAID10) is not.
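
For reference, assembling a 10-drive RAID6 like the one in the mdstat output above looks roughly like this with mdadm (a sketch, not my exact command):

mdadm --create /dev/md0 --level=6 --raid-devices=10 \
  /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1 \
  /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1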

It turns out that certain default settings in Linux software RAID can yield instability under RAM pressure in very large arrays; after some online research I made the adjustments below and it appears solid. The sync_speed_max setting throttles back the RAID rebuild, helpful because I was able to start populating the storage during the very long rebuild process.

vi /etc/rc.local
echo 30000 >/proc/sys/vm/min_free_kbytes
echo 8192 >/sys/block/md0/md/stripe_cache_size
echo 10000 >/sys/block/md0/md/sync_speed_max

vi /etc/sysctl.conf
vm.vfs_cache_pressure=200
vm.dirty_expire_centisecs = 1000
vm.dirty_writeback_centisecs = 100
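
To make the sysctl.conf settings take effect without a reboot:

sysctl -p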

7.3T is more than ext2/3 comfortably handled at the time, so I went with XFS. XFS appears to deal well with the large size without any particular tuning, but increasing the read-ahead helps with my particular access pattern (mostly sequential), also in rc.local:

blockdev --setra 8192 /dev/sda
blockdev --setra 8192 /dev/sdb
blockdev --setra 8192 /dev/sdc
blockdev --setra 8192 /dev/sdd
blockdev --setra 8192 /dev/sde
blockdev --setra 8192 /dev/sdf
blockdev --setra 8192 /dev/sdg
blockdev --setra 8192 /dev/sdh
blockdev --setra 8192 /dev/sdi
blockdev --setra 8192 /dev/sdj
blockdev --setra 8192 /dev/sdk
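
As for creating the filesystem itself, the mkfs.xfs defaults did the job; roughly (mount point as in the df output above):

mkfs.xfs /dev/md0
mkdir -p /raid
mount /dev/md0 /raid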

I was happy to find that XFS has a good “defrag” capability; simply install the xfsdump toolset (apt-get install xfsdump), then schedule xfs_fsr to run daily in cron.
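
For example, a tiny daily cron job along these lines (the two-hour limit is an arbitrary choice):

vi /etc/cron.daily/xfs_fsr
#!/bin/sh
# reorganize (defrag) the big XFS volume, for at most 2 hours
xfs_fsr -t 7200 /raid

chmod +x /etc/cron.daily/xfs_fsr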

Power consumption seems reasonable at 3.0 amps under load.

Marvell: Not Marvellous

In this machine I happen to have an Intel D975XBX2 motherboard (in retrospect an awful choice, but already installed) which includes a Marvell 88SE61xx SATA controller. This controller does not get along well with Linux. Again with some online research, the fix is just a few commands:

vi /etc/modprobe.d/blacklist
# add at the end:
blacklist pata_marvell

vi /etc/initramfs-tools/modules
# add at the end:
sata_mv

# then regen the initrd:
update-initramfs -u
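
After a reboot, a quick check that the right driver is in use (and the blacklisted one is not):

lsmod | grep -E 'sata_mv|pata_marvell'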

This works. But if I had it to do over, I’d rip out and throw away that motherboard and replace it with any of the 99% of other motherboards that work well with Linux out of the box, or disable the Marvell controller and add another SATA controller on a card.

Is this as big as it gets?

Not by a long shot; this is a secondary, backup storage machine, worth a blog post because of the technical details. It has barely over $1000 worth of hard drives, and a total cost of under $3000 (even less for me, since I reused some old hardware). You can readily order off-the-shelf machines with much more storage (12, 16, 24, even 48 drives). The per-byte pricing is appealing up to the 16- or 24-drive level, then escalates to the stratosphere.