database – Kyle Cordes

Unrealistic Cost Expectations, and How to Fix Them

I suppose there have always been hiring companies with wildly unrealistic cost expectations; the internet just makes them more visible. Take, for example, this job post for a PostgreSQL expert, which I republish here, anonymized, for criticism and comment:

We are looking for a postgre expert with indepth Oracle skill to help with the following project:

1) migrate all data and structure from Oracle 9i to PostgreSQL 8.3.
2) create a script to capture daily differentials on Oracle db and export the changes to PostgreSQL
3) create a script to automate the import the Oracle differential export into PostgreSQL on a daily basis
4) full documentation

Will provide both Oracle and Postgre dev box to work with, interested party please send email to (REDACTED)

Job budget between USD 300 to USD 400. However need this delivered within one week from job acceptance, or before Dec. 31, 2009, whichever come first.

To clarify for anyone reading this, this is not my job post. I am not looking for a PG expert. Do not contact me to apply for this work.

This fellow wants:

An experienced guru
In two quite complex technologies, one of which is a very expensive technology
To do a non-trivial project, and presumably, to be responsible for the results actually working
Who can do their project Right Now
Over the Christmas holiday, at least here in the US
For $400

It seems to me that this person, besides creating some annoyance on the mailing list where they posted it, simply has wildly unrealistic expectations. As a result, they are likely to be disappointed with any real person who applies for their work. I am not suggesting that no one will take it: they will quite likely get multiple applicants, eager to attack the job for the budget shown.

Instead, I estimate that most likely a week will come and go, $400 will come and go, and they will not have a working system. With some struggle and legwork on the hiring end, they may get the end result for a surprisingly small multiple of the proposed budget… but if they started with a more realistic amount in the first place, they’d likely get there faster and with less work on the hiring end.

A broader lesson, which I’ve learned through experience in the trenches, is that if you don’t have a good feel for the price range, you should start with no price range. Then talk with the first handful of applicants and listen carefully. Within a couple of hours (for a simple request), you’ll probably have at least some realistic sense of the size of your project. With this knowledge, you can write more realistic and credible job posts, yielding more and better applicants.

Were you hoping for an approach to fix someone else’s unrealistic expectations? Sorry, I’ve not found a good way to do this. The best you can do is to find and fix your own.

DRBD on Ubuntu 8.04

A while back I wrote about setting up DRBD on Ubuntu.

It is considerably simpler now: you don’t need to build the module anymore, because a reasonably recent version is already in the box. You should still read the full directions on the DRBD web site, but to get it going you only need to configure it and start it; don’t download anything, and don’t compile any modules or kernels.

Rather, the module is already there; you could load it manually like this:

modprobe drbd

But you really want the utils package:

apt-get install drbd8-utils

… which sets things up the rest of the way. Simply proceed with the configuration file, /etc/drbd.conf. If you want a nice clean config file, remove all the existing resource sections in the drbd.conf installed by the Ubuntu package, and put in something like this:

resource kylesdrbd {
  protocol      C;

  startup { wfc-timeout 60; degr-wfc-timeout  120; }
  disk { on-io-error detach; }
  syncer {
  }
  on x1.kylecordes.com {
    device      /dev/drbd4;
    disk        /dev/sdb5;
    address     192.168.17.61:7791;
    meta-disk   /dev/sdb1[0];
  }
  on x2.kylecordes.com {
    device      /dev/drbd4;
    disk        /dev/sda5;
    address     192.168.17.62:7791;
    meta-disk   /dev/sda3[0];
  }
}

The hostnames here must match the full, actual hostnames of the two machines (as reported by uname -n). I show my example disk configuration; you’ll need to do something locally appropriate, based on understanding the DRBD docs.

Also adjust the syncer section:

syncer {
    rate 40M;    # the rate is in BYTES per second
}
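Because the rate is in bytes per second while link speeds are usually quoted in bits, a quick back-of-the-envelope check helps. The sketch below uses hypothetical helper names, ignores protocol overhead and the MB/MiB distinction, and simply shows that 40M uses roughly a third of a GigE link, plus how long a full resync of an assumed 100 GB volume would take at that rate:

```shell
#!/bin/sh
# Rough sizing helpers for the DRBD syncer rate (hypothetical names).
# Integer arithmetic only; protocol overhead is ignored.

# link_mbytes MBIT_PER_S: raw link capacity in megabytes per second.
link_mbytes() {
  echo $(( $1 / 8 ))
}

# resync_seconds DISK_MB RATE_MB_PER_S: time for a full resync, rounded up.
resync_seconds() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# GigE gives roughly 125 MB/s raw, so "rate 40M" leaves headroom for
# normal replication traffic; an assumed 100 GB (102400 MB) volume takes
# about 2560 s (~43 min) to resync completely at 40 MB/s.
echo "GigE raw capacity: $(link_mbytes 1000) MB/s"
echo "Full resync at 40M: $(resync_seconds 102400 40) s"
```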

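If there was already a filesystem on the metadata partitions you’re trying to put under DRBD, you may need to clear it out on each machine (this destroys whatever is on that partition; on x2 in my example, the partition is /dev/sda3):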

dd if=/dev/zero of=/dev/sdb1 bs=1M count=1

Now you’re ready to fire it up; on both machines:

drbdadm create-md kylesdrbd
drbdadm attach kylesdrbd
drbdadm connect kylesdrbd

# see status:
cat /proc/drbd

You are now ready to make one node primary; do this on only one machine, of course:

drbdadm -- --overwrite-data-of-peer primary kylesdrbd

(Use all in place of the resource name to promote every configured resource at once.)

In my case, I now see a resync starting in the /proc/drbd output.
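If you want to watch that resync from a script rather than by eye, a small awk sketch can pull the connection state and progress out of /proc/drbd. The function names are hypothetical, and the parsing assumes the DRBD 8.0-style layout shown in the comments; check it against the output on your own machines, since the format has changed between DRBD releases:

```shell
#!/bin/sh
# Helpers for reading /proc/drbd from scripts (hypothetical names).
# ASSUMPTION: DRBD 8.0-era layout, where a device line looks like
#   4: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
# and a resync progress line contains "sync'ed: 10.2%".

# drbd_state MINOR: print the connection state (the cs: field) of one minor.
drbd_state() {
  awk -v minor="$1" '
    $1 == (minor ":") {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^cs:/) { sub(/^cs:/, "", $i); print $i; exit }
    }'
}

# drbd_sync_pct: print the first resync percentage found, if any.
# "sync.ed:" uses "." for the apostrophe to keep the awk program quotable.
drbd_sync_pct() {
  awk '/sync.ed:/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /%$/) { print $i; exit }
  }'
}

# Usage on a live node (minor 4, matching /dev/drbd4 in the config):
#   drbd_state 4 < /proc/drbd
#   drbd_sync_pct < /proc/drbd
```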

You don’t need to wait for that to finish; go ahead and create a filesystem on /dev/drbd4 (the device named in the config above), for example with mkfs.ext3.

It’s best to have a dedicated Gigabit Ethernet connection between the nodes, or for busy systems, a pair of bonded GigE links. For testing, though, running over your existing network is fine.

I found that adjusting the startup timeouts (wfc-timeout and degr-wfc-timeout in the startup section) helps to avoid long boot delays if one of the two systems is down.

Of course, I only covered DRBD here. In production you’d typically use it in conjunction with a failover/heartbeat mechanism, so that whatever resource you serve on the DRBD volume (database, NFS, etc.) cuts over to the other machine without intervention; there is plenty to read online about Linux high availability.

Analyzing PostgreSQL logs with pgFouine (on Ubuntu 8.04)

pgFouine is a slick, useful, and free tool for analyzing PostgreSQL query workloads. It works without any impact on the running PostgreSQL: it analyzes the PG log output. The caveat is that it needs PG configured to write the right kind of log output.

Sadly, as of version 8.3 PG has a wrinkle in how it writes its logs: multi-line queries can get jumbled together in the stderr-based log, resulting in erroneous output from pgFouine. Hopefully a future PG will be able to write its logs without this issue, but in the meantime, the answer is to use syslog logging instead of native PG logging. This isn’t a bad idea anyway, since syslogd and friends are well proven.

On our project where this need arose, we use the Ubuntu Linux distribution, currently version 8.04. Ubuntu’s PG package sets up native stderr logging; here are the steps needed to change that to syslog logging instead. These steps are about the same for other distributions (or for manual compiles), but with different paths.

The log_min_duration_statement setting shown here is in milliseconds, so it logs every statement that runs for at least 4 seconds. Depending on your server and the type of workload (OLTP vs. OLAP), this might be too high or too low.

Edit your postgresql.conf file:

vi /etc/postgresql/8.3/main/postgresql.conf

log_destination = 'syslog'
log_line_prefix = 'user=%u,db=%d '
log_min_duration_statement = 4000
silent_mode = on
logging_collector = off

With PostgreSQL 8.2, set redirect_stderr instead of logging_collector:

redirect_stderr = off

Next, set up where syslog will store the data. PostgreSQL’s syslog_facility setting defaults to LOCAL0, so send local0 to its own file, and add “local0.none” to the ;-separated list of what goes into /var/log/messages. On my system it ended up looking like this, but of course it may vary depending on what else you’ve set up in syslog:

vi /etc/syslog.conf

# add this:

local0.*        -/var/log/postgresql/main.log

# edit this:

    *.=info;*.=notice;*.=warn;\
    auth,authpriv.none;\
    local0.none;\
    cron,daemon.none;\
    mail,news.none          -/var/log/messages

Restart syslogd to make the change take effect:

/etc/init.d/sysklogd restart

Then restart PG so it starts logging there:

/etc/init.d/postgresql-8.3 restart

Note that we are putting these new logs in the existing /var/log/postgresql directory which the Ubuntu PG package creates; if you install PG manually, create such a directory yourself, or set up syslog to write to the pg_log directory. The existing logs there will remain, holding only the messages from PG startup and shutdown (via /etc/init.d/postgresql). I find this unhelpful but harmless.
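Before running a full pgFouine report, it can be handy to confirm that slow statements are actually reaching the new log. This sketch assumes syslog lines shaped like the sample in the comments (PostgreSQL tags each syslog message with a [sequence-line] prefix); the function name is hypothetical, and it only looks at the first line of each statement, leaving multi-line reassembly to pgFouine:

```shell
#!/bin/sh
# slow_statements MIN_MS: read syslog-format PostgreSQL log lines on stdin
# and print only those whose "duration:" value exceeds MIN_MS milliseconds.
# ASSUMPTION: lines look roughly like
#   Dec 10 02:00:01 db1 postgres[1234]: [3-1] user=app,db=prod LOG:  duration: 4512.345 ms  statement: SELECT ...
# Only the first physical line of a statement carries the duration, so
# continuation lines ([3-2], [3-3], ...) are not printed.
slow_statements() {
  awk -v min="$1" '{
    for (i = 1; i < NF; i++)
      if ($i == "duration:" && $(i + 1) + 0 > min + 0) { print; next }
  }'
}

# Usage: show statements slower than 10 seconds from the current log.
#   slow_statements 10000 < /var/log/postgresql/main.log
```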

Log Rotation

By putting the files in this preexisting location, we take advantage of the log rotation already set up in /etc/logrotate.d/postgresql-common. On a busy server, you may want to adjust the rotation setting therein from weekly to daily, or add a line with “size 1000k” or so. Take a look at “man logrotate” to learn about many useful options, such as the ability to have these logs emailed to your DBA as they rotate.

pgFouine

Finally, you are ready to analyze logs. If you plan to analyze them on the same machine where you run your database (probably not a great idea), proceed (on Ubuntu) to get the PHP command line executable:

apt-get install php5-cli

Then download the pgFouine tarball, quietly curse the lack of an Ubuntu package, put it in your $PATH, and run it. Don’t be alarmed by its .php file extension; PHP is a usable (though not particularly charming) language for writing command line tools, as well as dynamic web pages.

cd /var/log/postgresql

pgfouine.php -file main.log  >somefile.html
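To produce a dated report each day, for example from cron after logrotate has rotated the log, a small wrapper like this may help. It is a hypothetical sketch: the function name, the PGFOUINE variable, and the report naming are assumptions; only the -file option comes from the command above:

```shell
#!/bin/sh
# pgfouine_report LOGFILE OUTDIR: run pgFouine over LOGFILE, write the report
# to OUTDIR/pgfouine-YYYY-MM-DD.html, and print the report's path.
# PGFOUINE (hypothetical) may point at wherever you unpacked pgFouine.
pgfouine_report() {
  log=$1
  out="$2/pgfouine-$(date +%Y-%m-%d).html"
  "${PGFOUINE:-pgfouine.php}" -file "$log" > "$out" || return 1
  echo "$out"
}

# Usage, e.g. against the most recently rotated log (the exact file name
# depends on your logrotate settings):
#   pgfouine_report /var/log/postgresql/main.log.1 /var/www/reports
```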

View the HTML file in your web browser, and dig into your worst queries. Good luck.

Multicast your DB backups with UDPCast

At work we have a set of database machines set up so that one is the primary machine, making backups once per day, and several other machines restore this backup once per day, for development, ad hoc reporting, and other secondary purposes. We started out with an obvious approach:

back up on server1, to a file on server1
SCP or rsync the file from server1 to server2
restore the DB on server2

… but over time, as the data has grown, the inefficiency of such an approach became equally obvious: the backup data goes back and forth across the network and to/from disk repeatedly. These steps count only the backup data, not the live storage in the DBMS:

On to the disk on server1 (putting additional load on the primary DB machine)
Off the disk on server1 (putting additional load on the primary DB machine)
On to the disk on server2
Off the disk on server2

This is also wasteful from a failure-recovery point of view, since the place we are least likely to need the backup is on the machine whose failure would lead us to need a backup.

Pipe it over the network instead

The project at hand uses PostgreSQL on Linux, so I’ll show example PG commands here. The principles apply equally well to other DBs and platforms of course, though some DBMSs or platforms might not offer backup and restore commands that stream data. (I’m looking at you, MS SQL Server!)

What we need is a pipe that goes over the network. One way to get such a pipe is with ssh (or rsh), something like so, run from server1:

pg_dump -Fc dbnameonserver1 | ssh server2 pg_restore -Fc -v -O -x -d dbnameonserver2

This variation will simultaneously store the backup in a file on server1:

pg_dump -Fc dbnameonserver1 | tee dbname.dump | ssh server2 pg_restore -Fc -v -O -x -d dbnameonserver2

This variation (or something close; I last ran this several days ago) will store the backup in a file on server2 instead:

pg_dump -Fc dbnameonserver1 | ssh server2 "tee dbname.dump | pg_restore -Fc -v -O -x -d dbnameonserver2"

To reduce the CPU load from this, adjust SSH to use less CPU-intensive encryption, or avoid that entirely with rsh (but only if you have a trusted / local network).

Multicast / Broadcast it over the network instead

The above commands are good for point-to-point streaming backup / restore, but the scenario I have in mind has one primary machine and several (3, 4, or more) secondary machines. One answer is to run the above process repeatedly, once for each secondary machine, but that sends the whole backup over the network N times. Inefficiency! (==Blasphemy?)
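To put rough numbers on that: the sketch below (a hypothetical helper, with retransmissions and protocol overhead ignored) compares the data sent for one ssh stream per secondary against a single multicast stream, using an assumed 50 GB dump and 4 secondaries:

```shell
#!/bin/sh
# wire_gb DUMP_GB RECEIVERS MODE: rough gigabytes the backup puts on the
# network. "unicast" = one ssh stream per receiver; "multicast" = one stream
# shared by all receivers. Hypothetical helper; overhead is ignored.
wire_gb() {
  case $3 in
    unicast)   echo $(( $1 * $2 )) ;;
    multicast) echo "$1" ;;
    *)         echo "unknown mode: $3" >&2; return 1 ;;
  esac
}

# Example: a 50 GB dump restored on 4 secondary machines.
echo "unicast:   $(wire_gb 50 4 unicast) GB"    # 200 GB
echo "multicast: $(wire_gb 50 4 multicast) GB"  # 50 GB
```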

To avoid that, simply use UDPCast. It’s a trivial install on Debian / Ubuntu:

apt-get install udpcast

(Be warned, though: there is at least one annoying bug in the old (2004) UDPCast offered off-the-shelf in Debian / Ubuntu as of 2008. You might need to build the latest UDPCast from source, available on its web site.)

Run this on server1:

pg_dump -Fc dbnameonserver1 | udp-sender --min-wait 5 --nokbd

Run this on each of server2 .. serverN:

udp-receiver --nokbd | pg_restore -Fc -v -O -x -d dbnameonserverN

With this approach, the backup data is multicast (or broadcast, if multicast does not work and all the machines are on the same segment), traversing the network only once no matter how many receiving machines are set up. udp-receiver has --file and --pipe options, but I found that I occasionally got corruption with huge (50GB+) transfers when using them. So I recommend this instead, which also saves a copy on the receiving end:

udp-receiver --nokbd | tee mydatabase.dump | pg_restore -Fc -v -O -x -d dbnameonserverN

Or perhaps you want to just receive and store the backup on a file server, with this:

udp-receiver --nokbd >mydatabase.dump
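Given the corruption I saw with very large transfers, it can be worth comparing checksums of the sender’s and receiver’s copies. This is a sketch with hypothetical helper names, assuming GNU md5sum is available. Keeping a copy on server1 reintroduces the local disk write listed earlier, and since pg_restore runs while the data arrives, the check only tells you after the fact whether to trust that copy:

```shell
#!/bin/sh
# dump_md5 FILE: print just the MD5 hex digest of FILE (GNU coreutils md5sum).
dump_md5() {
  md5sum "$1" | cut -d ' ' -f 1
}

# verify_dump FILE EXPECTED_MD5: succeed only if FILE's digest matches.
verify_dump() {
  [ "$(dump_md5 "$1")" = "$2" ]
}

# Sketch of use (file names hypothetical):
#   server1: pg_dump -Fc dbnameonserver1 | tee mydatabase.dump | udp-sender --min-wait 5 --nokbd
#            dump_md5 mydatabase.dump > mydatabase.md5
#   serverN: after copying mydatabase.md5 over from server1,
#            verify_dump mydatabase.dump "$(cat mydatabase.md5)" || echo "corrupt copy" >&2
```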

To make all this happen automatically, schedule the sender and the receivers in cron to start at the same time on the relevant machines. Use NTP to keep their clocks in sync, and adjust the udp-sender and udp-receiver options as needed (--min-wait t, --max-wait t) so the whole process starts smoothly in spite of minor timing variations.

As with the previous suggestion for rsh, the data will travel unencrypted over your network, so do this only if you trust your network (such as a LAN segment between your database servers).

Multicast / broadcast is a very useful technology, and with UDPcast it is quite easy to use. UDPcast also implements a checksum/retransmit mechanism; it is not a “bare”, loss-prone UDP transmission.