SQL Server Log Shipping Fragility

In a customer production system, we use MS SQL Server 2000 “log shipping” extensively to offload reporting DB load to secondary servers… a common use of the technology. This generally works very well, most of the time.
We’ve noticed, though, that it can be somewhat fragile, in that it is easy to inadventantly run a SQL operation (such as a large index update, schema change, etc.) that results in many, many pages shipped, and hence in large log backup files, and large “standby” files – all of which are processed surprisingly slowly by SQL Server. The result of this (the fragility) is that the log ship destination servers can then take many, many hours to return to an operatonal state.
It turns out that one cause of this is that log backup (and hence log shipping) apparently happens at the “page” level: if one bit on a page changes, all 8K of the page is shipped.

A second cause is that if a log backup happens during the middle of a large operation, it will ship the partially complete operation; the destination servers will then generate huge “standby” files (we’ve seen them over a gigabyte) in the restore process.
One path forward is to to move up to SQL Server 2005, which has numerous log shipping improvements. Imagine my dismay to find this today:


in which Andrew Calvett found that the log ship monitoring functionality is much worse in SQL Server 2005 than in SQL Server 2000.

As usual there is likely a third party tool to make this better. But since log shipping is such a powerful and flexible tool for scaling up SQL Server based applications, it is disappointing to see the monitoring get worse rather than better. Perhaps in SQL Server “2010”…