Optimize Hierarchy Queries with a Transitive Closure Table

Last year I posted about using a Joe Celko-style nested set hierarchy representation for fast hierarchy queries. Here I will describe another approach, which is simpler to query but more wasteful of space. I did not invent this transitive closure approach; I learned of it from several directions.

There are two (main) places to put the code for building a closure table: in your application code, or in your database. The application approach is appealing if you are aiming to avoid vendor-specific SQL, but the job is quite simple in SQL, and therefore not a big problem to recode for another RDBMS if the need arises. The SQL approach also avoids round-tripping the relevant data in and out of the database. Therefore, the approach I generally recommend is an SQL stored procedure / function.

Here is a simplified PostgreSQL stored procedure to do the job; note that this assumes a “widget” table with widget_id and parent_id fields (the “adjacency” representation of a hierarchy), and a widget_closure table with fields (ancestor_id, widget_id, depth):

CREATE OR REPLACE FUNCTION populate_widget_closure()
RETURNS integer AS '
DECLARE
  distance int;
BEGIN
  -- Rebuild from scratch: each widget is its own ancestor at depth 0.
  delete from widget_closure;
  insert into widget_closure (ancestor_id, widget_id, depth)
    select widget_id, widget_id, 0 from widget;

  -- Each pass extends the known paths by one level: an ancestor of
  -- my parent at depth N-1 is my ancestor at depth N.
  FOR distance in 1..20 LOOP
    insert into widget_closure (ancestor_id, widget_id, depth)
    select wc.ancestor_id, w.widget_id, distance
      from widget_closure wc, widget w
      where wc.widget_id = w.parent_id
        and wc.depth = distance - 1;
  END LOOP;
  RETURN 1;
END;
' LANGUAGE plpgsql;

This sample code assumes a maximum depth of 20 and has no error checking: it will silently miss anything deeper, and it will produce garbage rows if there is a “loop” in the ancestry. I recommend both arbitrary depth handling and error checking for production use.
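For example, arbitrary depth can be handled by looping until a pass adds no new rows, with a generous cap serving as a crude cycle check. A sketch (the function name is mine, and it is not production-hardened):

CREATE OR REPLACE FUNCTION populate_widget_closure_safe()
RETURNS integer AS '
DECLARE
  distance int := 0;
  rows_added int;
BEGIN
  delete from widget_closure;
  insert into widget_closure (ancestor_id, widget_id, depth)
    select widget_id, widget_id, 0 from widget;

  LOOP
    distance := distance + 1;
    insert into widget_closure (ancestor_id, widget_id, depth)
      select wc.ancestor_id, w.widget_id, distance
        from widget_closure wc, widget w
        where wc.widget_id = w.parent_id
          and wc.depth = distance - 1;
    GET DIAGNOSTICS rows_added = ROW_COUNT;
    EXIT WHEN rows_added = 0;  -- no deeper rows found: done
    IF distance > 1000 THEN    -- far deeper than any sane hierarchy
      RAISE EXCEPTION ''apparent cycle in widget ancestry'';
    END IF;
  END LOOP;
  RETURN distance;
END;
' LANGUAGE plpgsql;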

Once your transitive closure table is populated, querying it is extremely fast and simple. You can get all the descendants of widget 12345 with the query “select widget_id from widget_closure where ancestor_id=12345”. Of course, this hierarchy representation, while simple to generate, is not simple to incrementally update as the hierarchy changes. The most straightforward way to use it is as a cache, regenerated as needed.
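For concreteness, a few typical queries against the widget_closure schema above:

-- All descendants of widget 12345 (includes 12345 itself, at depth 0):
select widget_id from widget_closure where ancestor_id = 12345;

-- Direct children only:
select widget_id from widget_closure
  where ancestor_id = 12345 and depth = 1;

-- All ancestors of widget 12345, nearest first:
select ancestor_id from widget_closure
  where widget_id = 12345 order by depth;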

SQL Server Log Shipping – Testing the impact of large operations

Background

In a project we have here at Oasis Digital, our customer relies on the log shipping feature of Microsoft SQL Server 2000 to keep some secondary reporting databases up to date in almost-real-time: every few minutes a transaction log backup runs, then at a slightly longer interval (every 5-15 minutes) a restore of those logs runs on the secondary reporting databases.  We’ve developed some code so that the reports can automatically run against any available reporting database; thus as the reporting databases occasionally go offline for log shipping restores, the end users never notice that more than one database is being used for their reports.

Most of the time, this works very well. The users get reports quickly, and those reports (no matter how large and painful the queries are) never have any impact on the production OLTP database. But we have been stung a few times by changes we make in the production database having an unexpectedly large impact on that log shipping process.

We have two theories on what happens:

  1. Certain kinds of changes in the production database (such as adding a field, changing a field type, or changing an index) can affect a very large number of pages of the database. Since SQL Server log shipping occurs a page at a time, an operation that makes a small change to 100,000 pages can result in an extremely large file being shipped between the production database and the reporting databases. (See the example below.)
  2. There are performance characteristics of that restore process which we don’t fully understand – sometimes a log restore takes considerably longer than its size would suggest.  Sometimes a log-shipped change triggers such an occurrence.
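To make the first theory concrete: even a trivial per-row change dirties every page it touches, so the next log backup can approach the size of the whole table. A hypothetical example (table and column names invented):

-- A "small" change that touches nearly every page of a large table;
-- the next log backup then ships roughly (pages touched) * 8 KB.
UPDATE BigTable SET IsArchived = 0;  -- hypothetical one-byte flag column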

The result of those two issues together is that sometimes a seemingly minor, intentional change to the database has a large, negative impact on the log shipping process: hours of reporting database downtime!  Occasionally it’s taken more than a day for the log restore target DBs to catch up.

This affects the users severely, and is therefore a Very Bad Thing.

Therefore, we’ve been looking for a way to assess the impact of such schema and index changes on the log ship process before making them in production.

Testing / Measuring

Unfortunately the only reasonable answer appears to be the rather large hammer of recreating a copy of the production system, including log shipping.  The databases in question are quite large, so this means two servers, or one server with a bunch of hard drives (to run the log ship source and destination on the same machine).

The process is:

  • Install a lot of hard drive space, OS, SQL Server
  • Restore a copy of the production DB, call this “test1”
  • Set up log shipping to one or two reporting DBs, test2 and test3
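The restore steps can be scripted; a T-SQL sketch of setting up the source and one standby target (the backup paths and logical file names are hypothetical):

-- Restore the production backup as the log ship source, "test1":
RESTORE DATABASE test1
  FROM DISK = 'D:\backups\production_full.bak'
  WITH MOVE 'prod_data' TO 'E:\data\test1.mdf',
       MOVE 'prod_log'  TO 'E:\data\test1_log.ldf';

-- Restore the same backup as a standby reporting target, "test2":
RESTORE DATABASE test2
  FROM DISK = 'D:\backups\production_full.bak'
  WITH STANDBY = 'E:\data\test2_undo.dat',
       MOVE 'prod_data' TO 'E:\data\test2.mdf',
       MOVE 'prod_log'  TO 'E:\data\test2_log.ldf';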

Then for each test run:

  • Run log shipping (so it’s all up to date)
  • Run the test operations (like adding a field, changing a field type, changing a large index, etc.)
  • Run log shipping again, and observe how big of a file is log shipped between the two DBs.

The main metric we’ll get out of this is: for change X (i.e. for a certain SQL DDL or DML change), we get a log ship of size N megabytes or gigabytes.  I suspect that with this kind of data in hand, we will soon discover the underlying “rules” and understand which changes result in very large logs, so that most of the time we can tell which things are going to have a big impact, and schedule them around appropriately.
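A minimal T-SQL sketch of one such measurement (database, table, and path names are hypothetical):

-- Flush any pending changes, so the next backup isolates the test change:
BACKUP LOG test1 TO DISK = 'D:\logship\baseline.trn';

-- The candidate change under test, e.g. widening a column:
ALTER TABLE BigTable ALTER COLUMN SomeCode varchar(20);

-- Back up again; the size of this file is the metric "N" for this change:
BACKUP LOG test1 TO DISK = 'D:\logship\after_change.trn';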

We can even automate the test process: feed a piece of SQL to a program that runs those steps.  It might take hours of (cheap) computer time to run, but little (expensive) human time.

The Plot Thickens

What I’ve described above is the simple case.  There’s a more complex case: we’ve observed that the worst delays tend to happen when a log backup occurs in the middle of a certain operation – even operations that are usually harmless can result in a huge log ship if the log backup happens mid-operation.  These also appear to be the logs that take especially long to restore on the destination database.

To test this, we can take a piece of DML that we think will take several minutes to run, start it, wait 30 seconds or so, and while it’s still running start a log backup.  Then wait until the DML completes and start another log backup.  We would gain several data points: the size of the mid-operation ship, the size of the final ship, and how long each takes to restore.  I suspect we will learn that it’s a really bad idea to let a log backup start while running any potentially large operation.
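Sketched as T-SQL from two separate connections (names and paths hypothetical):

-- Session 1: a deliberately long-running change.
ALTER TABLE BigTable ALTER COLUMN SomeCode varchar(20);

-- Session 2, started ~30 seconds into session 1's operation:
BACKUP LOG test1 TO DISK = 'D:\logship\mid_operation.trn';

-- Session 2, again after session 1 completes:
BACKUP LOG test1 TO DISK = 'D:\logship\after_completion.trn';

-- Compare the sizes of the two .trn files, and time their restores
-- on the standby databases.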

To prevent that, perhaps we can automate these operations like so:

  • Run a snippet of SQL to disable log backups (log shipping)
  • Wait for any running backup to finish
  • Run the target SQL
  • Wait for it to finish
  • Reenable log backups (log shipping)
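Since SQL Server 2000 log shipping backups run as SQL Agent jobs, the disable/enable steps might be done with msdb.dbo.sp_update_job (the job name here is hypothetical):

-- Pause the log backup job around the large operation:
EXEC msdb.dbo.sp_update_job @job_name = 'LogShip Backup test1', @enabled = 0;

-- (wait for any in-flight log backup to finish, then run the target SQL)

-- Resume log backups afterwards:
EXEC msdb.dbo.sp_update_job @job_name = 'LogShip Backup test1', @enabled = 1;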

That’s as much detail as I have time to post; hopefully this will help someone out there with SQL Server log shipping problems.  It would be great to hear from others who have experienced similar log shipping issues.

SQL Server Log Shipping Fragility

In a customer production system, we use MS SQL Server 2000 “log shipping” extensively to offload reporting DB load to secondary servers… a common use of the technology. This generally works very well, most of the time.
We’ve noticed, though, that it can be somewhat fragile: it is easy to inadvertently run a SQL operation (such as a large index update, schema change, etc.) that results in many, many pages shipped, and hence in large log backup files and large “standby” files – all of which are processed surprisingly slowly by SQL Server. The result of this (the fragility) is that the log ship destination servers can then take many, many hours to return to an operational state.
It turns out that one cause of this is that log backup (and hence log shipping) apparently happens at the “page” level: if one bit on a page changes, all 8K of the page is shipped.

A second cause is that if a log backup happens during the middle of a large operation, it will ship the partially complete operation; the destination servers will then generate huge “standby” files (we’ve seen them over a gigabyte) in the restore process.
One path forward is to move up to SQL Server 2005, which has numerous log shipping improvements. Imagine my dismay to find this today:

http://sqljunkies.com/WebLog/acalvett/archive/2006/03/12/18993.aspx

in which Andrew Calvett found that the log ship monitoring functionality is much worse in SQL Server 2005 than in SQL Server 2000.

As usual there is likely a third party tool to make this better. But since log shipping is such a powerful and flexible tool for scaling up SQL Server based applications, it is disappointing to see the monitoring get worse rather than better. Perhaps in SQL Server “2010”…