Optimize Hierarchy Queries with a Transitive Closure Table

Last year I posted about the use of a Joe Celko-style nested set hierarchy representation, for fast hierarchy queries. Here I will describe another approach which is simpler to query, but more wasteful of space. I did not invent this transitive closure approach, I learned of it from several directions:

There are two (main) places to put the code for building a closure tables: in your application code, or in your database. The application approach is nice if you are aiming to avoid vendor-specific SQL code, but it is quite simple in SQL, and therefore not a big problem to recode for another RDBMS if the need arises. The SQL approach also avoids round-tripping the relevant data in and out of the database. Therefore, the approach I generally recommend for this is an SQL stored procedure / function.

Here is a simplified PostgreSQL stored procedure to do the job; note that his assumes a “widget” table with a widget_id and parent_id (the “adjacency” representation of a hierarchy), and a widget_closure table with fields (ancestor_id, widget_id, depth). Wordpress has unhelpfully discarded my indentation:

CREATE OR REPLACE FUNCTION populate_widget_closure()
RETURNS integer AS ‘
DECLARE
distance int;
BEGIN
delete from widget_closure;

insert into widget_closure
select widget_id, widget_id, 0 from widget;

FOR distance in 1..20 LOOP
insert into widget_closure
select uc.parent_id, u.widget_id, distance
from widget_closure uc, widget u
where uc.child_id=u.reports_to_id
and uc.distance = distance-1;
END LOOP;
RETURN 1;
END;
‘ LANGUAGE plpgsql;

This sample code assumes a maximum depth of 20, and has no error checking. It will blindly miss greater depths and produce garbage if there is a “loop” in the ancestry. I recommend both arbitrary depth handling and error checking for production use.

Once your transitive closure table is populated, querying it is extremely fast and simple. You can get all the descendants of widget 12345 with the query “select widget_id from widget_closure where ancestor_id=12345″. Of course, this hierarchy representation, while simple to generate, is not simple to incrementally update as the hierarchy changes. The most straightforward way to use it is as a cache, regenerated as needed.

Digg |  Reddit

About the author

2 Responses to “Optimize Hierarchy Queries with a Transitive Closure Table”

  1. Alex Miller Says:

    Nice. Yet another episode in the long history of trading space for time… :)

  2. Craig Buchek Says:

    Thanks for sharing this. I read your article on Celko’s method, and followed up by reading the original articles by Celko. Seems like a good method in some situations, but Celko’s method would seem to have poor insert/update performance as well, due to having to update, on average, half of the records in the table.

    I’d be interested in reading about more techniques like these.

Leave a Reply