Long Rails Stack Trace

I was looking at the RailsConf schedule, and saw the Rails web app that runs it fail with this longish stack trace. A few Rubyists sitting nearby commented that the application was not well configured: Rails apps are usually configured in deployment to write such traces to a log, rather than spit them at the user. Their mistake is my gain though, as it provided fodder for this post.

This stack trace is considerably shorter than a typical Java web app stack trace. But it is still not good, and it is an ominous sign that excessive complexity is making its way into Rails.

How To Do Deployment (Dave Thomas RailsConf Keynote)

I just attended Dave Thomas’s keynote at RailsConf. He had many interesting things to say, most notably that 60%+ of his Java-centric conference-circuit friends (mostly people who have written books on Java, speak regularly on Java, and have lots of experience as Java “architects”) are now making a living with Ruby / Rails.

Dave talked about 3 key areas where Rails should improve. The one I want to focus on is deployment. Rails has Capistrano, a slick tool for deployment automation. Capistrano is great, but it misses the boat in one key way: it is push-based. It assumes a world where the developer sets up, controls, and performs deployment to production servers. That is of course the case at the start of a small operation, but it doesn’t scale; in a large organization, or even in a small but growing one, sooner or later there is staff dedicated to production deployment and monitoring, separate from development.

One way, and to me the right way, to handle deployment is push / pull:

For the “push”, a developer uploads a new version to a storage site somewhere. It travels in the form of an archive file (ZIP, tgz, whatever) which contains all of the needed artifacts, as well as metadata about the required libraries, pre- and post-install steps, etc. for the new version. It has a unique name; it lands at a URL. Normal, off-the-shelf technology (FTP, web servers, etc.) can be used to serve and secure this storage, implement policies about who can upload new versions, and so on. This upload happens with some simple one-line command (or as part of an auto-build); it does not actually put a new version in production, but only makes it available for pulling.

Then, on a production machine, someone with software installation rights on that machine (administered by the normal tools – not by any special feature of the deployment mechanism) runs a command (something like “deploy staging http://ourserver/project1/project1-build-3453.tgz”) which downloads the new build, runs various sanity checks, deploys it to a staging area, and makes it available to try out. Once the staging is verified, a similar command brings it into live production. These builds refer as needed to configuration data on the deployment machines (such as production DB access credentials); the build archives are generic, not specific to any particular production machine.
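To make the pull side concrete, here is a rough sketch of such a “deploy” command as a small console program (written in Delphi, since as noted below the same need exists well outside the Rails world). The program name, the incoming-directory layout, and the use of Indy’s TIdHTTP for the download are assumptions of mine for illustration, not a description of any existing tool:

program deploypull;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP;   // Indy's TIdHTTP handles the download

var
  Target, Url, LocalFile: string;
  Http: TIdHTTP;
  Archive: TFileStream;
begin
  // Usage: deploypull staging http://ourserver/project1/project1-build-3453.tgz
  if ParamCount < 2 then
  begin
    WriteLn('usage: deploypull <staging|live> <archive-url>');
    Halt(1);
  end;
  Target := ParamStr(1);
  Url := ParamStr(2);
  // Keep a local copy of every pulled build; this directory is a made-up convention.
  LocalFile := 'C:\deploy\incoming\' + Copy(Url, LastDelimiter('/', Url) + 1, MaxInt);

  Http := TIdHTTP.Create(nil);
  Archive := TFileStream.Create(LocalFile, fmCreate);
  try
    Http.Get(Url, Archive);   // pull the build archive down to this machine
  finally
    Archive.Free;
    Http.Free;
  end;

  // A real tool would now verify the archive, read its metadata, unpack it
  // into the area named by Target ('staging' or 'live'), and run the pre- and
  // post-install steps described in that metadata; those parts are omitted here.
  WriteLn('Pulled ', Url, ' for ', Target, '; saved as ', LocalFile);
end.

The useful property is that this program needs nothing beyond ordinary read access to the storage URL, plus whatever local rights the operating system already grants the person running it.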

The main idea behind this is that each person has the rights they need for the work they need to do, and these rights don’t need any special help or support from the deployment mechanism. Developers don’t need any special rights on deployment machines, nor do deployers need any special rights on development machines.

I wrote this in the context of a Rails / Ruby talk, but it’s at least as relevant in the Java and Delphi worlds; in fact at Oasis Digital we need something like this on a Java project and a Delphi project right now.

At RailsConf, Sitting Around Ignoring Each Other

I’m at RailsConf. In a flagrant attempt to be one of the cool kids, I’ve set up a Flickr account to share my RailsConf Photos.

Of the few dozen conferences I’ve attended, this one has the highest incidence so far of people sitting around, in large groups, working alone on their notebook computers; some of the groups are having conversations, but most of the attendees (including me, as I type this) are staring intently at their own PCs. It makes me wonder why we are all here – are we all so accustomed to interacting online that we don’t bother to interact offline, even when sitting in the same room, and at considerable expense?

Take control of Delphi forms in your multimonitor application

The authors of the VCL helpfully added multi-monitor “support” a few versions back. Somewhat less helpfully, this support is very limited – it has no way to say “create form X on monitor 3”, which is a very useful thing to do in some kinds of applications – those intended to run on a multi-monitor system in a specific configuration.

You can readily move forms around (with .Left, .Top, etc.) onto specific monitors after a Form is open, but not while it is opening (in OnShow or OnCreate), because a fine method named TCustomForm.SetWindowToMonitor will jump in and undo your work.

Not surprisingly, most of the obvious methods to override to fix this are not virtual. (An aside – I think the bias toward final methods in Delphi, C#, and some other languages is a poor idea, and I think the guideline of “design for overriding, or prevent it” is even worse.) With some looking, I found a workaround, shown here:

private
  FSetBoundsEnabled: Boolean;
public
  procedure SetBounds(ALeft, ATop, AWidth, AHeight: Integer); override;

...

procedure TSomeForm.SetBounds(ALeft, ATop, AWidth, AHeight: Integer);
begin
  // While FSetBoundsEnabled is False, ignore every attempt to move or resize
  // the form (including the one TCustomForm.SetWindowToMonitor makes while
  // the form is opening).
  if FSetBoundsEnabled then
    inherited;
end;

FSetBoundsEnabled will be False by default. Sometime after your form is loaded (use a Timer or a Windows message to signal this), set FSetBoundsEnabled := True, then set .Left and .Top to put your form wherever it needs to be.
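For example, here is a minimal sketch of that second step using a TTimer; the timer name, the OnShow wiring, and the choice of monitor index are mine for illustration, while Screen.Monitors and TMonitor are standard VCL:

procedure TSomeForm.FormShow(Sender: TObject);
begin
  // Let the form finish opening (and let SetWindowToMonitor do its futile work),
  // then reposition from the timer handler below.
  PositionTimer.Interval := 100;   // PositionTimer is a TTimer dropped on the form
  PositionTimer.Enabled := True;
end;

procedure TSomeForm.PositionTimerTimer(Sender: TObject);
begin
  PositionTimer.Enabled := False;
  FSetBoundsEnabled := True;              // SetBounds works again from here on
  if Screen.MonitorCount > 2 then         // "monitor 3" is Screen.Monitors[2]
  begin
    Left := Screen.Monitors[2].Left;
    Top := Screen.Monitors[2].Top;
  end;
end;

A custom Windows message posted from OnShow would work just as well as the timer; the only point is to wait until the VCL has finished its own positioning before taking over.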

Pure Ruby Sparklines – No RMagick, No ImageMagick

This Pure Ruby Sparklines implementation got my attention – unlike other Ruby graphics packages, it does not need RMagick, and thus not ImageMagick, sparing the installer considerable effort and misery. The source of that misery is the long list of dependencies; of course a great positive of the open source world is that tools can readily build on each other – but this comes at the cost of unexpected complexity in getting things to compile and install. We had a project here delayed by a couple of weeks of working through issues in getting it all to work on a shared hosting account. I never did get it working with Ruby 1.8.4 on Windows; I reverted to 1.8.2 for now.

With a “Pure Ruby” implementation, you copy the .rb files over and it works. I wonder if “Pure Ruby” will become a marketing point like “Pure Java”.

Lest anyone accuse me of unwarranted Ruby fanhood, I’ll point this out: with Ruby it is necessary to find or write code for graphics drawing. With Java, it is in the box, in the form of the excellent Java2D feature set. Ruby is way behind in this area.

SQL Server Log Shipping – Testing the impact of large operations

Background

In a project we have here at Oasis Digital, our customer relies on the log shipping feature of Microsoft SQL Server 2000 to keep some secondary (reporting) databases up to date in almost-real-time: every few minutes a transaction log backup runs, then at a slightly longer interval (every 5-15 minutes) a restore of those logs runs on the secondary reporting databases.  We’ve developed some code so that the reports can automatically run against any available reporting database; thus as the reporting databases occasionally go offline for log shipping restores, the end users never notice that more than one database is being used for their reports.

Most of the time, this works very well. The users get reports quickly, and those reports (no matter how large and painful the queries are) never have any impact on the production OLTP database. But we have been stung a few times by changes we made in the production database having an unexpectedly large impact on that log shipping process.

We have two theories on what happens:

  1. Certain kinds of changes in the production database, such as adding a field, changing a field type, changing an index, can affect a very large number of pages of the database – since SQL Server log shipping occurs a page at a time, an operation that makes a small change to 100,000 pages of the database can result in an extremely large file being shipped between the production database and the reporting databases.
  2. There are performance characteristics of that restore process which we don’t fully understand – sometimes a log restore takes considerably longer than its size would suggest.  Sometimes a log-shipped change triggers such an occurrence.

The result of those two issues together is that sometimes we intentionally make a seemingly minor change in the database and it has a large, negative impact on that log shipping process: hours of reporting database downtime!  Occasionally it’s taken more than a day for the log restore target DBs to catch up.

This affects the users severely, and is therefore a Very Bad Thing.

Therefore, we’ve been looking for a way to assess the impact of such schema and index changes on the log ship process, before making them in production.

Testing / Measuring

Unfortunately the only reasonable answer appears to be the rather large hammer of recreating a copy of the production system, including log shipping.  The databases in question are quite large, so this means two servers, or one server with a bunch of hard drives (to run the log ship source and destination on the same machine).

The process is:

  • Install a lot of hard drive space, OS, SQL Server
  • Restore a copy of the production DB, call this “test1”
  • Set up log shipping to one or two reporting DBs, test2 and test3

Then for each test run:

  • Run log shipping (so it’s all up to date)
  • Run the test change (adding a field, changing a field type, changing a large index, etc.)
  • Run log shipping again, and observe how big a file is log shipped between the two DBs.

The main metric we’ll get out of this is: for change X (a certain SQL DDL or DML change), we get a log ship of size N megabytes or gigabytes.  I suspect that with this kind of data in hand, we will soon discover the underlying “rules” and understand which changes result in very large logs, so that most of the time we can tell which things are going to have a big impact, and schedule around them appropriately.

We can even automate the test process: feed a piece of SQL to a program that runs those steps.  It might take hours of (cheap) computer time to run, but little (expensive) human time.
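As a sketch of what that program could look like, here is a small Delphi console app using ADO. The server and database names, the backup path, and the idea of issuing a BACKUP LOG directly and then checking the file size are assumptions for illustration; a real run would use whatever mechanism the existing log shipping jobs already use:

program logshipimpact;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes, ActiveX, ADODB;

var
  Conn: TADOConnection;
  TestSql: TStringList;
  BackupFile: string;
  SR: TSearchRec;
  SizeBytes: Int64;
begin
  CoInitialize(nil);   // ADO needs COM initialized in a console program
  Conn := TADOConnection.Create(nil);
  TestSql := TStringList.Create;
  try
    // Hypothetical connection string; point this at the "test1" source DB.
    Conn.ConnectionString :=
      'Provider=SQLOLEDB;Data Source=TESTSERVER;Initial Catalog=test1;Integrated Security=SSPI';
    Conn.LoginPrompt := False;
    Conn.Open;
    Conn.CommandTimeout := 0;   // schema changes can run for a long time

    // 1. (Outside this sketch) let the existing log shipping jobs run first,
    //    so the measurement starts from an up-to-date state.

    // 2. Run the change under test, fed in as a .sql file on the command line
    //    (plain T-SQL without GO batch separators, since ADO executes it as one batch).
    TestSql.LoadFromFile(ParamStr(1));
    Conn.Execute(TestSql.Text);

    // 3. Take a log backup and see how big it is.
    BackupFile := 'E:\logship\test1_impact.trn';   // hypothetical path
    Conn.Execute('BACKUP LOG test1 TO DISK = ''' + BackupFile + '''');

    if FindFirst(BackupFile, faAnyFile, SR) = 0 then
    begin
      // TSearchRec.Size is a 32-bit Integer in older Delphi versions, so use
      // FindData to report multi-gigabyte backups correctly.
      SizeBytes := (Int64(SR.FindData.nFileSizeHigh) shl 32) or SR.FindData.nFileSizeLow;
      WriteLn(ParamStr(1), ' -> log backup of ', SizeBytes div (1024 * 1024), ' MB');
      FindClose(SR);
    end;
  finally
    TestSql.Free;
    Conn.Free;
    CoUninitialize;
  end;
end.

Feed it a file such as add-field-test.sql (against the test copy only, never production) and record the reported size for each change of interest.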

The Plot Thickens

What I’ve described above is the simple case.  There’s a more complex case: we’ve observed that the worst delays tend to happen if a log backup occurs in the middle of a certain operation – even operations that are usually harmless can perhaps result in a huge log ship if the log backup happens mid-operation.  These also appear to be the logs that take especially long to restore on the destination database.

To test this, we can take a piece of DML that we think is going to take several minutes to run, start it, wait 30 seconds or so, and while it’s still running start a log backup.  Then wait until the DML completes and start another log backup.  We would gain several data points: the size of the mid-operation log ship, the size of the final log ship, and how long each takes to restore.  I suspect we will learn that it’s a really bad idea to let a log backup start while running any potentially large operation.

To prevent that, perhaps we can automate these operations like so (a rough sketch in code follows the list):

  • Run a snippet of SQL to disable log backups (log shipping)
  • Wait for any running backup to finish
  • Run the target SQL
  • Wait for it to finish
  • Reenable log backups (log shipping)
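Here is a sketch of those steps as another small Delphi/ADO program. The job name, the use of msdb.dbo.sp_update_job to pause the scheduled backup, and the sysprocesses check for an in-flight backup are assumptions about how the log backups are scheduled; adjust them to match the real log shipping jobs:

program safebigchange;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes, ActiveX, ADODB;

function BackupRunning(Conn: TADOConnection): Boolean;
var
  Qry: TADOQuery;
begin
  // Assumption: a running log backup shows up in sysprocesses with a BACKUP
  // command; adjust this check to whatever your environment actually shows.
  Qry := TADOQuery.Create(nil);
  try
    Qry.Connection := Conn;
    Qry.SQL.Text :=
      'SELECT COUNT(*) AS n FROM master.dbo.sysprocesses WHERE cmd LIKE ''BACKUP%''';
    Qry.Open;
    Result := Qry.FieldByName('n').AsInteger > 0;
  finally
    Qry.Free;
  end;
end;

var
  Conn: TADOConnection;
  BigChange: TStringList;
begin
  CoInitialize(nil);
  Conn := TADOConnection.Create(nil);
  BigChange := TStringList.Create;
  try
    Conn.ConnectionString :=
      'Provider=SQLOLEDB;Data Source=PRODSERVER;Initial Catalog=ourdb;Integrated Security=SSPI';
    Conn.LoginPrompt := False;
    Conn.Open;
    Conn.CommandTimeout := 0;

    // 1. Disable the scheduled log backup job ('LogShip Backup ourdb' is a placeholder name).
    Conn.Execute('EXEC msdb.dbo.sp_update_job @job_name = ''LogShip Backup ourdb'', @enabled = 0');

    // 2. Wait for any backup already in flight to finish.
    while BackupRunning(Conn) do
      Sleep(5000);

    // 3. Run the potentially large change, fed in as a .sql file.
    BigChange.LoadFromFile(ParamStr(1));
    Conn.Execute(BigChange.Text);

    // 4. Re-enable the log backup job; the next scheduled run ships everything at once.
    Conn.Execute('EXEC msdb.dbo.sp_update_job @job_name = ''LogShip Backup ourdb'', @enabled = 1');
  finally
    BigChange.Free;
    Conn.Free;
    CoUninitialize;
  end;
end.

A real version would re-enable the job in a try..finally block (or from a separate watchdog) so that a failed change cannot leave log backups switched off.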

That’s as much detail as I have time to post; hopefully this will help someone out there with SQL Server log shipping problems.  It would be great to hear from others who have experienced similar log shipping issues.