
Pure Ruby Sparklines – No RMagick ImageMagick

This Pure Ruby Sparklines implementation got my attention: unlike other Ruby graphics packages, it does not need RMagick, and therefore does not need ImageMagick, sparing the installer considerable effort and misery. The source of that misery is the long list of dependencies. A great positive of the open source world is that tools can readily build on each other, but this comes at the cost of unexpected complexity in getting things to compile and install. We had a project here delayed by a couple of weeks while we worked through issues getting it all to run on a shared hosting account. I never did get it working with Ruby 1.8.4 on Windows; I have reverted to 1.8.2 for now.

With a “Pure Ruby” implementation, you copy the .rb files over and it works. I wonder if “Pure Ruby” will become a marketing point like “Pure Java”.

Lest anyone accuse me of unwarranted Ruby fanhood, I’ll point this out: with Ruby it is necessary to find or write code for graphic drawing. With Java, it is in the box, in the form of the excellent Java2D feature set. Ruby is way behind in this area.

SQL Server Log Shipping – Testing the impact of large operations

Background

In a project we have here at Oasis Digital, our customer relies on the log shipping feature of Microsoft SQL Server 2000 to keep some secondary reporting databases up to date in almost-real-time: every few minutes a transaction log backup runs, then at a slightly longer interval, every 5-15 minutes, a restore of those logs runs on the secondary reporting databases. We’ve developed some code so that the reports can automatically run against any available reporting database; thus as the reporting databases occasionally go offline for log shipping restores, the end users never notice that more than one database is being used for their reports.
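The "run against any available reporting database" idea can be sketched roughly as follows. This is a minimal illustration in Python, not the code from our project; the function name `pick_reporting_db`, the `is_available` probe, and the database names in the usage note are all assumptions made up for the example.

```python
from typing import Callable, Iterable, Optional


def pick_reporting_db(
    candidates: Iterable[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reporting database that accepts queries, or None.

    `is_available` is a hypothetical probe supplied by the caller, for
    example a function that opens a connection and runs a trivial query.
    A database that is in the middle of a log restore is expected to fail
    the probe.
    """
    for name in candidates:
        try:
            if is_available(name):
                return name
        except Exception:
            # Treat any probe error as "offline" and try the next database.
            continue
    return None
```

A report runner would call something like `pick_reporting_db(["report1", "report2"], probe)` just before each report and connect to whichever name comes back.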

Most of the time, this works very well. The users get reports quickly, and those reports (no matter how large and painful the queries are) never have any impact on the production OLTP database. But we have been stung a few times by changes we make in the production database having an unexpectedly large impact on that log shipping process.

We have two theories on what happens:

Certain kinds of changes in the production database, such as adding a field, changing a field type, or changing an index, can affect a very large number of pages of the database. Since SQL Server log shipping occurs a page at a time, an operation that makes a small change to 100,000 pages of the database can result in an extremely large file being shipped between the production database and the reporting databases.
There are performance characteristics of the restore process which we don’t fully understand: sometimes a log restore takes considerably longer than its size would suggest. Sometimes a log-shipped change triggers such an occurrence.

The result of those two issues together is that sometimes we intentionally make a seemingly minor change in the database and it has a large, negative impact on that log shipping process: hours of reporting database downtime! Occasionally it’s taken more than a day for the log restore target DBs to catch up.

This affects the users severely, and is therefore a Very Bad Thing.

Therefore, we’ve been looking for a way to assess the impact of such schema and index changes on the log shipping process before making them in production.

Testing / Measuring

Unfortunately the only reasonable answer appears to be the rather large hammer of recreating a copy of the production system, including log shipping. The databases in question are quite large, so this means two servers, or one server with a bunch of hard drives (to run the log ship source and destination on the same machine).

The process is:

Install a lot of hard drive space, OS, SQL Server
Restore a copy of the production DB, call this “test1”
Set up log shipping to one or two reporting DBs, test2 and test3

Then for each test run:

Run log shipping (so it’s all up to date)
Run test queries (like adding a field, changing a field type, changing a large index, etc.)
Run log shipping again, and observe how large a file is log shipped between the two DBs.

The main metric we’ll get out of this is that for change X (that is, for a certain SQL DDL or DML change), we get a log ship of N megabytes or N gigabytes. I suspect that with this kind of data in hand, we will soon discover the underlying “rules” and understand which changes result in very large logs, so most of the time we can tell which things are going to have a big impact, and schedule them appropriately.

We can even automate the test process: feed a piece of SQL to a program that runs those steps. It might take hours of (cheap) computer time to run, but little (expensive) human time.
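Such a program might look something like the sketch below. It is only an outline under stated assumptions: `run_sql` and `run_log_shipping` are hypothetical callables you would write for your own environment, and the measurement assumes that log backup files accumulate in `backup_dir` and are not deleted during the run.

```python
import os
from typing import Callable


def total_log_bytes(backup_dir: str) -> int:
    """Sum the sizes of all files currently in the log backup directory."""
    total = 0
    for entry in os.scandir(backup_dir):
        if entry.is_file():
            total += entry.stat().st_size
    return total


def measure_log_ship(
    change_sql: str,
    run_sql: Callable[[str], None],
    run_log_shipping: Callable[[], None],
    backup_dir: str,
) -> int:
    """Return how many bytes of log backup one change produced.

    Both callables are placeholders (assumptions): `run_sql` executes a
    statement against the test source database, and `run_log_shipping`
    blocks until a log backup and restore cycle has completed.
    """
    run_log_shipping()                  # bring the reporting DBs up to date
    before = total_log_bytes(backup_dir)
    run_sql(change_sql)                 # apply the change under test
    run_log_shipping()                  # ship the log generated by the change
    return total_log_bytes(backup_dir) - before
```

Running this once per candidate change and recording the result next to the SQL text would build up the table of "change X produced N bytes" described above.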

The Plot Thickens

What I’ve described above is the simple case. There’s a more complex case: we’ve observed that the worst delays tend to happen when a log backup occurs in the middle of certain operations. Even operations that are usually harmless may produce a huge log ship if the log backup happens mid-operation. These also appear to be the logs that take especially long to restore on the destination database.

To test this, we can take a piece of DML that we think is going to take several minutes to run, start it, wait 30 seconds or so, and while it’s still running start a log backup. Then wait until the DML completes and start another log backup. We would gain several data points: the size of the mid-operation ship, the size of the final ship, and how long each takes to restore. I suspect we will learn that it’s a really bad idea to let a log backup start while running any potentially large operation.
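A rough sketch of that mid-operation test follows. The names are assumptions for illustration: `run_dml` is a hypothetical callable that executes the long-running statement, and `run_log_backup` is assumed to block until the backup finishes and return the size in bytes of the file it produced. Restore timing would have to be measured separately on the destination side.

```python
import threading
import time
from typing import Callable, NamedTuple


class MidOpResult(NamedTuple):
    mid_bytes: int      # size of the backup taken while the DML was running
    final_bytes: int    # size of the backup taken after the DML completed
    overlapped: bool    # False means the DML finished before the first backup


def backup_mid_operation(
    run_dml: Callable[[], None],
    run_log_backup: Callable[[], int],
    delay_seconds: float = 30.0,
) -> MidOpResult:
    """Start a long operation, back up the log while it runs, then again after."""
    worker = threading.Thread(target=run_dml)
    worker.start()
    time.sleep(delay_seconds)           # let the operation get under way
    still_running = worker.is_alive()
    mid_bytes = run_log_backup()        # backup started mid-operation
    worker.join()                       # wait for the DML to complete
    final_bytes = run_log_backup()      # backup of whatever log remains
    return MidOpResult(mid_bytes, final_bytes, still_running)
```

If `overlapped` comes back `False`, the operation was too short for the chosen delay and the run should not be counted as a mid-operation sample.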

To prevent that, perhaps we can automate these operations like so:

Run a snippet of SQL to disable log backups (log shipping)
Wait for any running backup to finish
Run the target SQL
Wait for it to finish
Reenable log backups (log shipping)
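The steps above could be wrapped up along these lines. This is a sketch rather than a tested tool: the three callables are hypothetical hooks (assumptions) that you would implement for your own setup, for example by disabling and re-enabling the scheduled log backup job and by checking whether a backup is currently in progress.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def log_backups_paused(
    disable_backups: Callable[[], None],
    enable_backups: Callable[[], None],
    backup_is_running: Callable[[], bool],
    poll_seconds: float = 5.0,
) -> Iterator[None]:
    """Keep log backups switched off for the duration of a `with` block."""
    disable_backups()                   # stop new log backups from starting
    try:
        while backup_is_running():      # wait for any running backup to finish
            time.sleep(poll_seconds)
        yield                           # caller runs the target SQL here
    finally:
        enable_backups()                # re-enable even if the SQL failed
```

The target SQL then runs inside `with log_backups_paused(...):`, using a call that blocks until the statement finishes. The `finally` clause is the important design choice: a failed statement should not leave log shipping switched off.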

That’s as much detail as I have time to post; hopefully this will help someone out there with SQL Server log shipping problems. It would be great to hear from others who have experienced similar log shipping issues.

Going to RailsConf. Be wary of software religion.

I’m going to RailsConf 2006 in Chicago next month.

The interesting thing about this is that I signed up before I knew how popular it would be – it sold out in a few days, so the pent-up demand must have been remarkable. More remarkable is that it sold out long before the full list of speakers and talks was released (or even existed). Me and 500 other people didn’t even need to know what/who would be there, to decide to go. For me, that is because I have a project going that uses Rails, and I generally enjoy software conferences, and this one is close.

But maybe there is more. Is Ruby+Rails a fad? A cult? I nearly always enjoy the Ruby on Rails Podcast, but its intro music proudly and bizarrely claims that “we’re building a religion”.

I don’t know who “we” are, but I’m not on board with that.

Assembly code from 1994, proto-DSL

Tonight I came across a chunk of x86 assembly code that I wrote for a university class in 1994. I present it here in its original form, complete with 1994 file modification date:

KSPOOL.zip

The thing I notice about this code in retrospect is that I used a macro (this was assembled with a macro assembler) to make some menu-key-dispatch code succinct and declarative in its appearance. This is quite low-level but in a sense not all that different from the “domain specific language” idea which has grown popular recently.

LVM (Logical Volume Management) – a very good thing

This mailing list post from a user of “MythTV” reminded me of the wonders of LVM (Logical Volume Management), which is built in to Linux (HowTo, Resource page). I first saw LVM years ago on a commercial Unix, and didn’t quite understand the point. Now I see the point clearly, and set up most new machines with LVM.

If you’re not using LVM, and you have non-trivial storage hardware (more than one hard drive), now is the time to start.

On an OS without LVM built in (such as Windows, which I run on some of my machines), it is often necessary to do things offline, with external tools (like Partition Magic), that could be done online in a running system with LVM.

Basement Data Center

It occurred to me today that a section of my basement is starting to resemble a data center:

[Photo: Data Center]

The four machines here:

Are all test machines, or being configured and burned in for future deployment – the production hardware for our projects invariably ends up in customers’ data centers or in robust colo facilities (redundant power, massive air conditioning, redundant “tier one” network connectivity, etc.)
All use AMD CPUs – two of them have dual-core Athlon64 X2’s.
Are all black – black is still the default color, it appears.

Other tidbits:

The leftmost machine is a “Shuttle XPC”.
The shelves are homemade and very strong.
Three of the machines run only Linux; the other dual-boots to Windows XP.
Very little work happens on these machines at the physical console – they are accessed via SSH, VNC, Terminal Services, etc.
There is a dedicated 15 amp circuit to run these machines.
The reason they are here, in my basement, is so that the hardware and OS can be configured. When one becomes wedged it can easily be rebooted or diagnosed at the terminal; that happens all too often when, for example, trying a variety of Linux kernels to get virtualization working.