Network / System Monitoring Smorgasbord

At one of my firms (a Software as a Service provider), we have a Zabbix installation in place to monitor our piles of mostly Linux servers. Recently we took a closer look at it and found ample opportunities to monitor more aspects of more machines and devices, more thoroughly. The prospect of increased investment in monitoring led me to look around at the various tools available.

The striking thing about network monitoring tools is that there are so many from which to choose. Wikipedia offers a good list, and the comments on a Rich Lafferty blog post include a short introduction from several of the players. (Update – Jane Curry offers a long and detailed analysis of network / system monitoring and some of these tools (PDF).)

For OS level monitoring (CPU load, disk wait time, # of processes waiting for disk, etc.), Linux exposes extensive information with "top", "vmstat", "iostat", etc. I was disappointed to not find any of these tools conveniently presenting / aggregating / graphing that data. From my short look, some of the tools offer small subsets of it; for the rest, they offer me the ability to go in and figure out for myself what data I want and how to get it. Thanks.
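
For illustration, here is the kind of number I would want such a tool to gather and graph for me without my help; both commands below are standard Linux fare, though column positions vary a bit across versions, so treat this as a sketch:

# processes blocked waiting on disk: the "b" column of vmstat
# (use the second sample; the first is an average since boot)
vmstat 1 2 | tail -1 | awk '{print $2}'
# 1-minute load average, straight from the kernel
cut -d' ' -f1 /proc/loadavg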

Network monitoring is a strange marketplace; many of the players have a very similar open source business model, something close to this:

  • core app is open source
  • low tier commercial offering with just a few closed source addons, and support
  • high tier commercial offering with more closed source addons, and more support

I wonder if any of them are making any money.

Some of these tools are agent-based, others are agent-less. I have not worked with network monitoring in enough depth to offer an informed opinion on which design is better; however, I have worked with network equipment enough to know that it’s silly not to leverage SNMP.
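
For example, with the net-snmp command line tools, pulling a data point from a device is a one-liner; the host name and community string here are placeholders:

snmpget -v 2c -c public switch1 IF-MIB::ifInOctets.1
snmpwalk -v 2c -c public switch1 SNMPv2-MIB::sysDescr
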
I spent yesterday looking around at some of the products on the Wikipedia list, in varying levels of depth. Here I offer first impressions and comments; please don’t expect this to be comprehensive, nor in any particular order.

Zabbix

Our old installation is Zabbix 1.4; I test-drove Zabbix 1.6 (advertised on the Zabbix site as “New look, New touch, New features”). The look seemed very similar to 1.4, but the new feature list is nice.

We mostly run Ubuntu 8.04, which offers a package for Zabbix 1.4. Happily, 8.04 packages for Zabbix 1.6 are available at http://oss.travelping.com/trac.

The Zabbix agent is delightfully small and lightweight, easily installed with an Ubuntu package. In its one configuration file, you can tell it how to retrieve additional kinds of data. It also offers a “sender”, a very small executable that transmits a piece of application-provided data to your Zabbix server.
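
For example (the key names and host below are made up for illustration), a custom item is one line in zabbix_agentd.conf, and the sender pushes a value with one command:

UserParameter=custom.blocked_procs,vmstat 1 2 | tail -1 | awk '{print $2}'
zabbix_sender -z zabbixserver -s myhost -k myapp.queue_depth -o 42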

I am reasonably happy with Zabbix’s capabilities, but I found the GUI design to be pretty weak, with lots of clicking to get through each bit of configuration. I built far better GUIs in the mid-90s with far inferior tools to what we have today. Don’t take this as an attack on Zabbix in particular, though; I have the same complaint about most of the other tools here.

We run PostgreSQL; Zabbix doesn’t offer any PG monitoring in the box, but I was able to follow the tips at http://www.zabbix.com/wiki/doku.php?id=howto:postgresql and get it running. The monitoring described there is quite high-level and unimpressive, though.
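
The tips there boil down to UserParameter entries that shell out to psql, roughly like these (key names and connection details are illustrative, and the agent’s user needs access to the database):

UserParameter=pgsql.connections,psql -U postgres -Atc "select count(*) from pg_stat_activity"
UserParameter=pgsql.commits,psql -U postgres -Atc "select sum(xact_commit) from pg_stat_database"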

Hyperic

I was favorably impressed by the Hyperic server installation, which got two very important things right:

  1. It included its own PostgreSQL 8.2, in its own directory, which it used in a way that did not interfere with my existing PG on the machine.
  2. It needed a setting changed (shmmax), which can only be adjusted by root. Most companies faced with this need would simply insist the installer run as root. Hyperic instead emitted a short script file to make the change, and asked me to run that script as root (a sketch of such a change follows this list). This greatly increased my inclination to trust Hyperic.
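
For reference, the change such a script makes is tiny; something like this, with an illustrative value:

sysctl -w kernel.shmmax=268435456
echo "kernel.shmmax = 268435456" >> /etc/sysctl.conf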

Compared to Zabbix, the Hyperic agent is very large: a 50 MB tar file, which expands out to 100 MB and includes a JRE. Hyperic’s web site says “The agent’s implementation is designed to have a compact memory and CPU utilization footprint”, a description so silly that it undoes the trust built up above. It would be more honest and useful of them to describe their agent as very featureful and therefore relatively large, while providing some statistics to (hopefully) show that even its largish footprint is not significant on most modern servers.

Setting all that aside, I found Hyperic effective out-of-the-box, with useful auto-discovery of services (such as specific disk volumes and software packages) worth monitoring; it is far ahead of Zabbix in this regard.

For PostgreSQL, Hyperic shows limited data. It offers table- and index-level data for PG up through 8.3, though I was unable to get this to work and had to rely on the documentation instead for evaluation. This is more impressive at first glance than what Zabbix offers, but is still nowhere near sufficiently good for a substantial production database system.

Ganglia

Unlike the other tools here, Ganglia comes from the world of high-performance cluster computing. It is nonetheless apparently quite suitable nowadays for a typical pile of servers. Ganglia aims to efficiently gather extensive, high-rate data from many PCs, using an efficient on-the-wire data representation (XDR) and networking (UDP, including multicast). While the other tools typically gather data at increments of once per minute, per 5 minutes, or per 10 minutes, Ganglia is comfortable gathering many data points, for many servers, every second.
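
Feeding your own data into Ganglia is correspondingly lightweight: the gmetric tool broadcasts one value per invocation. The metric name here is made up:

gmetric --name=app_queue_depth --value=17 --type=uint32 --units=items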

The Ganglia packages available in Ubuntu 8.04 are quite obsolete, but there are useful instructions here to help with a manual install.

Nagios

I used Nagios briefly a long time ago, but I wasn’t involved in the configuration. As I read about all these tools, I see many comments about the complexity of configuring Nagios, and I get the general impression that it is drifting into history. However, I also get the impression that its community is vast, with Nagios-compatible data gathering tools for any imaginable purpose.

Others

Zenoss

Groundwork

Munin

Cacti

How Many Monitoring Systems Does One Company Need?

It is tempting to use more than one monitoring system, to quickly get the logical union of their good features. I don’t recommend this, though; it takes a lot of work and discipline to set up and operate a monitoring system well, and dividing your energy across more than one system will likely lead to poor use of all of them.

On the contrary, there is enormous benefit to integrated, comprehensive monitoring, so much so that it makes sense to me to replace application-specific monitors with data feeds into an integrated system. For example, in our project we might discard some code that populates RRD files with history information and publishes graphs, and instead feed this data into a central monitoring system, using its off-the-shelf features for storage and graphing.
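
As a sketch of what that replacement might look like with Zabbix (our incumbent), the graphing code shrinks to a loop around the sender; "myapp_stats" is a stand-in for whatever emits your application’s number:

while true; do
  zabbix_sender -z zabbixserver -s myhost -k myapp.queue_depth -o "$(myapp_stats --queue-depth)"
  sleep 60
done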

A flip side of the above is that, as far as I can tell, none of these systems offers detailed, DBA-grade database performance monitoring. For our PostgreSQL systems, something like pgFouine is worth a look.

Conclusion

I plan to keep looking and learning, especially about Zenoss and Ganglia. For the moment though, our existing Zabbix, upgraded to the current version, seems like a reasonable choice.

Comments are welcome, in particular from anyone who can offer comparative information based on substantial experience with more than one of these tools.

RocketModem Driver Source Package for Debian / Ubuntu

A couple of months ago I posted about using the current model Comtrol RocketModem IV with Debian / Ubuntu Linux. Ubuntu/Debian includes an older “rocket” module driver in-the-box, which works well for older RocketModem IV cards. But the current RocketModem IV is not recognized at all by that in-the-box rocket module; it requires an updated driver from Comtrol.

With some work (mostly outsourced to a Linux guru) I now present a source driver package for the 3.08 “beta” driver version (from the Comtrol FTP site):

comtrol-source_3.08_all.deb

Comtrol ships the driver source code under a GPL license, so unless I badly misunderstand, it’s totally OK for me to redistribute it here.

To install this, you follow the usual Debian driver-build-install process. The most obvious place to do so is on the hardware where you want to install it, but you can also use another machine with the same distro version and Linux kernel version as your production hardware. Some of these commands must be run as root.

dpkg -i comtrol-source_3.08_all.deb
module-assistant build comtrol

This builds a .deb specific to the running kernel. When I ran it, the binary .deb landed here:

/usr/src/comtrol-module-2.6.22-14-server_3.08+2.6.22-14.52_amd64.deb

Copy to production hardware (if you are not already on it), then install:

dpkg -i /usr/src/comtrol-module-2.6.22-14-server_3.08+2.6.22-14.52_amd64.deb

and verify the module loads:

modprobe rocket

and finds the hardware:

ls /dev/ttyR*

To verify those devices really work (that they talk to the modems on your RocketModem card), Minicom is a convenient tool:

apt-get install minicom
minicom -s

Kernel Versions

Linux kernel module binaries are specific to the kernel version they are built for; this is an annoyance, but is not considered a bug (by Linus). Because of this, when you upgrade your kernel, you need to:

  • Uninstall the binary kernel module .deb you created above
  • Put in the new kernel
  • Build and install a new binary module package as above (sketched below)
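
In commands, that cycle is roughly this; the package name follows the pattern shown earlier, so adjust it for your kernel and architecture:

dpkg -r comtrol-module-2.6.22-14-server
# install and boot the new kernel, then:
module-assistant build comtrol
dpkg -i /usr/src/comtrol-module-*_amd64.deb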

Rebuilding the source .deb

Lastly, if you care to recreate the source .deb, you can do so by downloading the “source” for the source deb: comtrol-source_3.08.tar.gz, then following a process roughly like so:

apt-get install module-assistant debhelper fakeroot
m-a prepare
tar xvf comtrol-source_3.08.tar.gz
cd comtrol-source
dpkg-buildpackage -rfakeroot

The comtrol subdirectory therein contains simply the content of Comtrol’s source driver download, and this is likely to work trivially with newer driver versions (3.09, etc.) when they appear.

So, you want to use your new RocketModem IV on Linux

On one of our projects, we’ve been using the Comtrol RocketModem IV for several years, for both modem communications and FAXing (with Hylafax). All of our RMIVs have been completely reliable and very easy to get working under Linux, particularly Ubuntu/Debian, which includes the rocket driver in-the-box.

Then we got a new card; it looks like the old cards, but according to the rocket driver, it does not exist.

We initially suspected a bad card. We were unable to test with the floppy-image diagnostics on the Comtrol web site, because like most of the rest of the world we stopped ordering floppy drives in new machines several years ago. Comtrol support helpfully pointed us at a CD .ISO for the diagnostics, in here:

ftp://ftp.comtrol.com/contribs/rocketport/diagnostics/rocketmodem/

…without mentioning any reason why this information would not be on their support web site in 2008.

The diagnostics showed a 100% functional card. It finally dawned on me that I might need an updated driver for the current card; and this turned out to be true. The current RocketModem IV is not recognized at all by the rocket module in-the-box in Linux.

Getting it going was an adventure; the short version is that to get the current RocketModem IV to work on any vaguely current Linux kernel, you need to poke around the Comtrol web site and find the beta 3.08 driver, in here:

ftp://ftp.comtrol.com/beta/rmodem/drivers/u_pci/linux/

then do the usual apt-get of a build environment, make, sudo make install, /etc/init.d/rocket start.
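
Spelled out, that manual process is roughly the following; the tarball name is a placeholder, so check the FTP directory above for the exact file:

apt-get install build-essential linux-headers-$(uname -r)
tar xzf comtrol-driver-3.08.tar.gz
cd comtrol-driver-3.08
make
sudo make install
sudo /etc/init.d/rocket start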

Production Deployment

Of course the instructions above are for development / test usage; it is a very poor practice to do the above installation on production hardware, because it makes no accommodation for the inevitable stream of upgrades coming through the packaging mechanism, and because in most cases sysadmins prefer to not have a build environment on production hardware at all (ever).

I am no packaging / Debian expert, but I can see two workable paths to good production deployment for kernel modules like this:

1) Use checkinstall or similar mechanism to create a package which installs the binary modules; set things up so that the resulting package depends on the proper kernel package version. As new kernel packages appear, make new package versions to accommodate them, and likewise for each architecture (i386, amd64).

2) Recast the raw source .tar file as a source-deb package suitable for use with the Debian module-assistant mechanism.
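
For the first approach, checkinstall can wrap the driver’s existing “make install” into a .deb; something like the following, where the package name and version are illustrative and the kernel-version dependency would still need to be added by hand:

cd comtrol-driver-3.08
make
sudo checkinstall --pkgname=comtrol-module --pkgversion=3.08 make install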

I’d love to hear from any Debian/Ubuntu experts on the relative merits of these approaches.

Perhaps in a few years, things will settle back to the previous bliss of a working driver in the Linux “box”.

Update: I’ve posted a .deb for this here.

A Brief Introduction to Distributed Version Control

Last night at SLUUG, I gave a talk on distributed source control tools. It was quite introductory, but the notes (below) may still be helpful. These notes were on a handout at the talk; as usual, I didn’t use slides.

Unfortunately I didn’t get an audio recording of this talk, so no transcript either.

About 30 people were in attendance. Nearly 100% were familiar with CVS and SVN, and perhaps 20% with other tools (ClearCase, SourceSafe, and others). Only 4 had ever used branch/merge in any project or tool!

YouTube Scalability Talk

Cuong Do of YouTube / Google recently gave a Google Tech Talk on scalability.

I found it interesting in light of my own comments on YouTube’s 45 TB a while back.

Here are my notes from his talk, a mix of what he said and my commentary:

In the summer of 2006, they grew from 30 million pages per day to 100 million pages per day, in a 4 month period. (Wow! In most organizations, it takes nearly 4 months to pick out, order, install, and set up a few servers.)

YouTube uses Apache for FastCGI serving. (I wonder if things would have been easier for them had they chosen nginx, which is apparently wonderful for FastCGI and less problematic than Lighttpd.)

YouTube is coded mostly in Python. Why? “Development speed critical”.

They use psyco, a Python-to-C compiler, and also C extensions, for performance-critical work.

They use Lighttpd for serving the video itself, for a big improvement over Apache.

Each video is hosted by a “mini cluster”, a set of machines with the same content. This is a simple way to provide headroom (slack), so that a machine can be taken down for maintenance (or can fail) without affecting users. It also provides a form of backup.

The most popular videos are on a CDN (Content Distribution Network) – they use external CDNs as well as Google’s CDN. Requests to their own machines are therefore tail-heavy (in the “Long Tail” sense), because the head goes to the CDN instead.

Because of the tail-heavy load, random disk seeks are especially important (perhaps more important than caching?).

YouTube uses simple, cheap, commodity hardware. The more expensive the hardware, the more expensive everything else gets (support, etc.). Maintenance is mostly done with rsync, SSH, and other simple, common tools.

The fun is not over: Cuong showed a recent email titled “3 days of video storage left”. There is constant work to keep up with the growth.

Thumbnails turn out to be surprisingly hard to serve efficiently. Because there are, on average, 4 thumbnails per video and many thumbnails per page, the overall number of thumbnails served per second is enormous. They use a separate group of machines to serve thumbnails, with extensive caching and OS tuning specific to this load.

YouTube was bit by a “too many files in one dir” limit: at one point they could accept no more uploads (!!) because of this. The first fix was the usual one: split the files across many directories, and switch to another file system better suited for many small files.

Cuong joked about “the Windows approach of scaling: restart everything”.

Lighttpd turned out to be poor for serving the thumbnails, because its main loop was a bottleneck for loading files from disk; they addressed this by modifying Lighttpd to add worker threads to read from disk. This was good but still not good enough: with one thumbnail per file, the enormous number of files was terribly slow to work with (imagine tarring up many millions of files).

Their new solution for thumbnails is to use Google’s BigTable, which provides high performance for a large number of rows, fault tolerance, caching, etc. This is a nice (and rare?) example of actual synergy in an acquisition.

YouTube uses MySQL to store metadata. Early on they hit a Linux kernel issue which prioritized the page cache higher than app data; it swapped out the app data, totally overwhelming the system. They recovered from this by removing the swap partition (while live!). This worked.

YouTube uses Memcached.

To scale out the database, they first used MySQL replication. Like everyone else that goes down this path, they eventually reached a point where replicating the writes to all the DBs used up all the capacity of the slaves. They also hit an issue with threading and replication, which they worked around with a very clever “cache primer thread” working a second or so ahead of the replication thread, prefetching the data it would need.

As the replicate-one-DB approach faltered, they resorted to various desperate measures, such as splitting the video-watching load onto a separate set of replicas, intentionally allowing the non-video-serving parts of YouTube to perform badly so as to focus on serving videos.

Their initial MySQL DB server configuration had 10 disks in a RAID10. This does not work very well, because the DB/OS can’t take advantage of the multiple disks in parallel. They moved to a set of RAID1s, appended together. In my experience, this is better, but still not great. An approach that usually works even better is to intentionally split different data onto different RAIDs: for example, a RAID for the OS / application, a RAID for the DB logs, one or more RAIDs for the DB tables (use “tablespaces” to get your #1 busiest table on separate spindles from your #2 busiest table), one or more RAIDs for indexes, etc. Big-iron Oracle installations sometimes take this approach to extremes; the same thing can be done with free DBs on free OSs also.

In spite of all these efforts, they reached a point where replication of one large DB was no longer able to keep up. Like everyone else, they figured out that the solution is database partitioning into “shards”. This spreads reads and writes across many different databases (on different servers) that are not all replaying each other’s writes. The result is a large performance boost, better cache locality, etc. YouTube reduced their total DB hardware by 30% in the process.

It is important to divide users across shards by a controllable lookup mechanism, not only by a hash of the username/ID/whatever, so that you can rebalance shards incrementally.

An interesting DMCA issue: YouTube complies with takedown requests; but sometimes the videos are cached way out on the “edge” of the network (their caches, and other people’s caches), so it’s hard to get a video to disappear globally right away. This sometimes angers content owners.

Early on, YouTube leased their hardware.

Linus Torvalds explains distributed source control

On several occasions over the last year, I’ve pointed out that distributed source control tools are dramatically better than centralized tools. It’s quite hard for me to explain why. This is probably because of sloppy and incomplete thinking on my part, but it doesn’t help that most of the audiences / people I’ve said this to have never used a distributed tool. (I’ve been trying out SVK, bzr, git, etc.) Fortunately, I no longer need to stumble with attempts at explaining this; instead, the answer is to watch as:

Linus Torvalds explains distributed source control in general, and git in particular, at Google

Here are some of his points, paraphrased. They might not be clear without watching the video.

  • He hates CVS
  • He hates SVN too, since it’s “CVS done right” – and you can’t get anywhere good starting from CVS.
  • If you need a commercial tool, BitKeeper is the one you should use.
  • Distributed source control is much more important than which tool you choose
  • He looked at a lot of alternatives, and immediately tossed anything that was not distributed, was slow, or did not guarantee that what goes in comes out [using secure hashes]
  • He liked Monotone, but it was too slow
  • With a distributed tool, no single place is vital to your data
  • Centralized does not scale to Linux-kernel sized projects
  • Distributed tools work offline, with full history
  • Branching is not an esoteric, rare event – everyone branches all the time every time they write a line of code – but most tools don’t understand this, they only understand branching as a Very Big Deal.
  • Distributed tools serve everyone; everyone has “commit” access to their branches. Everyone has all of the tool’s features, rather than only a handful of those with special access.
  • Of course no one else will necessarily adopt your changes; using a tool in which every developer has the full feature set does not imply anarchy.
  • Merging works like security: as a network of trust
  • Two systems worth looking at: git and Mercurial. The others with good features are too slow [for large projects].
  • Lots of people are using git for their own work (to handle merges, for example), inside companies where the main tool is SVN. Git can pull changes from (and thus stay in sync with) SVN.
  • Git is now much easier to use than CVS or other common tools
  • Git makes it easier to merge than other tools. So doing more merging is not much of a problem, not something you need to fear.
  • Distributed tools are much faster because they don’t have to go over the network very often. Not talking to a server is tremendously faster than talking to even a high-end server over a fast network.
  • SVN working directories and repositories are quite large; git equivalents are much smaller
  • The repository (= project) is the unit of checkout / commit / etc.; don’t put them all into one repository. Separate repositories can share underlying storage.
  • Performance is not secondary. It affects everything you do; it affects how you use an application. Faster operations = merging is not a big deal = more, smaller changes.
  • SVN “makes branching really cheap”. Unfortunately, “merging in SVN is a complete disaster”. “It is incredible how stupid these people are”
  • Distributed source control is mostly inherently safer: no single point of failure, and secure hashes to protect against even intentionally malicious users.
  • “I would never trust Google to maintain my source code for me” – with git (and other distributed systems), the whole history is in many places, making it nearly impossible to lose.

My own observations:

There are important differences between source control tools. I have heard it said that these are all “just tools” which don’t matter; you simply use whatever the local management felt like buying. That is wrong: making better tool choices will make your project better (cheaper, faster, more fun, etc.), and making worse tool choices will make your project worse (more expensive, slower, painful, higher turnover, etc.)

Distributed tools make the “network of trust” more explicit, and thus easier to reason about.

I think there is an impression out there that distributed tools don’t accommodate the level of control that “enterprise” shops often specify in their processes for how code is managed. This is a needless worry; it is still quite possible to set up any desired process and controls for the official repositories. The difference is that you can do that without the collateral damage of taking away features from individual developers.

Git sounds like the leading choice on Linux, but at the moment it is not well supported on Windows, and I don’t see any signs of that changing anytime soon. I will continue working with bzr, SVK, etc.

There is widespread, but mostly invisible, demand for better source control. Over the next few years, the hegemony of the legacy (centralized) design will be lessened greatly. One early adopter is Mozilla, which is switching (or has switched?) to Mercurial. Of course, many projects and companies (especially large companies) will hang on for years to come, because of inertia and widespread tool integration.

Several of the distributed tools (SVK, bzr, git, probably more) have the ability to pull changes from other systems, including traditional centralized systems like SVN. These make it possible to use a modern, powerful tool for your own work, even when you are working on a project where someone else has chosen such a traditional system for the master repository.

Update: Mark commented that he doesn’t feel like he has really committed a changeset until it is on a remote repository. I agree. However, this is trivial: with any of the tools, you can push your change from your local repository to a remote one (perhaps one for you alone) with one command. With some of them (bzr, at least) you can configure this to happen automatically, so it is zero extra commands. This does not negate the benefits of a local repository, because you can still work offline, and all read operations still happen locally and far more quickly.

Update: I didn’t mention Mercurial initially, but I should have (thanks, Alex). It is another strong contender, and Mozilla recently chose it. Its feature set is similar to git’s, but with more eager support for Windows.

Update: There is another discussion of Linus’s talk over on Codice Software’s blog, which was linked on Slashdot. Since Codice sells a source control product, the Slashdot coverage is a great piece of free publicity.

Update: Mark Shuttleworth pointed out below that bzr is much faster than it used to be, and that it has top-notch renaming support, which he explains in more detail in a post. Bzr was already on my own short-list; I am using it for a small project here at Oasis Digital.

Update: I’ve started gathering notes about git, bzr, hg, etc., for several upcoming posts (and possibly a talk or two). If you’re interested in these, subscribe to the feed using the links in the upper right of my home page.