Hire a RAIT: Redundant Array of Independent Teams

Life is Risk

Whenever you hire out work, either to a person, to a team, or to a company, there are risks. These risks can easily prevent the work from being completed, and even more easily prevent it from being completed on time. (I’m thinking mostly of software development work as I write this, but most of this applies to other domains as well.)

What could go wrong with the person/team/company you hire?

  • They get distracted by family or personal issues.
  • They turn out to not be as qualified or capable as they appeared.
  • They leave for better work. Sure, you might have a contract requiring them to finish, but your lawsuit won’t get the work done on time.
  • They turn out to not be as interested in your work as they first appeared.
  • They start with an approach which, while initially appearing wise, turns out to be poorly suited.
  • They suffer illness or injury.

Of course you should carefully interview and check reputations to avert some of these risks, but you cannot make them all go away. You don’t always truly know who is good, who will produce. You can only estimate, with varying levels of accuracy. The future is unavoidably unknown and uncertain.

But you still want the work done, sufficiently well and sufficiently soon. Or at least I do.

Redundancy Reduces Risk

A few years ago I stumbled across a way to attack many of these risks with the same, simple approach: hire N people or teams in parallel to separately attack the same work. I sometimes call this a RAIT, a Redundant Array of Independent Teams. Both the team size (one person or many) and the number of teams (N) can vary. Think of the normal practice of hiring a single person or single team as a degenerate case of RAIT with N=1.

To make RAIT practical, you need a hiring and management approach that uses your time (as the hirer) very efficiently. The key to efficiency here is to avoid doing things N times (once per team); rather, do them once, and broadcast to all N teams. For example, minimize cases where you answer developer questions in a one-off way. If you get asked a question by phone, IM, or email, answer it by adding information to a document or wiki; publish the document or wiki to all N teams. If you don’t have a publishing system or wiki technology in hand, in many cases simply using a shared Google Document is sufficient.

There are plenty of variations on the RAIT theme. For example, you might keep the teams completely isolated in terms of their interim work; this would minimize the risk that one team’s bad ideas will contaminate the others. Or you might pass their work back and forth from time to time, since this would reduce duplicated effort (and thus cost) and speed up completion.

Another variation is to start with N teams, then incrementally trim back to a single team. For example, consider a project that will take 10 weeks to complete. You could start with three concurrent efforts. After one week, drop one of the efforts – whichever has made the least progress. After three weeks, again drop whichever team has made the least progress, leaving a single team to work all 10 weeks. The total cost of this approach is 14 team-weeks of work: 3 teams for week 1, 2 teams for weeks 2 and 3, and 1 team for weeks 4 through 10 (3 + 4 + 7 = 14).
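
If you prefer to see that arithmetic as runnable code, here is a tiny sketch of the calculation. It is purely my own illustration: the class name and the week-by-week schedule representation are assumptions, not part of any real RAIT tooling.

// A toy Java sketch: total cost, in team-weeks, of an incremental trim-down.
public class RaitCost {
    // teamsPerWeek[i] = number of teams still working during week i+1
    static int totalTeamWeeks(int[] teamsPerWeek) {
        int total = 0;
        for (int teams : teamsPerWeek) {
            total += teams;
        }
        return total;
    }

    public static void main(String[] args) {
        // 10-week project: 3 teams in week 1, 2 teams in weeks 2-3,
        // then a single team for weeks 4 through 10.
        int[] schedule = {3, 2, 2, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(totalTeamWeeks(schedule)); // prints 14
    }
}

Changing the schedule array makes it easy to compare variations; for example, dropping the second team after week 2 instead of week 3 brings the total down to 13 team-weeks.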

How might you think about that 14 team-weeks of effort/cost?

  1. It is a 40% increase in cost over picking the right team the first time. If you can see the future, you don’t need RAIT.
  2. It is a 30% decrease compared to paying one team for 10 weeks, realizing they won’t produce, then paying another team for 10 more weeks.
  3. If you hire only one team and it doesn’t deliver on time, you might miss a market opportunity.

Still, isn’t this an obvious waste of money?

To understand the motivation here, you must first understand (and accept) that no matter how amazing your management, purchasing, and contracting skills, there remains a significant random element in the results of any non-trivial project. There is a range of possible outcomes: a probability distribution describing how likely the project is to be done by any given time.

RAIT is not about minimizing best-case cost. It is about maximizing the probability of timely, successful delivery:

  • To reduce the risk that your project will not be delivered at all.
  • To reduce the risk that your project will not be delivered on time.
  • To increase your aggregate experience faster, as you learn from multiple teams.
  • To enable bolder exploration of alternative approaches to the work.

What projects are best suited for RAIT?

Smaller projects have a lower absolute cost of duplicate efforts, so for these it is easier to consider some cost duplication. RAIT is especially well suited when hiring out work to be done “out there” by people scattered around the internet and around the world, because the risk of some of the teams/people not engaging effectively in the work is typically higher.

Very important projects justify the higher expense of RAIT. You could think of high-profile, big-dollar government technology development programs as an example of RAIT: a government will sometimes pay two firms to develop different designs of working prototype aircraft, then choose only one of them for volume production. For a smaller-scale example, consider producing an iPhone app or Flash game for an upcoming event, where missing the date means getting no value at all for your efforts.

Thanks to David McNeil for reviewing a draft of this.

If you like it, make a link to it – a plea for real links

You see something good on the web; now it’s time to tell other people about it. Maybe you’ll use various common tools:

  • Facebook “like” it
  • Social-network-share it
  • Bit.ly it
  • Tweet it
  • Mention it in a forum post
  • Mention it in a blog comment

I believe it’s smart and convenient to do those things, but not to only do those things. Why? Because they create redirected, tracked, short-lived, rel=nofollowed, or otherwise weak links. Links that don’t properly tell search engines that the content is worthwhile. Quasi-links that attempt to replace real links as the fundamental currency of the web.

If you really like it, if you think it deserves ongoing attention, then in addition to whatever else you do, put a real A-HREF link to it on your web site/blog.

In the Arena

Almost every day, at some point I wander over to Hacker News, which has some great discussion, along with some less great discussion, among people pursuing or aspiring to pursue a software startup or similar business. Likewise with local events (like those ITEN STL offers), and even more so with the Business of Software conference earlier this month.

I used to have a software product business myself, a vertical market SaaS firm. Now that I’ve been out of that for over a year, the thing I miss most is the feeling of being “in the arena”, of having a speculative product out there for people to buy. To be out there is both terrifying and exhilarating. I have heard it said that there are “product people” and “consulting people”, and looking back it is clear to me that I am mostly in the Product category.

Unlike some product people (like Amy Hoy, whom I admire greatly!) I don’t think it’s necessary to swear off one thing to do the other. Consulting (building software for clients) is very satisfying, especially when working with a team of great people (and a group of very competent customers) like we have at Oasis Digital.

So while I’m going to keep building software for other people, I’m also going to go back to the marketplace with speculative products. This time it will be products in the plural, some subset of:

  • Web/SaaS software
  • iPad software
  • iPhone / iPod Touch software
  • Android software (by year-end the stores will be piled high with Android tablets)
  • Or possibly HTML5/etc software to address all of the above
  • Backend / data / system management software
  • Or even, possibly, locally installed desktop software

I apologize for the vagueness of this list, but I agree with Derek Sivers about keeping one’s specific goals to oneself, so my voluminous and tedious notes on exactly which products to offer will remain offline.

October 2010: Business of Software, Strange Loop, Clojure Conj

I attended three conferences in October 2010, the most of any month of my life to date. Others have posted extensively about all three events, so I’ll link to a few posts and point out highlights for me.

Business of Software 2010

BoS alternates between San Francisco and Boston; this year it was in Boston. There are plenty of excellent summaries online (here, here, here, here), and an especially nice set of photos here.

The conference was packed full of great speakers, mostly well known. I am sure the most “expensive” person in the lineup was Seth Godin; he is an excellent speaker and had interesting content, but wasn’t as relevant to me as some of the others.

The high point of BoS was Joel Spolsky’s closing talk. Unlike everyone else, he used no slides, and simply sat at a table to tell us the story of his last year or so. I was a bit surprised at his public airing of partner grievances, but that was probably necessary to tell the (very worthwhile) story of his transition over the last year from the “small, profitable company” model to the “go big” model. The former can make good money, but only the latter can make a broad impact and build a (perhaps slightly) better world.

I also especially enjoyed Erik Sink and Derek Sivers telling the stories of their company sales. My own company sale experience was more like Erik Sink’s.

In the past, Business of Software has posted the videos for year N during the marketing runup for year N+1; I suspect the same will happen this time. When those videos appear, watch them. Especially keep an eye out for Joel’s criticism of Craigslist, with which I agree.

Strange Loop 2010

Strange Loop is held in, and named after, the Delmar Loop area, which spans University City and a bit of St. Louis. The 2010 event was much larger than the 2009 event; I don’t know whether it will be possible to accommodate 2011’s crowd in the Loop area, but I’ll certainly attend either way.

Again there are plenty of summaries online, including here and here.

The highlight of this event for me was Guy Steele’s talk on parallelism. Unlike some commenters, I greatly enjoyed both the first half of the talk (a stroll through some ancient IBM assembly code) and the second half (including the Fortress example code). I’ve been inspired by this talk, and by the criticism of it, to put together my own upcoming code-centric talks, in which I’ll touch on the key parallelism ideas briefly, then step through several code examples in various languages.

I also spoke at Strange Loop, in a 20 minute slot, on Lua (video). Most of the feedback on my talk was positive, particularly of the “why, not how” approach I used to make the best use of 20 minutes. A few people would have preferred a longer talk with more “how”; I might put together such a presentation at a later date.

Disclosure: Oasis Digital sponsored Strange Loop.

Clojure Conj 2010 (the first)

At Clojure Conj I had the strong impression of being at the start of something big. I believe that Clojure, in spite of the needlessly-feared parentheses, has more “legs” than any other of the current crop of ascendant languages: getting state right (and thus making it possible to get parallelism right) is more important than syntax. Based on the folks I met at the Conj, I’d say Clojure has exactly the right early adopters on board.

As usual plenty of others have posted detailed notes (here, here, here, here, here).

The talk that stands out most to me was not exactly about Clojure. Rich Hickey’s keynote was about the importance and process of thinking deeply about problems to create a solution. In a sense this is the counterpoint to agile, rapid-iteration development, suitable to a different class of problems. Clojure exudes a sense of having been thought about in depth, and Rich is obviously the #1 deep thinker. When this arrives on video, watch it. Twice.

I also enjoyed Rich’s impromptu Go clinic at the pre-conference speaker (and sponsor) dinner. Note that Go has totally different rules from the similarly named Go-Moku, and is not to be confused with Google’s Go language.

Disclosure: Oasis Digital sponsored Clojure Conj.

Back to Work

I’ve had very little time for my own projects this month; between the events, most of my available hours were occupied with Oasis Digital customers. My mind is bursting with worthwhile ideas to pursue.

Map-Reduce in the Small: an Array of Talks

At Strange Loop 2010, Guy Steele gave a wide-ranging, excellent talk. Its key point, in essence, was to use a divide-and-conquer approach, which he described as “map-reduce in the small” (or some similar phrase). This is analogous to techniques used to partition work in large distributed systems, but applied inside a single program.

I heartily agree with all of this. Massive multicore will be a dominant factor in software design in the coming decade. In 2010, most of us are happily waiting in a calm before a storm, because our multicore machines don’t have very many cores yet. For most applications, we get by with very coarse parallelism (such as one thread per concurrent user request being served, in an application server or web server). This won’t last – when cheap PCs have 50+ cores, most software will need to harness parallelism in a much more fine-grained way. Allocating only one core per concurrent operation will become ridiculous.

You can download Steele’s slides from the Strange Loop site, or watch this video of his previous talk at ICFP 2009 in which he covered some of the same material. At Strange Loop, Steele showed how to solve a particular problem (counting words in a string) in a manner amenable to parallel processing. His sample code was written in Fortress, a Sun-Oracle research language. Fortress didn’t bother me, but I heard some discussions about the language choice (and rapid presentation) as an obstacle to detailed understanding.
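
To make the idea concrete without Fortress, here is a minimal divide-and-conquer word count sketched in Java. It is my own illustration, not Steele’s code: it assumes the fork/join framework (standard in Java 7, previously available as the jsr166y preview library), and the class name, cutoff value, and space-only word delimiter are simplifying assumptions.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Minimal sketch: count words by splitting the text roughly in half (at a
// space), counting the halves in parallel, then adding the two results.
public class ParallelWordCount {
    static final int SEQUENTIAL_CUTOFF = 16; // tiny, so even a short demo string splits

    static class CountTask extends RecursiveTask<Integer> {
        final String text;
        final int lo, hi; // half-open range [lo, hi)

        CountTask(String text, int lo, int hi) {
            this.text = text;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Integer compute() {
            if (hi - lo <= SEQUENTIAL_CUTOFF) {
                return countSequentially();
            }
            // Split near the middle, but only at a space, so no word
            // straddles the halves and the two counts can simply be added.
            int mid = (lo + hi) / 2;
            while (mid < hi && text.charAt(mid) != ' ') {
                mid++;
            }
            if (mid == hi) {
                return countSequentially(); // no split point found
            }
            CountTask left = new CountTask(text, lo, mid);
            CountTask right = new CountTask(text, mid, hi);
            left.fork();                      // left half runs in parallel
            int rightCount = right.compute(); // right half on this thread
            return left.join() + rightCount;
        }

        private int countSequentially() {
            int count = 0;
            boolean inWord = false;
            for (int i = lo; i < hi; i++) {
                boolean isSpace = text.charAt(i) == ' ';
                if (!isSpace && !inWord) {
                    count++; // a new word starts here
                }
                inWord = !isSpace;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        int words = new ForkJoinPool().invoke(new CountTask(text, 0, text.length()));
        System.out.println(words); // prints 9
    }
}

The essential trick here is splitting only at a space, so combining the halves is plain addition. As I recall, Steele’s Fortress version goes further: its combining step is defined to work even when a word straddles an arbitrary split point, which is what lets the work be re-partitioned freely.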

I propose to elaborate these ideas by walking through sample code, in a series of talks.

Talk Proposal: Map-Reduce in the Small

First, I will briefly summarize the need for fine-grained parallelism.

Then, I will present three code walkthrough examples in a widely used language.

  1. The word-count problem from Guy Steele’s Strange Loop talk.
  2. Another simple algorithmic / computer-science-flavored example.
  3. As time allows, a third example from a thoroughly practical, enterprise-app-flavored problem space.

This talk will use very few slides; instead it will be all about the code. Each example will show how (not just why) to parallelize algorithms.

An Array[0..N] of Talks

This idea is worthy of deep understanding and practice, so I have in mind several talks, each using a different language, and with sufficient differences to be interesting even for someone who happens to be present for all of them:

  1. Examples in Java, at the St. Louis Java user group
  2. Examples in Clojure, at the St. Louis Clojure Lunch Cljub (there are already some good Clojure examples out there, making this one easy, but still worthwhile).
  3. Examples in another language (perhaps more esoteric, to be determined later), at Lambda Lounge
  4. A repeat of one of those languages, with updated examples, at Strange Loop 2011 (no web site yet!)

Putting these together will take a while, so I have in mind spreading these over the next year or so. Of course, it is quite possible that only a subset of these groups/events, possibly an empty set, will accept this talk offer. In that case, I’ll take it as a sign that the St. Louis community already understands micro-parallelism in depth, and celebrate!

Update: Schedule

As these talks are scheduled, I will put the information here:

  1. Java code at the St. Louis JUG: Jan 13, 2011.

Lua Doesn’t Suck – Strange Loop 2010 video

At Strange Loop 2010, I gave a 20 minute talk on Lua. The talk briefly covered six reasons (why, not how) to choose Lua for embedded scripting. Lua is safe, fast, simple, easily learned, and more popular than you might expect.

The Strange Loop crew only recorded video in the two largest venues (out of six), so I made a “bootleg” video of my talk, for your viewing pleasure:

(embedded video)

The video/audio sync starts out OK, but drifts off by a second or so by the end. The drift is minor, so it is reasonably viewable all the way through. If you don’t have Flash installed (and thus don’t see the video above), you can download the video (x264); it plays well on most platforms (including an iPad).

The slides are available for PDF download.


Video Hackery

This video recording was an experiment: instead of hiring a video crew (with professional equipment), or using my DV camcorder, I instead used the video recording capability of my family’s consumer-grade Canon digicam. This device has three advantages over my DV camcorder:

  1. No tape machinery; no motors; thus no motor noise in the audio.
  2. Smaller size, easier to carry in and out.
  3. Directly produces a video file, easily copied off its SD card.

As you can see from the results, the video quality is adequate but not great. Still, I learned that if I want to increase the quality of recording, the first step is not to use a better camera or lens! Rather, it is to bring (or persuade the venue to provide) better light. For good video results, the key is to light the speaker well, without shining any extra light on the projector screen. With that in place, a better camera makes sense.

The audio was a different story. Like nearly all consumer video cameras (and digicams with video), mine doesn’t have an external audio input, so the audio (from ~12 feet away) was awful. As a backup I had used a $75 audio recorder and a $30 lapel microphone, and that audio is very good, certainly worth using instead of the video recording’s audio track.

To combine the video in file A with the audio in file B, I used the ffmpeg invocation below. I reached the time adjustments below in just a few iterations of trial and error, by watching the drafts in VLC, using “f” and “g” to experiment with the audio/video time sync. I also trimmed off a bit of the bottom of the video, and used “mp4creator.exe -optimize”, which I had handy on a Windows machine, to prepare the file for progressive download viewing.

ffmpeg -y -ss 34.0 -i WS_10001.WMA -ss 34.0 -itsoffset -12.05 -i MVI_4285.AVI -shortest -t 8000 -vcodec libx264 -vpre normal -cropbottom 120 -b 400k -threads 2 -async 200 Cordes-2010-StrangeLoop-Lua.m4v

The remaining bits of technology are FlowPlayer, a WordPress FlowPlayer plugin, and a CDN.