Basics: Formatting Numbers for Tabular Display

Here is the first in what I hope will be a series of posts about basic software design. Though not nearly as sexy as a location-based mobile social network with behavioral profiling for ad optimization (probably a real thing), doing basic software design well is one of our “secrets” at Oasis Digital. So here goes.

Today’s topic is how numbers should be formatted for tabular display. This comes up quite frequently in data-centric applications. As an example, consider this mockup of a working-hours-per-person display screen, example 1:

It looks quite nice with the numbers formatted trivially as whole numbers. What if someone worked 4.56 hours, though? How would that appear? To accommodate that possibility, you might always show two digits to the right of the decimal point, like so, example 2:

… which is OK, but not great – all those extra zeros distract the viewer from the essence of the data. One thing you should never do, though, is something like this:

This is horrible. If you feel tempted to write software that intentionally displays data like this, please step away from the computer now. It breaks a rule of writing numbers that we all learned in primary school, probably before the age of 10: always line up the decimal point. Instead, for that particular set of numbers, this is reasonable:

So is the answer to always display as many decimal digits as the data could possibly have? Perhaps, but only if you are unable to dynamically change the format based on the contents of the data. Examples 2 and 4 show a safe but unimpressive approach.

If you have the tools and skills to create a high quality solution, aim higher: dynamically choose the right number of decimal digits to fit the specific data in question, then apply that same format uniformly to all the numbers. The output will therefore look like #1 if the data permits, and like #4 if the data requires it, but will not needlessly fill a column with 0s as in #3. This is more work, but it produces a more polished, professional result.
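To make this concrete, here is a minimal sketch in Python (my own illustration, not code from the original application; the function names are invented) of picking one precision for a whole column and applying it uniformly:

```python
from decimal import Decimal

def decimal_places_needed(value, max_places=4):
    """Digits actually used to the right of the decimal point, capped at max_places."""
    exponent = Decimal(str(value)).normalize().as_tuple().exponent
    return min(max(-exponent, 0), max_places)

def format_column(values, max_places=4):
    """Choose one precision for the whole column, then apply it to every number."""
    places = max((decimal_places_needed(v, max_places) for v in values), default=0)
    return [f"{v:.{places}f}" for v in values]

print(format_column([40, 36, 44]))    # ['40', '36', '44']          (like example 1)
print(format_column([40, 4.56, 44]))  # ['40.00', '4.56', '44.00']  (like example 4)
```

With whole-number data the column stays clean, and a single fractional value switches the entire column to a matching precision, so the decimal points still line up.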

Group Programming, Projectors, and Big Screen HDTV

I’ve done a fair amount of pair programming over the years. My “Ward Number” is 1, if anyone recognizes the reference. But we pair only occasionally at Oasis Digital. Like Jeff Atwood, we don’t live the pair programming lifestyle. For our particular mix of people and problem spaces, we’ve found that the right amount of pairing is roughly a couple of times per week, a couple of hours at a time. We’re a partially distributed team, so this often occurs via screen-sharing tools instead of at the same desk.

However, we do something perhaps even more “extreme” than pair programming: we spend a few hours every week programming in larger groups, sometimes as many as three of us in person and a couple more remotely. Why would anyone do that?

  • To attack particularly hard problems
  • To resolve important detailed design issues
  • To share our programming style and culture
  • To freshen old skills
  • To build new skills
  • To efficiently pass knowledge 1->N, rather than 1->1

I don’t quite know what to call this. Group programming? Cluster programming? N-tuple programming?

Regardless, we encounter an unavoidable issue: it is not pleasant for several people to cram in front of a PC monitor, even a large one, closely enough to read it. We’d rather spread out, particularly for sessions lasting a couple of hours straight.

In the past at other firms I’ve solved this by working in a conference room with a projector. This doesn’t work very well. Most projectors have a maximum native resolution of 1024×768, or occasionally a bit higher, and those with reasonably high resolution are quite expensive. The reward for spending that money is continuous fan noise and an exhaust heat plume blowing on someone sitting on the unlucky side of the table.

This time we went a different direction: our group programming lair features:

  • A 40-inch LED-LCD HDTV with a native resolution of 1920×1080, which generates no noise or heat, at less cost than a mediocre projector
  • A dedicated computer, so no one’s development laptop is tied up
  • A wireless keyboard and mouse that can easily be passed around
  • Speakers and a standalone mic, for very clear Skype audio
  • Nearby tables to accommodate everyone’s laptops, with extra 22/23-inch displays available

It isn’t pretty:

It is very effective. We can comfortably work together in this area for hours, easily reading the screen from 6-8 feet away. (As I write this, I note that the room needs better chairs, a less beige color scheme, a printer that isn’t older than my teenager, and more Apple hardware.)

This approach isn’t for everyone; it requires a willingness to move furniture and buy non-standard equipment. I’d love to hear from anyone else doing something similar. In particular, I wonder how it compares to Pivotal’s setup.

 

Mobile Lua – iOS and Android apps with Corona

On Thursday (May 26, 2011), I presented at the St. Louis Mobile Dev group, on cross-mobile-platform development with Lua. There are various ways to do this (including rolling your own), but for simplicity I used Ansca’s Corona product. The talk was somewhat impromptu, so I didn’t record audio or video. The slides are available as a PDF: 2011-Lua-Corona-Mobile-Dev.pdf

From this blog, you might get the impression that I use Lua extensively. That is not true; 95% of my work does not involve Lua in any way.

Cloudy Data Storage, circa 2001

Around 2000-2001, Oasis Digital built a system for a client which (in retrospect) took a “cloudy” approach to data storage. That was a few years before the approach gained popularity, so it’s interesting to look back and see how our solution stacks up.

The problem domain was the storage of check images for banks; the images came out of a check-imaging device, a very specialized camera/scanner capable of photographing many checks per second, front and back. For example, scanning 1000 checks (a smallish run) generated 2000 images. All of the images from a run were stored in a single archive file, accompanied by index data. OCR/mag-type data was also stored.

I don’t recall the exact numbers (and probably wouldn’t be able to talk about them anyway), so the numbers here are estimates to convey a sense of the scale of the problem in its larger installations:

  • Many thousands of images per day
  • Archive files generally between 100 MB and 2 GB
  • Hundreds, then thousands, of these archive files
  • In an era when hard drives were much smaller than they are today

Our client considered various off-the-shelf high-capacity storage systems, but instead worked with us to construct a solution roughly as follows.

Hardware and Networking

  • Multiple servers were purchased and installed, over time.
  • Servers were distributed across sites, connected by a WAN.
  • Multiple hard drives (of capacity C) were installed in each server, without RAID.
  • Each storage drive on each server was made accessible remotely via Windows networking.

Software

  • To keep the file count manageable, images were kept in the multi-image archive files rather than stored individually.
  • A database stored metadata about each image, including what file to find it in.
  • The offset of the image data within its archive file was also stored, so that it could be read directly without processing the whole archive.
  • Each archive file was written to N different drives, all on different servers, and some at different physical sites.
  • To pick where to store a new file, the software could simply look through the list of possibilities and check for sufficient free space.
  • A database kept track of where (all) each archive file was stored.
  • An archive file could be read from any of its locations: client software would connect to the database, learn all of the locations for that file, and fetch it from whichever copy was reachable (a sketch of this logic appears after this list).
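A rough sketch, in Python, of how that placement and retrieval logic fits together (the names, the metadata objects, the free-space check, and the `db` object are my own illustration; the original system was not written this way):

```python
import os
import shutil

REPLICAS = 3  # the "N" copies described above

def pick_drives(candidate_drives, file_size, replicas=REPLICAS):
    """Choose N storage drives, on different servers, with enough free space."""
    chosen, used_servers = [], set()
    for drive in candidate_drives:   # e.g. {'server': 'srv1', 'path': r'\\srv1\d1'}
        if drive['server'] in used_servers:
            continue
        if shutil.disk_usage(drive['path']).free > file_size:
            chosen.append(drive)
            used_servers.add(drive['server'])
            if len(chosen) == replicas:
                return chosen
    raise RuntimeError("not enough drives available; the operator retries later")

def read_image(db, image_id):
    """Read one image directly by its stored offset, from any reachable copy."""
    meta = db.image_metadata(image_id)                    # archive name, offset, length
    for location in db.archive_locations(meta.archive):   # all N known copies
        try:
            with open(os.path.join(location, meta.archive), 'rb') as f:
                f.seek(meta.offset)          # jump straight to the image,
                return f.read(meta.length)   # no need to process the whole archive
        except OSError:
            continue                         # that server/site is down; try the next copy
    raise IOError("no reachable copy of " + meta.archive)
```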

This system was read-mostly, and writes were not urgent. For writes, if N storage drives weren’t available, the operator (of the check-scanning system) would try again later. CAP and other concerns weren’t important for this application.

Helpful Properties

  • Even if some servers, sites, or links were down, files remained generally accessible.
  • Offline media storage could be added, though I don’t recall if we got very far down that path.
  • The system was very insensitive to details like OSs, OS versions, etc. New storage servers and drives could be added with newer OS versions and bigger drive sizes, without upgrading old storage.
  • Drives could be made read-only once full, to avoid whole classes of possible corruption.
  • By increasing the number of servers, and number of hard drives over time, this basic design could scale quite far (for the era, anyway).

This approach delivered a lot of the benefits of an expensive scalable storage system for our client, at a fraction of the cost and using only commodity equipment.

Why do I describe this as cloud-like? Because from things I’ve read, this is similar (but much less sophisticated, of course) to the approach taken inside of Amazon S3 and other cloud data storage systems/services.

Key Lesson

Assume you are willing to pay to store each piece of data on N disks. You get much better overall uptime (given the right software) if those N disks are in N different machines spread across sites than you do by putting those N disks in a RAID on the same machine. Likewise, you can read a file much faster from an old, slow hard drive in the same building than from a RAID-6 SAN across a 2000-era WAN. The tradeoff is software complexity.
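As a toy illustration of the uptime half of that tradeoff (my numbers, assuming independent machine failures, not measurements from the actual system):

```python
machine_uptime = 0.99   # assume each server/site is reachable 99% of the time

# N independent copies on N machines: data is unreachable only if ALL N are down.
for n in (1, 2, 3):
    print(f"{n} machine(s): {1 - (1 - machine_uptime) ** n:.6f}")
# 1 machine(s): 0.990000
# 2 machine(s): 0.999900
# 3 machine(s): 0.999999

# N disks in a RAID inside one machine: availability is capped at that single
# machine's uptime (0.99 here), no matter how many disks the RAID contains.
```

The same number of disks, spread across machines and sites, buys several extra nines, provided the software knows how to find the other copies.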