Feb 19 2011

Fix timestamps after a mass file transfer

Published under Technology

I recently transferred a few thousand files, totalling gigabytes, from one computer to another over a slowish internet connection. At the end of the transfer, I realized the process I used had lost all the original file timestamps. Instead, all the files on the destination machine had a create/modify date of when the transfer occurred. In this particular case I had uploaded the files to Amazon S3 from one machine and then downloaded them to the other, but there are numerous other ways to transfer files that lose the timestamps; for example, many FTP clients do so by default.

This file transfer took many hours, so I wasn’t inclined to delete everything and try again with a better (timestamp-preserving) transfer process. Besides, it shouldn’t be very hard to fix the timestamps in place.

Both machines were Windows servers; neither had a broad set of Unix tools installed. If I had those present, the most obvious solution would be a simple rsync command, which would fix the timestamps without retransferring the data. But without those tools present, and with an unrelated desire to keep these machines as “clean” as possible, plus a firewall obstacle to SSH, I looked elsewhere for a fix.

I did, however, happen to have a partial set of Unix tools (in the form of the MSYS tools that come with MSYSGIT) on the source machine. After a few minutes of puzzling, I came up with this approach:

  1. Run a command on the source machine
  2. … which looks up the timestamp of each file
  3. … and stores those in the form of a batch file
  4. Then copy this batch file to the destination machine and run it.

Here is the source machine command, executed at the top of the file tree to be fixed:

find . -print0 | xargs -0 stat -t "%d-%m-%Y %T" \
 -f 'nircmd.exe setfilefoldertime "%N" "%Sc" "%Sm"' \
 | tr '/' '\\' >~/fix_dates.bat

I’ve broken it across several lines here with backslash continuations; it runs as one long command.

  • “find” gets the names of every file and directory in the file tree
  • xargs feeds these to the stat command
  • stat gets the create and modify dates of each file/directory, and formats the results in a very configurable way
  • tr converts the Unix-style “/” paths to Windows-style “\” paths.
  • The results are redirected to (stored in) a batch file.
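To make the tr step concrete, here is a made-up line in the format the stat step emits (the path photos/2000/img001.jpg is invented for illustration), shown before and after tr. Because tr rewrites every “/” on the line, not only those in paths, the dates must not contain slashes either; the dash-separated format above avoids that.

```shell
# Illustrative only: a hypothetical line as emitted by the stat step,
# passed through the same tr command used above.
line='nircmd.exe setfilefoldertime "./photos/2000/img001.jpg" "19-01-2000 04:50:26" "19-01-2000 04:50:26"'
fixed=$(printf '%s\n' "$line" | tr '/' '\\')
printf '%s\n' "$fixed"
# prints: nircmd.exe setfilefoldertime ".\photos\2000\img001.jpg" "19-01-2000 04:50:26" "19-01-2000 04:50:26"

# tr does not know what a path is, so a slash-separated date would be mangled too:
bad=$(printf '%s\n' '"19/01/2000 04:50:26"' | tr '/' '\\')
printf '%s\n' "$bad"
# prints: "19\01\2000 04:50:26"
```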

As far as I can tell, the traditional set of built-in Windows command-line tools does not include a way to set a file or directory’s timestamps. I haven’t spent much time with PowerShell yet, so I used the (very helpful) NirCmd command-line utility, specifically its setfilefoldertime subcommand. The batch file generated by the above process is simply a very long list of lines like this:

nircmd.exe setfilefoldertime "path\filename" "19-01-2000 04:50:26" "19-01-2000 04:50:26"

I copied this batch file to the destination machine and executed it; it corrected the timestamps, and the problem was solved.
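The command above relies on BSD-style stat options (-t for the time format, -f for the output format, and the %N, %Sc and %Sm fields). Where stat comes from GNU coreutils, as on most Linux systems, -t and -f mean --terse and --file-system instead, so the same command would not produce the intended batch file. A rough bash sketch for that case follows. It assumes bash plus GNU find and date (date -r FILE prints FILE’s last modification time), and because it has no portable way to read creation time, it passes the modification time as both NirCmd dates. The function name make_fix_dates is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical variant for bash + GNU coreutils; not the command used above.
# Emits one NirCmd line per file/directory under ".", on stdout.
make_fix_dates() {
  find . -print0 |
    while IFS= read -r -d '' f; do
      # GNU date: -r FILE reports FILE's last modification time.
      m=$(date -r "$f" +'%d-%m-%Y %T')
      # Modification time is used for both "created" and "modified".
      printf 'nircmd.exe setfilefoldertime "%s" "%s" "%s"\n' "$f" "$m" "$m"
    done |
    # Same path conversion as the original command.
    tr '/' '\\'
}

# Usage, from the top of the tree, writing the result outside the tree:
#   make_fix_dates > ~/fix_dates.bat
```

One caveat applies to both versions: cmd.exe treats % specially in batch files, so a file name containing % would need it doubled to %%, and neither command does that.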


Dec 06 2010

New site: Learn Clojure

Published under Technology

Over the last few days I put together Learn-Clojure.com, a web site to help people get started with Clojure. Please take a look, and send feedback.

I also have several other ideas for informational sites and simple applications, which I’ll launch as time allows. In the past I’ve been inclined to just post new things here on my blog, but I think certain kinds of more “evergreen” information are more useful on standalone sites. Certainly the hosting/domain economics are such that it’s not a big deal to put them there.


Nov 16 2010

Hire a RAIT: Redundant Array of Independent Teams

Published under Business

Life is Risk

Whenever you hire out work, either to a person, to a team, or to a company, there are risks. These risks can easily prevent the work from being completed, and even more easily prevent it from being completed on time. (I’m thinking mostly of software development work as I write this, but most of this applies to other domains as well.)

What could go wrong with the person/team/company you hire?

  • They get distracted by family or personal issues.
  • They turn out to not be as qualified or capable as they appeared.
  • They leave for better work. Sure, you might have a contract requiring them to finish, but your lawsuit won’t get the work done on time.
  • They turn out to not be as interested in your work as they first appeared.
  • They start with an approach which, while initially appearing wise, turns out to be poorly suited.
  • They suffer an illness or injury.

Of course you should carefully interview and check reputations to avert some of these risks, but you cannot make them all go away. You don’t always truly know who is good, who will produce. You can only estimate, with varying levels of accuracy. The future is unavoidably unknown and uncertain.

But you still want the work done, sufficiently well and sufficiently soon. Or at least I do.

Redundancy Reduces Risk

A few years ago I stumbled across a way to attack many of these risks with the same, simple approach: hire N people or teams in parallel to separately attack the same work. I sometimes call this a RAIT, a Redundant Array of Independent Teams. Both the team size (one person or many), and the number of teams (N) can vary. Think of the normal practice of hiring a single person or single team as a degenerate case of RAIT with N=1.

To make RAIT practical, you need a hiring and management approach that uses your time (as the hirer) very efficiently. The key to efficiency here is to avoid doing things N times (once per team); rather, do them once, and broadcast to all N teams. For example, minimize cases where you answer developer questions in a one-off way. If you get asked a question by phone, IM, or email, answer it by adding information to a document or wiki; publish the document or wiki to all N teams. If you don’t have a publishing system or wiki technology in hand, in many cases simply using a shared Google Document is sufficient.

There are plenty of variations on the RAIT theme. For example, you might keep the teams completely isolated in terms of their interim work; this would minimize the risk that one team’s bad ideas will contaminate the others. Or you might pass their work back and forth from time to time, since this would reduce duplicated effort (and thus cost) and speed up completion.

Another variation is to start with N teams, then incrementally trim back to a single team. For example, consider a project that will take 10 weeks to complete. You could start with three concurrent efforts. After one week, drop one of the efforts – whichever has made the least progress. After three weeks, again drop whichever team has made the least progress, leaving a single team to work all 10 weeks. As you can see in the illustration below, the total cost of this approach is 14 team-weeks of work.

How might you think about that 14 team-weeks of effort/cost?

  1. It is a 40% increase in cost over picking the right team the first time. If you can see the future, you don’t need RAIT.
  2. It is a 30% decrease compared to paying one team for 10 weeks, realizing they won’t produce, then paying another team for 10 more weeks (14 team-weeks instead of 20).
  3. If you hired only one team, which doesn’t deliver on time, you might miss a market opportunity.
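The 14 team-weeks figure and the first two comparisons can be checked directly, assuming the schedule described above: three teams in week 1, two teams in weeks 2 and 3, and one team for the remaining seven weeks.

```latex
\begin{aligned}
C_{\text{RAIT}} &= 3 \times 1 + 2 \times 2 + 1 \times 7 = 14 \text{ team-weeks} \\
\frac{C_{\text{RAIT}}}{C_{\text{one right team}}} &= \frac{14}{10} = 1.4 \quad (40\% \text{ more}) \\
\frac{C_{\text{RAIT}}}{C_{\text{two teams in sequence}}} &= \frac{14}{20} = 0.7 \quad (30\% \text{ less})
\end{aligned}
```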

Still, isn’t this an obvious waste of money?

To understand the motivation here, you must first understand (and accept) that no matter how amazing your management, purchasing, and contracting skills, there remains a significant random element in the results of any non-trivial project. There is a range of possibilities, a probability function describing the likelihood with which the project will be done as a function of time.

RAIT is not about minimizing best-case cost. It is about maximizing the probability of timely, successful delivery:

  • To reduce the risk that your project will not be delivered at all.
  • To reduce the risk that your project will not be delivered on time.
  • To increase your aggregate experience (as you learn from multiple teams) faster.
  • To enable bolder exploration of alternative approaches to the work.

What projects are best suited for RAIT?

Smaller projects have a lower absolute cost of duplicate efforts, so for these it is easier to accept some duplicated cost. RAIT is especially well suited when hiring out work to be done “out there” by people scattered around the internet and around the world, because the risk of some of the teams/people not engaging effectively in the work is typically higher.

Very important projects justify the higher expense of RAIT. You could think of high-profile, big-dollar government technology development programs as an example of RAIT: a government will sometimes pay two firms to develop competing designs of working prototype aircraft, then choose only one of them for volume production. For a smaller-scale example, consider producing an iPhone app or Flash game for an upcoming event, where missing the date means getting no value at all for your efforts.

Thanks to David McNeil for reviewing a draft of this.


Oct 30 2010

If you like it, make a link to it – a plea for real links

Published under Technology

You see something good on the web; now it’s time to tell other people about it. Maybe you’ll use various common tools:

  • Facebook “like” it
  • Social-network-share it
  • Bit.ly it
  • Tweet it
  • Mention it in a forum post
  • Mention it in a blog comment

I believe it’s smart and convenient to do those things, but not to only do those things. Why? Because they create redirected, tracked, short-lived, rel=nofollowed, or otherwise weak links. Links that don’t properly tell search engines that the content is worthwhile. Quasi-links that attempt to replace real links as the fundamental currency of the web.

If you really like it, if you think it deserves ongoing attention, then in addition to whatever else you do, put a real A-HREF link to it on your web site/blog.


Oct 29 2010

In the Arena

Published under Business

Almost every day at some point I wander over to Hacker News, which has some great discussion, along with some less great discussion, among people pursuing or aspiring to pursue a software startup or similar business. Likewise with local events (like those ITEN STL offers), and even more so the Business of Software conference earlier this month. (I describe my experiences there in a separate post.)

I used to have a software product business myself, a vertical market SaaS firm. Now that I’ve been out of that for over a year, the thing I miss most is the feeling of being “in the arena”, of having a speculative product out there for people to buy. To be out there is both terrifying and exhilarating. I have heard it said that there are “product people” and “consulting people”, and looking back it is clear to me that I am mostly in the Product category.

Unlike some product people (like Amy Hoy, whom I admire greatly!) I don’t think it’s necessary to swear off one thing to do the other. Consulting (building software for clients) is very satisfying, especially when working with a team of great people (and a group of very competent customers) like we have at Oasis Digital.

So while I’m going to keep building software for other people, I’m also going to go back to the marketplace with speculative products. This time it will be products in the plural, some subset of:

  • Web/SaaS software
  • iPad software
  • iPhone / iPod Touch software
  • Android software (by year-end the stores will be piled high with Android tablets)
  • Or possibly HTML5/etc software to address all of the above
  • Backend / data / system management software
  • Or even, possibly, locally installed desktop software

I apologize for the vagueness of this list, but I agree with Derek Sivers about keeping one’s specific goals to oneself, so my voluminous and tedious notes on exactly what products to offer will remain offline.


Oct 27 2010

October 2010: Business of Software, Strange Loop, Clojure Conj

Published under Technology

I attended three conferences in October 2010, the most of any month of my life to date. Others have posted extensively about all three events, so I’ll link to a few posts and point out highlights for me.

Business of Software 2010

BoS alternates between San Francisco and Boston; this year it was in Boston. There are plenty of excellent summaries online (here, here, here, here), and an especially nice set of photos here.

The conference was packed full of great speakers, mostly well known. I am sure the most “expensive” person in the lineup was Seth Godin; he is an excellent speaker and had interesting content, but wasn’t as relevant to me as some of the others.

The high point of BoS was Joel Spolsky’s closing talk. Unlike everyone else, he used no slides, and simply sat at a table to tell us the story of his last year or so. I was a bit surprised at his public airing of partner grievances, but that was probably necessary to tell the (very worthwhile) story of his transition over the last year from the “small, profitable company” model to the “go big” model. The former can make good money; but only the latter can make a broad impact to build a (perhaps slightly) better world.

I also especially enjoyed Eric Sink and Derek Sivers telling the stories of their company sales. My own company sale experience was more like Eric Sink’s.

In the past, Business of Software has posted the videos for year N during the marketing runup for year N+1; I suspect the same will happen this time. When those videos appear, watch them. Especially keep an eye out for Joel’s criticism of Craigslist, with which I agree.

Strange Loop 2010

Strange Loop is held in, and named after, the Delmar Loop area, which spans University City and a bit of St. Louis. The 2010 event was much larger than the 2009 event; I don’t know whether it will be possible to accommodate 2011’s crowd in the Loop area, but I’ll certainly attend either way.

Again there are plenty of summaries online, including here and here.

The highlight of this event for me was Guy Steele’s talk on parallelism. Unlike some commenters, I greatly enjoyed both the first half of the talk (a stroll through some ancient IBM assembly code) and the second half (including the Fortress example code). I’ve been inspired by this talk, and by the criticism of it, to put together my own upcoming code-centric talks, in which I’ll touch on the key parallelism ideas briefly, then step through several code examples in various languages.

I also spoke at Strange Loop, in a 20-minute slot, on Lua (video). Most of the feedback on my talk was positive, particularly about the “why, not how” approach I used to make the best use of 20 minutes. A few people would have preferred a longer talk with more “how”; I might put together such a presentation at a later date.

Disclosure: Oasis Digital sponsored Strange Loop.

Clojure Conj 2010 (the first Conj)

At Clojure Conj I had the strong impression of being at the start of something big. I believe that Clojure, in spite of the needlessly-feared parentheses, has more “legs” than any other of the current crop of ascendant languages: getting state right (and thus making it possible to get parallelism right) is more important than syntax. Based on the folks I met at the Conj, I’d say Clojure has exactly the right early adopters on board.

As usual plenty of others have posted detailed notes (here, here, here, here, here).

The talk that stands out most to me was not exactly about Clojure. Rich Hickey’s keynote was about the importance and process of thinking deeply about problems to create a solution. In a sense this is the counterpoint to agile, rapid-iteration development, suitable to a different class of problems. Clojure exudes a sense of having been thought about in depth, and Rich is obviously the #1 deep thinker. When this arrives on video, watch it. Twice.

I also enjoyed Rich’s impromptu Go clinic at the pre-conference speaker (and sponsor) dinner. Note that Go has totally different rules from the similarly named Go-Moku, and is not to be confused with Google’s Go language.

Disclosure: Oasis Digital sponsored Clojure Conj.

Back to Work

I’ve had very little time for my own projects this month; between the events, most of my available hours were occupied with Oasis Digital customers. My mind is bursting with worthwhile ideas to pursue.

