Fix timestamps after a mass file transfer

I recently transferred a few thousand files, totalling gigabytes, from one computer to another over a slowish internet connection. At the end of the transfer, I realized the process I used had lost all the original file timestamps. Instead, every file on the destination machine had a create/modify date matching the time of the transfer. In this particular case I had uploaded files to Amazon S3 from one end, then downloaded them at the other, but there are numerous other ways to transfer files that lose the timestamps; for example, many FTP clients do so by default.

This file transfer took many hours, so I wasn’t inclined to delete everything and try again with a better (timestamp-preserving) transfer process. Surely it wouldn’t be very hard to fix the timestamps in place.

Both machines were Windows servers; neither had a broad set of Unix tools installed. If I had those present, the most obvious solution would be a simple rsync command, which would fix the timestamps without retransferring the data. But without those tools present, and with an unrelated desire to keep these machines as “clean” as possible, plus a firewall obstacle to SSH, I looked elsewhere for a fix.

I did, however, happen to have a partial set of Unix tools (in the form of the MSYS tools that come with MSYSGIT) on the source machine. After a few minutes of puzzling, I came up with this approach:

  1. Run a command on the source machine
  2. … which looks up the timestamp of each file
  3. … and stores those in the form of a batch file
  4. Then copy this batch file to the destination machine and run it.

Here is the source machine command, executed at the top of the file tree to be fixed:

find . -print0 | xargs -0 stat -t "%d-%m-%Y %T" \
 -f 'nircmd.exe setfilefoldertime "%N" "%Sc" "%Sm"' \
 | tr '/' '\\' >~/fix_dates.bat

I’ve broken it up into several lines here, but it’s intended as one long command.

  • “find” gets the names of every file and directory in the file tree
  • xargs feeds these to the stat command
  • stat gets the create and modify dates of each file/directory, and formats the results in a very configurable way
  • tr converts the Unix-style “/” paths to Windows-style “\” paths.
  • The results are redirected to (stored in) a batch file.
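For readers without stat and xargs at hand, the same idea can be sketched in Python. This is only an illustration, not the command used above: the function names are hypothetical, and it assumes that st_ctime reports the creation time, which is true on Windows but not on Unix-like systems (there it is the metadata-change time).

```python
import os
import time

# Same day-month-year layout as the stat format string above.
DATE_FORMAT = "%d-%m-%Y %H:%M:%S"


def nircmd_line(path, created, modified):
    """Build one batch-file line for a path and two epoch timestamps."""
    created_text = time.strftime(DATE_FORMAT, time.localtime(created))
    modified_text = time.strftime(DATE_FORMAT, time.localtime(modified))
    windows_path = path.replace("/", "\\")  # same job as the tr step
    return 'nircmd.exe setfilefoldertime "%s" "%s" "%s"' % (
        windows_path, created_text, modified_text)


def batch_lines(root="."):
    """Yield one line per file and directory below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Assumption: st_ctime is the creation time (Windows only).
            yield nircmd_line(path, info.st_ctime, info.st_mtime)
```

Writing "\n".join(batch_lines(".")) to a file such as fix_dates.bat would produce the same kind of batch file as the shell pipeline.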

As far as I can tell, the traditional set of Windows built-in command line tools does not include a way to set a file or directory’s timestamps. I haven’t spent much time with PowerShell yet, so I used the (very helpful) NIRCMD command line utilities, specifically the setfilefoldertime subcommand. The batch file generated by the above process is simply a very long list of lines like this:

nircmd.exe setfilefoldertime "path\filename" "19-01-2000 04:50:26" "19-01-2000 04:50:26"

I copied this batch file to the destination machine and executed it; it corrected the timestamps, and the problem was solved.
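If Python happens to be available on both machines, a different route (not the one taken here) is to record the modification times on the source and apply them on the destination with os.utime. The sketch below is a hedged illustration with hypothetical function names. It restores only the modification time, because os.utime cannot set a Windows creation time.

```python
import os


def record_mtimes(root):
    """Map each path below root (relative, '/'-separated) to its mtime."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            relative = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[relative] = os.stat(full).st_mtime
    return manifest


def apply_mtimes(root, manifest):
    """Stamp the recorded mtimes onto matching paths below root."""
    for relative, mtime in manifest.items():
        full = os.path.join(root, *relative.split("/"))
        if os.path.exists(full):  # skip anything that was not transferred
            # os.utime takes (access time, modification time).
            os.utime(full, (mtime, mtime))
```

The manifest is a plain dictionary, so it could be carried between machines as a small JSON file, much as the batch file is copied across in the approach above.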

New site: Learn Clojure

Over the last few days I put together Learn-Clojure.com, a web site to help people get started with Clojure. Please take a look, and send feedback.

I also have several other ideas for informational sites and simple applications, which I’ll launch as time allows. In the past I’ve been inclined to just post new things here on my blog, but I think certain kinds of more “evergreen” information are more useful on standalone sites. Certainly the hosting/domain economics are such that it’s not a big deal to put them there.

If you like it, make a link to it – a plea for real links

You see something good on the web; now it’s time to tell other people about it. Maybe you’ll use various common tools:

  • Facebook “like” it
  • Social-network-share it
  • Bit.ly it
  • Tweet it
  • Mention it in a forum post
  • Mention it in a blog comment

I believe it’s smart and convenient to do those things, but not to do only those things. Why? Because they create redirected, tracked, short-lived, rel=nofollowed, or otherwise weak links. Links that don’t properly tell search engines that the content is worthwhile. Quasi-links that attempt to replace real links as the fundamental currency of the web.

If you really like it, if you think it deserves ongoing attention, then in addition to whatever else you do, put a real A-HREF link to it on your web site/blog.

October 2010: Business of Software, Strange Loop, Clojure Conj

I attended three conferences in October 2010, the most of any month of my life to date. Others have posted extensively about all three events, so I’ll link to a few posts and point out highlights for me.

Business of Software 2010

BoS alternates between San Francisco and Boston; this year it was in Boston. There are plenty of excellent summaries online (here, here, here, here), and an especially nice set of photos here.

The conference was packed full of great speakers, mostly well known. I am sure the most “expensive” person in the lineup was Seth Godin; he is an excellent speaker and had interesting content, but wasn’t as relevant to me as some of the others.

The high point of BoS was Joel Spolsky’s closing talk. Unlike everyone else, he used no slides, and simply sat at a table to tell us the story of his last year or so. I was a bit surprised at his public airing of partner grievances, but that was probably necessary to tell the (very worthwhile) story of his transition over the last year from the “small, profitable company” model to the “go big” model. The former can make good money; but only the latter can make a broad impact to build a (perhaps slightly) better world.

I also especially enjoyed Eric Sink and Derek Sivers telling the stories of their company sales. My own company sale experience was more like Eric Sink’s.

In the past, Business of Software has posted the videos for year N during the marketing runup for year N+1; I suspect the same will happen this time. When those videos appear, watch them. Especially keep an eye out for Joel’s criticism of Craigslist, with which I agree.

Strange Loop 2010

Strange Loop is held in, and named after, the Delmar Loop area, which spans University City and a bit of St. Louis. The 2010 event was much larger than the 2009 event. I don’t know whether it will be possible to accommodate 2011’s crowd in the Loop area, but I’ll certainly attend either way.

Again there are plenty of summaries online, including here and here.

The highlight of this event for me was Guy Steele’s talk on parallelism. Unlike some commenters, I greatly enjoyed both the first half of the talk (a stroll through some ancient IBM assembly code) and the second half (including the Fortress example code). I’ve been inspired by this talk and criticism about it to put together my own upcoming code-centric talks, in which I’ll touch on the key parallelism ideas briefly, then step through several code examples in various languages.

I also spoke at Strange Loop, in a 20 minute slot, on Lua (video). Most of the feedback on my talk was positive, particularly of the “why, not how” approach I used to make the best use of 20 minutes. A few people would have preferred a longer talk with more “how”; I might put together such a presentation at a later date.

Disclosure: Oasis Digital sponsored Strange Loop.

Clojure Conj 2010 (the first Clojure Conj)

At Clojure Conj I had the strong impression of being at the start of something big. I believe that Clojure, in spite of the needlessly-feared parentheses, has more “legs” than any other of the current crop of ascendant languages: getting state right (and thus making it possible to get parallelism right) is more important than syntax. Based on the folks I met at the Conj, I’d say Clojure has exactly the right early adopters on board.

As usual plenty of others have posted detailed notes (here, here, here, here, here).

The talk that stands out most to me was not exactly about Clojure. Rich Hickey’s keynote was about the importance and process of thinking deeply about problems to create a solution. In a sense this is the counterpoint to agile, rapid-iteration development, suitable to a different class of problems. Clojure exudes a sense of having been thought about in depth, and Rich is obviously the #1 deep thinker. When this arrives on video, watch it. Twice.

I also enjoyed Rich’s impromptu Go clinic at the pre-conference speaker (and sponsor) dinner. Note that Go has totally different rules from the similarly named Go-Moku, and is not to be confused with Google’s Go language.

Disclosure: Oasis Digital sponsored Clojure Conj.

Back to Work

I’ve had very little time for my own projects this month; between the events, most of my available hours were occupied with Oasis Digital customers. My mind is bursting with worthwhile ideas to pursue.

Map-Reduce in the Small: an Array of Talks

At Strange Loop 2010, Guy Steele gave a wide-ranging, excellent talk. In essence, his key point was to use a divide-and-conquer approach, which he described as “map-reduce in the small” (or some similar phrase). This is analogous to techniques used to partition work in large distributed systems, but applied inside a single program.

I heartily agree with all of this. Massive multicore will be a dominant factor in software design in the coming decade. In 2010, most of us are happily waiting in a calm before a storm, because our multicore machines don’t have very many cores yet. For most applications, we get by with very coarse parallelism (such as one thread per concurrent user request being served, in an application server or web server). This won’t last – when cheap PCs have 50+ cores, most software will need to harness parallelism in a much more fine-grained way. Allocating only one core per concurrent operation will become ridiculous.

You can download Steele’s slides from the Strange Loop site, or watch this video of his previous talk at ICFP 2009 in which he covered some of the same material. At Strange Loop, Steele showed how to solve a particular problem (counting words in a string) in a manner amenable to parallel processing. His sample code was written in Fortress, a Sun-Oracle research language. Fortress didn’t bother me, but I heard some discussions about the language choice (and rapid presentation) as an obstacle to detailed understanding.
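To make the word-count idea concrete, here is a minimal Python sketch of the divide-and-conquer shape. It is not Steele’s Fortress code, and the names (Summary, summarize, combine, count_words) are hypothetical. Each chunk of the string is summarized independently, and the summaries are merged with an associative combine step that avoids double-counting a word split across a chunk boundary.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import NamedTuple, Optional


class Summary(NamedTuple):
    words: int            # words seen when the chunk is viewed in isolation
    starts_in_word: bool  # first character is not whitespace
    ends_in_word: bool    # last character is not whitespace


def summarize(chunk: str) -> Optional[Summary]:
    """Map step: summarize one chunk; an empty chunk has no summary."""
    if not chunk:
        return None
    return Summary(len(chunk.split()),
                   not chunk[0].isspace(),
                   not chunk[-1].isspace())


def combine(a: Optional[Summary], b: Optional[Summary]) -> Optional[Summary]:
    """Reduce step: associative merge of two adjacent summaries."""
    if a is None:
        return b
    if b is None:
        return a
    # A word straddling the boundary was counted once on each side.
    straddling = 1 if a.ends_in_word and b.starts_in_word else 0
    return Summary(a.words + b.words - straddling,
                   a.starts_in_word, b.ends_in_word)


def count_words(text: str, chunk_size: int = 4, workers: int = 4) -> int:
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(summarize, chunks))
    total = reduce(combine, summaries, None)
    return total.words if total else 0
```

Because combine is associative, the summaries could be merged as a tree rather than left to right, which is what makes the reduction itself parallelizable. In CPython the thread pool mainly illustrates the structure; the interpreter lock limits any real speedup for CPU-bound work like this.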

I propose to elaborate these ideas by walking through sample code, in a series of talks.

Talk Proposal: Map-Reduce in the Small

First, I will briefly summarize the need for fine-grained parallelism.

Then, I will present three code walkthrough examples in a widely used language.

  1. The word-count problem from Guy Steele’s Strange Loop talk.
  2. Another simple algorithmic / computer-science-flavored example.
  3. As time allows, a third example from a thoroughly practical, enterprise-app-flavored problem space.

This talk will use very few slides; instead it will be all about the code. Each example will show how (not just why) to parallelize algorithms.

An Array[0..N] of Talks

This idea is worthy of deep understanding and practice, so I have in mind several talks, each using a different language, and with sufficient differences to be interesting even for someone who happens to be present for all of them:

  1. Examples in Java, at the St. Louis Java user group
  2. Examples in Clojure, at the St. Louis Clojure Lunch Cljub (there are already some good Clojure examples out there, making this one easy, but still worthwhile).
  3. Examples in another language (perhaps more esoteric, to be determined later), at Lambda Lounge
  4. A repeat of one of those languages, with updated examples, at Strange Loop 2011 (no web site yet!)

Putting these together will take a while, so I have in mind spreading these over the next year or so. Of course, it is quite possible that only a subset of these groups/events, possibly an empty set, will accept this talk offer. In that case, I’ll take it as a sign that the St. Louis community already understands micro-parallelism in depth, and celebrate!

Update: Schedule

As these talks are scheduled, I will put the information here:

  1. Java code at the St. Louis JUG: Jan 13, 2011.

Refactoring some Factor code

Most of the software I work with is very practical. At Oasis Digital we mostly create line-of-business enterprise software, and even when I step away from that, I usually pick up a tool or language that has a good likelihood of mainstream adoption.

Sometimes, though, I like to really stretch my mind. For that, it’s hard to beat Factor. Factor is fascinating in that it combines a goal of efficiency and practicality with a syntax and computation model which are quite alien even to a software polyglot. Don’t let the stack-ness deceive you; it’s a big leap even if you’ve used FORTH and grown up with an HP RPN programmable calculator.

So I set about this evening to work with some Factor code, a simple GUI calculator posted a few days ago by John Benediktsson. I bit off an apparently small bit of work: remove the “code smell” of that global variable, and in the process, make it so multiple calcs each have their own model (rather than a global shared state).

Original version from John

My finished version

The two most important pieces of updated code:

The changes consist approximately of:

  • Change all the button words to accept a model input
  • Change the <row> word to accept a model and use map instead of output>array
  • Remove the calc variable
  • Change the calc-ui word to shuffle things around and use make rather than output>array
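The shape of that refactoring can be sketched in Python, as an analogy only. This is not the Factor code, and every name here (CalcModel, digit_button, make_row, make_calc) is hypothetical. The point is that each button receives its model as an input instead of reaching for a global, so every calculator gets its own state.

```python
class CalcModel:
    """Per-calculator state, replacing a single global variable."""

    def __init__(self):
        self.display = "0"


def digit_button(model, digit):
    """Return a button action bound to the model it was given."""
    def press():
        if model.display == "0":
            model.display = digit
        else:
            model.display += digit
    return press


def make_row(model, digits):
    """Build a row of buttons by mapping over the digits."""
    return [digit_button(model, d) for d in digits]


def make_calc():
    """Create a fresh model and the button rows that share it."""
    model = CalcModel()
    rows = [make_row(model, row) for row in ("789", "456", "123")]
    return model, rows
```

Two calls to make_calc() yield two independent calculators: pressing buttons on one leaves the other’s display untouched.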

In case it isn’t obvious from my text above or the source code, I am not a Factor programmer; please do not use this as example code. On the other hand, I learned a bunch of little things about Factor, and perhaps implicitly about concatenative programming in general, in the process of making this work.