Fix timestamps after a mass file transfer

I recently transferred a few thousand files, totalling gigabytes, from one computer to another over a slowish internet connection. At the end of the transfer, I realized the process I used had lost all the original file timestamps. Instead, all the files on the destination machine had a create/modify date of when the transfer occurred. In this particular case I had uploaded files to Amazon S3 from one end, then downloaded them from the other, but there are numerous other ways to transfer files that lose the timestamps; for example, many FTP clients do so by default.

This file transfer took many hours, so I wasn’t inclined to delete and try again with a better (timestamp-preserving) transfer process. Instead, I figured it shouldn’t be very hard to fix the timestamps in place.

Both machines were Windows servers; neither had a broad set of Unix tools installed. Had those been present, the most obvious solution would have been a simple rsync command, which would fix the timestamps without retransferring the data. But without those tools present, and with an unrelated desire to keep these machines as “clean” as possible, plus a firewall obstacle to SSH, I looked elsewhere for a fix.

I did, however, happen to have a partial set of Unix tools (in the form of the MSYS tools that come with msysgit) on the source machine. After a few minutes of puzzling, I came up with this approach:

  1. Run a command on the source machine
  2. … which looks up the timestamp of each file
  3. … and stores those in the form of a batch file
  4. Then copy this batch file to the destination machine and run it.

Here is the source machine command, executed at the top of the file tree to be fixed:

I’ve broken it up into several lines here, but it’s intended as one long command.

  • “find” gets the names of every file and directory in the file tree
  • xargs feeds these to the stat command
  • stat gets the create and modify dates of each file/directory, and formats the results in a very configurable way
  • tr converts the Unix-style “/” paths to Windows-style “\” paths.
  • The results are redirected to (stored in) a batch file.
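
The lookup half of that pipeline can be approximated in Python. This is only a rough sketch, not the command I actually ran: it swaps find/xargs/stat/tr for the standard library, and the function name collect_timestamps is made up for illustration. It assumes a Windows source machine, where st_ctime reports the creation time (on Unix it is the metadata-change time instead).

```python
import os
from datetime import datetime


def collect_timestamps(root):
    """Return (windows_style_relative_path, created, modified) tuples for
    root itself and for every file and directory below it."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk visits every directory as dirpath, so directories are
        # covered here and files are covered via filenames.
        entries = [dirpath] + [os.path.join(dirpath, n) for n in filenames]
        for full_path in entries:
            info = os.stat(full_path)
            relative = os.path.relpath(full_path, root)
            records.append((
                relative.replace("/", "\\"),  # same job as the tr step
                # Assumption: st_ctime is creation time (true on Windows only).
                datetime.fromtimestamp(info.st_ctime),
                datetime.fromtimestamp(info.st_mtime),
            ))
    return records
```

Each tuple holds what one line of the batch file needs: a path relative to the top of the tree, plus the two dates.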

As far as I can tell, the traditional set of Windows built-in command line tools does not include a way to set a file or directory’s timestamps. I haven’t spent much time with PowerShell yet, so I used the (very helpful) NirCmd command line utilities, specifically the setfilefoldertime subcommand. The batch file generated by the above process is simply a very long list of lines like this:
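
As a rough illustration of the line format (not the exact output of my command), here is a small Python sketch that builds one such line. It assumes setfilefoldertime takes the path, then the created date, then the modified date, each written as "dd-mm-yyyy hh:mm:ss", which is my reading of the NirCmd documentation. The function name nircmd_line and the sample path and dates are hypothetical.

```python
from datetime import datetime

# Assumed NirCmd date format: dd-mm-yyyy hh:mm:ss
NIRCMD_DATE_FORMAT = "%d-%m-%Y %H:%M:%S"


def nircmd_line(path, created, modified):
    """Build one batch-file line that sets a file's created/modified dates."""
    # A literal % must be doubled inside a batch file, or cmd.exe will
    # try to treat it as the start of a variable reference.
    safe_path = path.replace("%", "%%")
    return 'nircmd.exe setfilefoldertime "{}" "{}" "{}"'.format(
        safe_path,
        created.strftime(NIRCMD_DATE_FORMAT),
        modified.strftime(NIRCMD_DATE_FORMAT),
    )


# Hypothetical file and dates, purely for illustration.
print(nircmd_line(r"docs\report.txt",
                  datetime(2009, 3, 14, 9, 26, 53),
                  datetime(2009, 11, 2, 17, 5, 0)))
# prints: nircmd.exe setfilefoldertime "docs\report.txt" "14-03-2009 09:26:53" "02-11-2009 17:05:00"
```

Because the paths are relative, the batch file has to be run from the top of the file tree on the destination machine.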

I copied this batch file to the destination machine and executed it; it corrected the timestamps, and the problem was solved.

What is the Best Git GUI (Client) for Windows?

I adopted Git as my primary source control tool a couple of years ago, when I was using Windows as my primary (90%) desktop OS. Since then I’ve switched to 75% Mac OSX, but I still use Git on Windows for a few projects, and I get a lot of questions about Git on Windows.

I use msysgit (and its included GUI) most often myself, but I don’t have a clear answer as to which is the “best” Git GUI for Windows. I can offer this list of choices, though, along with some thoughts about them.

There is also a very long list of Git tools on the main Git wiki; but that page is just a list, without any other information.

msysgit

msysgit is the main project which ships a Windows port of Git. It is based on MSYS, so it fits in the Windows ecosystem a bit better than the cygwin Git port.

msysgit includes the same Tk-based GUI tools as Git on Linux: a commit tool and a repo-browse tool, plus a bit of shell integration to activate the GUI by right-clicking in Windows Explorer, plus a new thing called git-cheetah, which appears to be heading toward Tortoise-style integration. These tools are a bit ugly, but have good and useful functionality. I don’t mind the ugly (I get my fix of stylish software over on my Mac…), and I find the features ample for most work.

If you don’t know where to start, or if you want a Linux-like Git experience, start with msysgit and learn to use its tools.

msysgit is free open source software. It is under active development, and keeps up with the upstream Git versions reasonably well. There is even a portable (zero-install) version available.

My biggest gripe with msysgit (and its GUI) is that I had to figure out how to use it effectively myself. I could have really used a video walkthrough of how to be productive with it, back when I was starting out. That was a long time ago for me, but might be Right Now for people reading this post. Mike Rowe (a reader) suggested this msysgit tour, which is very helpful though a bit dated.

TortoiseGit

This is an attempt to port TortoiseSVN to git, yielding TortoiseGit. If you like and use TortoiseSVN, you’ll probably find this worth a try. I haven’t tried it yet myself.

TortoiseGit is free open source software, and is under active development.

Git Extensions

This Git GUI has a shell extension (like the Tortoise family) and also a plugin for Visual Studio. From the screenshots, it appears to be feature-rich and complete.

Git Extensions is free open source software, and is under active development.

SmartGit

Unlike the other tools listed here, SmartGit is a commercial product (from a German company), starting at around $70. It appears to be more polished than the others, as is often the case with commercial products. It also appears to be quite feature-rich.

I don’t know how SmartGit fits in with the Git licensing; Git is licensed GPL (v2), so I assume (hope?) SmartGit has found some way to use it under the hood without linking to it in a way that would cause license trouble.

SmartGit requires a Java runtime, implying that it is written in Java. Five years ago I thought of that as a caveat; but today, Java-based GUIs can be extremely attractive and fast, so I don’t see it as a problem at all.

Is IDE Integration Vital?

I know people who swear by their IDE experience, and are aghast at the thought of any daily-use dev tool that is not integrated with their IDE. It is almost as though for this group, multitasking does not exist, and any need to run more than one piece of software at the same time is a defect.

Now I love a good IDE as much as anyone (I’ve urged and coached many developers to move from an editor to an IDE), but I don’t agree with the notion that source control must always be in the IDE. IDE-integrated source control can be very useful, but there are sometimes cases where non-integrated source control wins.

The most common example for me is when using Eclipse on a large, complex system. There are two annoyances I see regularly:

  • Eclipse assumes that one Eclipse project is one source control project, an assumption that is sometimes helpful and sometimes painful. In the latter case, simply ditch the Eclipse integration, and use a whole workspace (N projects) as a single source-control project, outside of Eclipse.
  • Sometimes Eclipse source control integration bogs down performance. Turn it off, and things speed up.

Therefore, when I use Eclipse, I sometimes manage the files from outside, using msysgit, command line, etc. When I have a complex “real-life project” comprised of many Eclipse “projects”, I set up a separate Eclipse workspace for it, apart from other unrelated Eclipse projects.

Feedback Wanted

I’d love to hear about:

  • More Windows Git GUIs to list here
  • Anything else I’ve missed

… via the contact page (link at the top of the page). I try to reply to all email within a few days.

Massive Parallelism and Microslices

I just read James Hamilton’s comments on “Microslice” servers, which are very low-power, but high CPU-to-wattage ratio servers. As he explains in detail, at scale the economics of this design are compelling. In some ways, of course, this is the opposite of another big trend going on, which is consolidation through virtualization. I reconcile these forces like so:

  1. For enterprises with a high ratio of employees per server CPU, cost tends to be a function of the number of boxes, racks, etc. This makes virtualization onto a few big servers a win.
  2. But for enterprises with a low ratio (lots of computing work, small team), the pure economics of the microserver approach makes it the winner.

The microserver approach demands:

  • better automated system administration; you must get to essentially zero marginal sysadmin hours per box
  • better decomposition of the computing work into parallelizable chunks
  • very low software cost per server (you’re going to run a lot of them), favoring zero-incremental-cost operating systems (Linux)

My advice to companies who make software to harness a cloud of tiny machines: find a way to price it so your customer pays you a similar amount to divide their work among 1000 microservers as they would among 250 heavier servers; otherwise, if they move to microservers, they may find a reason to leave you behind.

On a personal note, I find this broadening trend toward parallelization to be a very good thing – because my firm offers services to help companies through these challenges!

I Went In a Boy, I Came Out a Man

[Image: large Apple Store logo sign]

Not really, it just seemed like the sort of over-the-top thing a rabid Mac fan might say.

But I did replace my main Windows PC with a MacBook Pro. I’ve used Apple products occasionally over the decades, going all the way back to the Apple II, IIe, IIgs, and original 1984 Macintosh. I’m not “switching”, but rather adding; our client projects at Oasis Digital continue to run primarily on Windows or Linux. Our Java work runs with little extra effort on all three platforms.

Here are some thoughts from my first days on this machine and OSX:

  • The MacBook Pro case is very nice. I didn’t see any Windows-equipped hardware with anything similar. The high-tech metal construction is an expensive (and thus meaningful) signal that Apple sends: Apple equipment is high end. The case also has the great practical benefit of acting as a very large heat sink.
  • The MBP keyboard is a bit disappointing; I miss a real Delete key (in addition to Backspace), Home, End, PageUp, PageDown. At my desk I continue to use a Microsoft Natural Keyboard, so this is only a nuisance on the road.
  • I bought a Magic Mouse for the full Apple experience; but I’ll stick with a more normal mouse (and its clickable middle wheel-button) for most use. I find wireless mice too heavy, because of their batteries.
  • Apple’s offerings comprise a fairly complete solution for common end user computing needs; for example, Apple computers, running Time Machine for backup, storing on a Time Capsule. I didn’t go this route, but it is great to see it offered.
  • Printing is very easy to set up, particularly compared to other Unix variants.
  • VMware Fusion is fantastic, and amply sufficient to use this machine for my Windows work. Oddly, my old Windows software running inside seems slightly more responsive than the native Mac GUI outside (!).
  • I need something like UltraMon; the built-in multi-monitor support is trivial to get working, but the user experience is not as seamless as Windows+UltraMon. For example, where is my hotkey to move windows between screens, resizing automatically to account for their different sizes?
  • Windows has a notion of Cut and Paste of files in Explorer. It is conceptually a bit ugly (the files stay there when you Cut them, until Pasted), but extremely convenient. OSX Finder doesn’t do this, as discussed at length on many web pages.
  • I would like to configure the Apple Remote to launch iTunes instead of Front Row, but haven’t found a way to do so yet. No, Mr. Jobs, I do not wish to use my multi-thousand-dollar computer in a dedicated mode as an overgrown iPod. Ever.
  • The 85W MagSafe power adapter, while stylish and effective, is heavy. I’d much prefer a lighter aftermarket one, even if it were inferior in a dozen ways, but apparently Apple’s patent on the connector prevents this. I’d actually be happy to pay Apple an extra $50 for a lightweight power adapter, if they made such a thing.
  • This MBP is much larger, heavier, and more expensive than the tiny Toshiba notebook PC it replaces; yet it is not necessarily any better for web browsing, by far the most common end user computer activity in 2009. This is not a commentary on Apple, it merely points out why low-spec, small, cheap netbooks are so enormously popular.

Java Documentation in a Windows HTMLHelp (CHM) file

The Java software development kit documentation can be downloaded from java.sun.com, in the form of a ZIP file with tens of thousands of linked HTML files. This seems like an ideal canonical form for this documentation, but it is inconvenient to use, and inconvenient to have so many files sitting around.

If your development machine runs Windows, a Windows HTML Help file is more convenient; it consists of a handful of files (mostly one large CHM file) and offers good search and browse features. It can be unpacked or moved around far more quickly. Franck Allimant processes the HTML from Sun for each new Java release, and makes the CHMs etc. available on his web site. Recommended.

Allimant has been doing this for quite a few years now; I suspect it was unauthorized at first, but some kind of peace was made with Sun, which now links to the site.

Fresh Windows Install, Ouch

A few hours ago I started with a fresh Windows XP Home install for a computer for my family.

I’m still going.  I have lost count of how many times I have had to reboot the machine then get a new round of updates to install.  It is ludicrous.

Modern Linux distributions are far superior to this. I recently installed Ubuntu on several machines; the installer downloaded updates during the install process, so the machine was fully updated on the first boot.