Comparing OPML Files, or How to Leave NetNewsWire

Recently I reached a level of excessive frustration with NetNewsWire (Mac) and decided it was time to move on. Problems with NetNewsWire include:

  1. NetNewsWire has no way to sync its subscription list to match your Google Reader subscription list. There is a Merge button in the Preferences that sounds like it should do this, but it does not work correctly. Once your lists get out of sync, they generally stay that way.
  2. NetNewsWire won’t prefetch images referenced in feeds. Without this, it is not useful for the most obvious purpose of a desktop reader: reading without a network connection. That’s a reasonable thing to leave out in early development, but in a mature product? What could they have been thinking?
  3. NetNewsWire fails (silently) to subscribe to Google Alerts feeds, apparently because Google Reader already knows about those feeds… but see #1.
  4. As many other users have reported, NetNewsWire frequently shows a different number of unread items from Google Reader, and no amount of refreshing makes it match. The sync doesn’t quite work.

But to get rid of NetNewsWire, I needed to verify that I had all my feeds in Google Reader. This was easy:

  1. Export OPML feed list from NNW
  2. Export OPML feed list from Reader
  3. Use a bit of perl regex and diff (below) to extract and compare just the list of feed URLs
  4. Look over the diff, and copy-paste-subscribe the missing ones in Reader

The commands are:

perl -ne '/xmlUrl="([^"]*)"/ && print "$1\n"' <google-reader-subscriptions.xml  | sort >gr.urls
perl -ne '/xmlUrl="([^"]*)"/ && print "$1\n"' <nn.opml  | sort >nn.urls
diff gr.urls nn.urls

… which took much less time and far fewer keypresses than writing this post.
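As an aside (not part of the original workflow): since both URL lists are already sorted, `comm` can show just the one-sided differences, which is a bit easier to scan than raw diff output. This assumes the `gr.urls` and `nn.urls` files produced above.

```shell
# lines only in nn.urls: feeds in NetNewsWire but missing from Reader
comm -13 gr.urls nn.urls
# lines only in gr.urls: feeds in Reader but not in NetNewsWire
comm -23 gr.urls nn.urls
```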

Offline reading is still very useful; at the moment I’m trying a combination of Google Reader, Gruml, and Reeder (iPad). Those work very well – so well that the risk of time-wasting feeds must be managed aggressively: drop all but the most important, and don’t look every day.

Fix timestamps after a mass file transfer

I recently transferred a few thousand files, totalling gigabytes, from one computer to another over a slowish internet connection. At the end of the transfer, I realized the process I used had lost all the original file timestamps; all the files on the destination machine had a create/modify date of when the transfer occurred. In this particular case I had uploaded the files to Amazon S3 from one machine and then downloaded them from another, but there are numerous other ways to transfer files that lose the timestamps; for example, many FTP clients do so by default.

This file transfer took many hours, so I wasn’t inclined to delete everything and try again with a better (timestamp-preserving) transfer process. Besides, it shouldn’t be very hard to fix the timestamps in place.

Both machines were Windows servers; neither had a broad set of Unix tools installed. If those had been present, the most obvious solution would have been a simple rsync command, which fixes timestamps without retransferring the data. But without those tools, with an unrelated desire to keep these machines as “clean” as possible, and with a firewall blocking SSH, I looked elsewhere for a fix.

I did, however, happen to have a partial set of Unix tools (in the form of the MSYS tools that come with MSYSGIT) on the source machine. After a few minutes of puzzling, I came up with this approach:

  1. Run a command on the source machine
  2. … which looks up the timestamp of each file
  3. … and stores those in the form of batch file
  4. Then copy this batch file to the destination machine and run it.

Here is the source machine command, executed at the top of the file tree to be fixed:

find . -print0 | xargs -0 stat -t "%d-%m-%Y %T" \
 -f 'nircmd.exe setfilefoldertime "%N" "%Sc" "%Sm"' \
 | tr '/' '\\' >~/fix_dates.bat

I’ve broken it up into several lines here, but it’s intended as one long command.

  • “find” gets the names of every file and directory in the file tree
  • xargs feeds these to the stat command
  • stat gets the create and modify dates of each file/directory, and formats the results in a very configurable way
  • tr converts the Unix-style “/” paths to Windows-style “\” paths.
  • The results are redirected to (stored in) a batch file.
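The tr step can be sanity-checked on its own with a sample path:

```shell
# each forward slash becomes a backslash
echo './some/dir/file.txt' | tr '/' '\\'
# → .\some\dir\file.txt
```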

As far as I can tell, the traditional set of built-in Windows command-line tools does not include a way to set a file or directory’s timestamps. I haven’t spent much time with PowerShell yet, so I used the (very helpful) NIRCMD command-line utilities, specifically the setfilefoldertime subcommand. The batch file generated by the above process is simply a very long list of lines like this:

nircmd.exe setfilefoldertime "path\filename" "19-01-2000 04:50:26" "19-01-2000 04:50:26"

I copied this batch file to the destination machine and executed it; it corrected the timestamps, and the problem was solved.