Reposurgeon, for high fidelity source control system migration

The best time to blog about something is when it happens. The second best time is when you remember years later that you should have blogged about it. That’s now.

I’ve worked on complex source control system migrations, moving between various systems, most commonly SVN to Git. There are hundreds or thousands of tools and scripts around the Internet suggested for every plausible migration pair; almost all of which don’t even attempt to solve the whole problem.

The closest I seem to solving the whole problem is reposurgeon:

reposurgeon’s web page

reposurgeon source code on Gitlab

The strength of this tool is that it is intended to be scripted. Rather than doing a single-shot conversion, the workflow is:

  1. Attempt the conversion
  2. Study the results
  3. Tweak the conversion script (which can perform extensive and complex changes to the source code history on its way through)
  4. Repeat until approximately perfect

Teams using an old system continue doing so while the migration is worked on. Only once the migration has been perfected, is it time to cut over.

By scripting, I don’t mean that you, the user, must write scripts to do the basics of source control system history migration; that is the job of the tool. Rather, script to patch up the ugly bits of history in an old system during the translation. For example, in a moment of desperation, did somebody once merge a giant change to the mainline, something like rolling back the last three months of development, to try to get a deployable old build? That’s an easy bit of the old history to leave out during a reposurgeon-powered migration.

During migration you can translate usernames, branch names, details buried inside commit messages, and any other aspect you might wish to clean up programmatically. Think of it as a multi-source code system compatible analog of git-rewrite-branch, except for an entire repository, not a branch.

One major downside, as of the last time I used reposurgeon: it operates in memory, so you’ll need enough RAM for the whole source code history. This can typically be accommodated, even on quite large code bases, by temporarily allocating an extremely large compute instance on your cloud provider.

A Better Bash+Git Prompt

I enjoy a souped-up Bash prompt which radiates information about (for example) the branch I am on, in addition to the usual information (current directory). There are countless examples online, but the nicest I’ve found so far is this one from Martin Gondermann. I have it set to show just a little information:

bash-prompt-with-git-info

Martin shows a more sophisticated example, along with an explanation of the Git-related decorations, as shown:

bash-prompt-explanation

Whether you prefer this one or any of the other similar tools, it’s worth the few minutes to upgrade your prompt from “$”.

What is the Best Git GUI (Client) for Windows?

I adopted Git as my primary source control tool a couple of years ago, when I was using Windows as my primary (90%) desktop OS. Since then I’ve switched to 75% Mac OSX, but I still use Git on Windows for a few projects, and I get a lot of questions about Git on Windows.

I use msysgit (and its included GUI) most often myself, but I don’t have a clear answer as to which is the “best” Git GUI for Windows. I can offer this list of choices, though, along with some thoughts about them.

There is also a very long list of Git tools on the main Git wiki; but that page is just a list, without any other information.

msysgit

msysgit is the main project which ships a Windows port of Git. It is based on MSYS, so it fits in the Windows ecosystem a bit better than the cygwin Git port.

msysgit includes the same Tk-based GUI tools as Git on Linux: a commit tool and a repo-browse tool, plus a bit of shell integration to active the GUI by right-clicking in Windows Explorer, plus a new thing call git-cheetah, which appears to be heading toward Tortoise-style integration. These tools are a bit ugly, but have good and useful functionality. I don’t mind the ugly (I get my fix of stylish software over on my Mac…), and I find the features ample for most work.

If you don’t know where to start, or if you want a Linux-like Git experience, start with msysgit and learn to use its tools.

msysgit is free open source software. It is under active development, and keeps up with the upstream Git versions reasonably well. There is even a portable (zero-install) version available.

My biggest gripe with msysgit (and its GUI) is that I had to figure out how to use it effectively myself. I could have really used a video walkthrough of how to be productive with it, back when I was starting out. That was a long time ago for me, but might be Right Now for people reading this post. Mike Rowe (a reader) helpfully suggested this msysgit tour, which is very helpful though a bit dated.

TortoiseGit

This is an attempt to port TortoiseSVN to git, yielding TortoiseGit. If you like and use TortoiseSVN, you’ll probably find this worth a try. I haven’t tried it yet myself.

TortoiseGit is free open source software, and is under active development.

Git Extensions

This Git GUI has a shell extension (like the Tortoise family) and also a plugin for Visual Studio. From the screen shots, it appears to be feature-rich and complete.

Git Extensions is free open source software, and is under active development.

SmartGit

Unlike the other tools listed here, SmartGit is a commercial product (from a German company), starting at around $70. It appears to be more polished than the others, as is often the case with commercial products. It also appears to be quite feature-rich.

I don’t know how SmartGit fits in with the Git licensing; Git is licensed GPL (v2), so I assume (hope?) SmartGit has found some way to use it under the hood without linking to it in a way that would cause license trouble.

SmartGit requires a Java runtime, implying that it is written in Java. Five year ago I thought of that as a caveat; but today, Java-based GUIs can be extremely attractive and fast, so I don’t see as a problem at all.

Is IDE Integration Vital?

I know people who swear by their IDE experience, and are aghast at the thought of any daily-use dev tool that is not integrated with their IDE. It is almost as though for this group, multitasking does not exist, and any need to run more than one piece of software at the same time is a defect.

Now I love a good IDE as much as anyone (I’ve urged and coached many developers to move from an editor to an IDE), but I don’t agree with the notion that source control must always be in the IDE. IDE-integrated source control can be very useful, but there are sometimes cases where non-integrated source control wins.

The most common example for me is when using Eclipse on a large, complex system. There are two annoyances I see regularly:

  • Eclipse assumes that one Eclipse project is one source control project, an assumption that is sometimes helpful and sometimes painful. In the latter case, simply ditch the Eclipse integration, and use a whole workspace (N projects) as a single source-control project, outside of Eclipse.
  • Sometimes Eclipse source control integration bogs down performance. Turn it off, and things speed up.

Therefore, when I use Eclipse, I sometimes manage the files from outside, using msysgit, command line, etc. When I have a complex “real-life project” comprised of many Eclipse “projects”, I set up a separate Eclipse workspace for it, apart from other unrelated Eclipse projects.

Feedback Wanted

I’d love to hear about:

  • More Windows Git GUIs to list here
  • Anything else I’ve missed

.. via the contact page (link at the top of the page). I try to reply to all email within a few days.

Rearranging files in SVN? Use git-svn instead.

From time to time I need to rearrange a set of files in a project, typically while revamping the file / directory layout of a set of source files. The most direct way to do so is by dragging the files around (using a Windows or Linux GUI), but this is quite tedious to do if the files are in a Subversion repo, because SVN does not understand the file moves and renames unless you use “svn mv” or (with TortoiseSVN) right-click-drag the files to new locations.

The answer to all this is simple: use git-svn for your SVN checkout, instead of the SVN command lines tools or Tortoise. You can readily Google for extensive git-svn information to get started.

The workflow with git-svn for file rearrangement is fairly simple, and very fast:

  • “git svn rebase” to get your local git repo up to date with your SVN repo.
  • make new directories, rearrange files, rename files, etc., using any tool you like. This is easy and painless because git (wisely) keeps all the metadata in a top-level .git directory. It is fast because you aren’t running any svn code (or any git code) while doing so.
  • use “git add”, or the git GUI, to add the newly moved / renamed files.
  • “git commit”
  • “git svn dcommit” to push the changes over to SVN.

git / git-svn looks at the content of files, notices (rather than you needing to tell it) that the content has been moved to different filenames in different directories, and generates SVN move/copy operations. It even notices nearly-identical (edited) files and handles them correctly.

One loose end is left hanging, which is that git-svn does not remove any now-empty or removed directories from SVN, a consequence of git not caring about directories. Therefore, if you need the empty directories removed from the SVN repo, use a normal SVN client (command line, Tortoise) to do so.

From the description of above, this does not initially sound like a big win; but with hundred of files to move around, the time for the git commands (instant, except for the first and last) is irrelevant, and the time/hassle saved is enormous.

Perhaps someday SVN itself will gain the ability to “just work” when you move or rename files / directories, rather than having to be told.

Getting Started with Git and GitHub on Windows

(Update: I have a new, related post about the Best Git GUIs for Windows.)

I’ve been attracted to, and trying out, various distributed source control tools for the last two years, and have come to the conclusion that the most likely “winner” is Git. Git does a great many things right, good progress is being made in the few areas it is weak, and it has rapidly growing popularity. There are many web sites with extensive information about using Git, learning Git, Git integration, and more.

For new Oasis Digital projects, we will generally Git rather than SVN for source control. Here are instructions I wrote to help our teams get started. The contents here are 95% generic, but the references to me are, of course, Oasis-Digital-specific.

(For a general introduction to Git, consider this video at GitHub.)

GitHub

Although Git is a fully distributed source control system, it is very convenient to have a set of robust, central repositories. Oasis Digital’s repositories are hosted by GitHub:

http://github.com/

Github offers a useful set of online features to supplement what Git has built in and available locally. As of the spring of 2008, GitHub is certainly a work-in-progress, I’d characterize is as a “beta” level service. Nonetheless it is worthwhile and recommended. There is a lot to learn from the “guides” published here also:

http://github.com/guides

Install and Configure msysGit on Windows

I assume here that you are using Windows, although Git works very well (better, actually) on Linux or Mac. As I write this, the best Windows Git package is msysgit, available here:

http://code.google.com/p/msysgit/

Make sure to follow the download instructions labeled “If you only want to use Git”. As I write this the download is Git-1.5.5-preview20080413.exe, but get the current version available as you read this instead, not that specific version.

Install by running the EXE installer. Accept the default install directory. When you get to the PATH setting screen, I recommend the “Use Git Bash only” setting, because it avoid any risk of PATH conflicts.

By default, Git will be configured to translate text files between Windows CRLFs (in your working copy) and Unix LFs (in repositories). This setting is fine if you like to use an editor on Windows that insists on Windows CRLFs. I generally use an editor that is equally happy to use Unix LFs, so I sometimes use Git in the other (non-translating) mode.

msysgit includes both the git command line, and a usable GUI. The GUI is not on par with more mature products, but it is helpful and good enough for users who are allergic to the command line.

Create your SSH Key

The first step in using Git is to create your SSH Key. This will be used to secure communications between your machine and other machines, and to identify your source code changes. (If you already have an SSH key for other reasons, you can use it here, there is nothing Git-specific about this.)

In Windows Explorer, pick any convenient directory, right-click, and choose “Git Bash Here”.

Then type this command:

ssh-keygen -C "username@email.com" -t rsa

(with your own email address, of course)

Accept the default key file location. When prompted for a passphrase, make one up and enter it. If you feel confident that your own machine is secure, you can use a blank passphrase, for more convenience and less security. Note where it told you it stored the file. On the machine I tested with, it was stored in “C:\Documents and Settings\Kyle\.ssh\”

Open the file id_rsa.pub with a text editor. The text in there is your “public SSH key”. You will need it to set up your GitHub account, in the next section.

Beware of $HOME trouble: a reader reported a tricky failure mode in which some other software he installed had set up a HOME or HOME_PATH environment variable pointing in to that application instead of to your real home (“Documents and Settings”) directory.

More details on the key process are available here:

http://github.com/guides/providing-your-ssh-key

Set up your GitHub account

Go to http://github.com/ and sign up for a free account. In the sample here I signed up with a made-up alter-ego, harry@kylecordes.com. Use your own, real email address of course.

Don’t worry that GitHub describes the free account as for “open source” work; I will later add you as a collaborator to my paid (and therefore private, non-open-source) projects. Make sure to copy-paste in your SSH public key that you created earlier.

Once you have set up your github account, email your github username (not your password) to me, so I can add you to the relevant project(s). (Reminder – send to me only if you are working on an Oasis Digital project! If you are using this page as generic Git instructions, send the information to your project leader instead!)

Once I have added you as a collaborator to the relevant project and sent back its URL, you can navigate to the URL, which will look like this:

https://github.com/kylecordes/PROJECTNAME/tree/master

On that page, click the “fork” button to create your own workspace for the project. This will take you to your own page for the project, something like this:

https://github.com/YOURNAME/PROJECTNAME/tree/master

Now you can clone this project to your own machine, as discussed below.

Getting Started Locally

First, use the “Git Bash Here” feature described above, to get a command prompt. Tell Git about your name and email address:


git config --global user.email Your.Email@domain.com
git config --global user.name "Your Real Name"

Then you are ready to proceed with getting into a project. Copy the “Clone URL” from a github project page. Make a new directory on your machine, to become your working directory. There are two approaches to which project to clone.

  1. Clone from your own fork repo. This will make it trivial to push your changes up, but require one more command to get upstream changes.
  2. Clone from the upstream (my) repo. This will make it trivial to get change, but require one more command to be able to push changes, because you can’t push to another Github users’ repo.

I assume later in these instructions that you chose #1.

Next, get your local clone by clicking, or by typing.

Approach 1: GUI

In Windows Explorer, Right click the working directory and choose “Git GUI Here”.

Choose “Clone Existing Repository” in the dialog that comes up:

Paste the URL that you copied above from GitHub. Note that in the screenshot I show it as if you clone your repo, while I think it’s slightly easier overall to clone the upstream repo. Thus it really should show git@githib.com:kylecordes/sample1.git instead.
Note that your browser might add an erroneous mailto: to the URL, which you must remove – Git URLs do not start with “mailto”. Enter the directory where you want your working copy:

Make sure to use real, reasonable values. For example, you are not working “sample1” and you probably don’t keep your working projects in a directory named “GitStuff”, so put in a directory that makes sense for the project you are working on. Also, put your working copy in a place where you can effectively work in it; for example the working copy for a web project should usually be under the webroot of your local development web server.

Click Clone. You will be prompted for your passphrase if you used one. In a few minutes the Clone will finish, and you have the project available on your machine. I’ve had sporadic problems with this process hanging (growing pains at GitHub are the likely cause), so if you don’t see progress for a few minutes, stop it and start over.

Approach 2: Command Line

I find this easier. In Windows Explorer, right-click on the working directory you want and choose “Gui Bash Here”. Then enter a command like this:

git clone repoURL

Git might prompt you about an SSH key, the first time you do this with github (or any other new server). Answer “yes”.

It’s worth pointing out here, if you didn’t already understand from the various Git web sites, that Git is a distributed source control system. It will pull down the whole project history, so you can browse history and even commit changes without online access. Thus Git works very well if you have an intermittent or poor network connection.

Work Flow

As with all source control, work in the directory where you use source control. Do not copy files back and forth between here and some other working directory, that is a path to endless merge and update problems.

Once you have checked out the software, here is a summary of your work flow. For more details, please read the copies Git documentation online. I suggest reading both the official Git material, as well as other sites and articles about Git.

Getting Changes

Get changes from others with “git pull” (or using the GUI). By default this will pull from the repo from which you cloned, so if you cloned the upstream repo, that will get other peoples’ changes.

If you cloned from your own Github repo, you’ll need to use something like this:


git remote add upstream git@github.com:kylecordes/sample1.git
git pull kyle upstream

Sending Changes

Commit your changes locally with “git commit” (or using the GUI). Remember that Git generally wants you to explicitly say which files’ changes to include (“git add”), so make sure you read and understand enough about Git to do this properly; it is only a few commands or clicks in the GUI. The usual caveat applies, to only commit actual source files, not generated files or temp files.

Push your changes up to your GitHub repository with “git push”. This step will make it so others on your project can see your changes. Do this at least once per day, and ideally more often as you collaborate. Assuming that you cloned from the upstream repo, you’ll need to set up a reference to your own Github repo (the one you can push to), with something like this:


git remote add harry git@github.com:harrycordes/odtimetracker.git

As usual, use reasonable names and relevant URLs, not my sample names and URLs. Once you’ve added the remote reference, pushing is easy:


git push harry master

When you have a set of changes (one or more commits) that you think are ready to go in to the main-line of the project, use Github to issue a “pull request”. Your project lead (me, typically, at Oasis Digital) will review your commits and either pull them in to the main-line, or send feedback about changes needed before they can be pulled in.

A key thing to understand about Git is that it makes branching extremely easy and fast, so that very convenient to use branches. If you are accustomed to other source control systems where branching is a big, painful thing, it will be very different for you in Git. Once you learn to use branches, you’ll sometimes push up a branch you are working on instead of master.

I’ve only scratched the surface in this introduction. You now have Git up and running with project code in it, so pick up a Git tutorial or reference (such as the screencast videos at GitCasts) and start learning.

To Learn More

I heartily recommend the “Illustrated Guide to Git on Windows“. It doesn’t yet cover GitHub, but does cover many more details of using Git itself.

Also, a bit of Git can be very useful even in a project that uses SVN, especially when you need to rearrange a bunch of files in SVN.

Update: In a newer post, I list several other Git GUIs for Windows.

Git on Windows, it actually works now

I’ve been trying out various distributed source control tools, and used several of them for various very small projects. I’ve most mostly settled on git as the one I prefer, but I haven’t yet published any code with it. Also, I’ve been frustrated that git support for Windows has been very weak.

Msysgit appears to have solved the git-windows problem, at least well enough for small scale work. If you’ve been holding back on trying git because you use Windows, now is the time to jump in.

Update: I’ve posted details on how to get started with msysgit and GitHub as well as a comparison of Git software for Windows.