Git Talk at the STL JUG

Yesterday (Oct. 9, 2008), I gave a talk on Git at the St. Louis Java User Group. Rather than a typical “intro” talk, instead I showed a dozen or so common usage scenarios, then answered questions with additional ad-hoc demos. As with some other recent talks, I eschewed PowerPoint in favor of a printed handout. The text of the handout follows below, or you can download the PDF. I also recorded the audio of the talk, but without video, so the general discussion portions are worthwhile but the demo portions are not.
Continue reading “Git Talk at the STL JUG”

Rearranging files in SVN? Use git-svn instead.

From time to time I need to rearrange a set of files in a project, typically while revamping the file / directory layout of a set of source files. The most direct way to do so is by dragging the files around (using a Windows or Linux GUI), but this is quite tedious to do if the files are in a Subversion repo, because SVN does not understand the file moves and renames unless you use “svn mv” or (with TortoiseSVN) right-click-drag the files to new locations.

The answer to all this is simple: use git-svn for your SVN checkout, instead of the SVN command lines tools or Tortoise. You can readily Google for extensive git-svn information to get started.

The workflow with git-svn for file rearrangement is fairly simple, and very fast:

  • “git svn rebase” to get your local git repo up to date with your SVN repo.
  • make new directories, rearrange files, rename files, etc., using any tool you like. This is easy and painless because git (wisely) keeps all the metadata in a top-level .git directory. It is fast because you aren’t running any svn code (or any git code) while doing so.
  • use “git add”, or the git GUI, to add the newly moved / renamed files.
  • “git commit”
  • “git svn dcommit” to push the changes over to SVN.

git / git-svn looks at the content of files, notices (rather than you needing to tell it) that the content has been moved to different filenames in different directories, and generates SVN move/copy operations. It even notices nearly-identical (edited) files and handles them correctly.

One loose end is left hanging, which is that git-svn does not remove any now-empty or removed directories from SVN, a consequence of git not caring about directories. Therefore, if you need the empty directories removed from the SVN repo, use a normal SVN client (command line, Tortoise) to do so.

From the description of above, this does not initially sound like a big win; but with hundred of files to move around, the time for the git commands (instant, except for the first and last) is irrelevant, and the time/hassle saved is enormous.

Perhaps someday SVN itself will gain the ability to “just work” when you move or rename files / directories, rather than having to be told.

Getting Started with Git and GitHub on Windows

(Update: I have a new, related post about the Best Git GUIs for Windows.)

I’ve been attracted to, and trying out, various distributed source control tools for the last two years, and have come to the conclusion that the most likely “winner” is Git. Git does a great many things right, good progress is being made in the few areas it is weak, and it has rapidly growing popularity. There are many web sites with extensive information about using Git, learning Git, Git integration, and more.

For new Oasis Digital projects, we will generally Git rather than SVN for source control. Here are instructions I wrote to help our teams get started. The contents here are 95% generic, but the references to me are, of course, Oasis-Digital-specific.

(For a general introduction to Git, consider this video at GitHub.)

GitHub

Although Git is a fully distributed source control system, it is very convenient to have a set of robust, central repositories. Oasis Digital’s repositories are hosted by GitHub:

http://github.com/

Github offers a useful set of online features to supplement what Git has built in and available locally. As of the spring of 2008, GitHub is certainly a work-in-progress, I’d characterize is as a “beta” level service. Nonetheless it is worthwhile and recommended. There is a lot to learn from the “guides” published here also:

http://github.com/guides

Install and Configure msysGit on Windows

I assume here that you are using Windows, although Git works very well (better, actually) on Linux or Mac. As I write this, the best Windows Git package is msysgit, available here:

http://code.google.com/p/msysgit/

Make sure to follow the download instructions labeled “If you only want to use Git”. As I write this the download is Git-1.5.5-preview20080413.exe, but get the current version available as you read this instead, not that specific version.

Install by running the EXE installer. Accept the default install directory. When you get to the PATH setting screen, I recommend the “Use Git Bash only” setting, because it avoid any risk of PATH conflicts.

By default, Git will be configured to translate text files between Windows CRLFs (in your working copy) and Unix LFs (in repositories). This setting is fine if you like to use an editor on Windows that insists on Windows CRLFs. I generally use an editor that is equally happy to use Unix LFs, so I sometimes use Git in the other (non-translating) mode.

msysgit includes both the git command line, and a usable GUI. The GUI is not on par with more mature products, but it is helpful and good enough for users who are allergic to the command line.

Create your SSH Key

The first step in using Git is to create your SSH Key. This will be used to secure communications between your machine and other machines, and to identify your source code changes. (If you already have an SSH key for other reasons, you can use it here, there is nothing Git-specific about this.)

In Windows Explorer, pick any convenient directory, right-click, and choose “Git Bash Here”.

Then type this command:

ssh-keygen -C "username@email.com" -t rsa

(with your own email address, of course)

Accept the default key file location. When prompted for a passphrase, make one up and enter it. If you feel confident that your own machine is secure, you can use a blank passphrase, for more convenience and less security. Note where it told you it stored the file. On the machine I tested with, it was stored in “C:\Documents and Settings\Kyle\.ssh\”

Open the file id_rsa.pub with a text editor. The text in there is your “public SSH key”. You will need it to set up your GitHub account, in the next section.

Beware of $HOME trouble: a reader reported a tricky failure mode in which some other software he installed had set up a HOME or HOME_PATH environment variable pointing in to that application instead of to your real home (“Documents and Settings”) directory.

More details on the key process are available here:

http://github.com/guides/providing-your-ssh-key

Set up your GitHub account

Go to http://github.com/ and sign up for a free account. In the sample here I signed up with a made-up alter-ego, harry@kylecordes.com. Use your own, real email address of course.

Don’t worry that GitHub describes the free account as for “open source” work; I will later add you as a collaborator to my paid (and therefore private, non-open-source) projects. Make sure to copy-paste in your SSH public key that you created earlier.

Once you have set up your github account, email your github username (not your password) to me, so I can add you to the relevant project(s). (Reminder – send to me only if you are working on an Oasis Digital project! If you are using this page as generic Git instructions, send the information to your project leader instead!)

Once I have added you as a collaborator to the relevant project and sent back its URL, you can navigate to the URL, which will look like this:

https://github.com/kylecordes/PROJECTNAME/tree/master

On that page, click the “fork” button to create your own workspace for the project. This will take you to your own page for the project, something like this:

https://github.com/YOURNAME/PROJECTNAME/tree/master

Now you can clone this project to your own machine, as discussed below.

Getting Started Locally

First, use the “Git Bash Here” feature described above, to get a command prompt. Tell Git about your name and email address:


git config --global user.email Your.Email@domain.com
git config --global user.name "Your Real Name"

Then you are ready to proceed with getting into a project. Copy the “Clone URL” from a github project page. Make a new directory on your machine, to become your working directory. There are two approaches to which project to clone.

  1. Clone from your own fork repo. This will make it trivial to push your changes up, but require one more command to get upstream changes.
  2. Clone from the upstream (my) repo. This will make it trivial to get change, but require one more command to be able to push changes, because you can’t push to another Github users’ repo.

I assume later in these instructions that you chose #1.

Next, get your local clone by clicking, or by typing.

Approach 1: GUI

In Windows Explorer, Right click the working directory and choose “Git GUI Here”.

Choose “Clone Existing Repository” in the dialog that comes up:

Paste the URL that you copied above from GitHub. Note that in the screenshot I show it as if you clone your repo, while I think it’s slightly easier overall to clone the upstream repo. Thus it really should show git@githib.com:kylecordes/sample1.git instead.
Note that your browser might add an erroneous mailto: to the URL, which you must remove – Git URLs do not start with “mailto”. Enter the directory where you want your working copy:

Make sure to use real, reasonable values. For example, you are not working “sample1” and you probably don’t keep your working projects in a directory named “GitStuff”, so put in a directory that makes sense for the project you are working on. Also, put your working copy in a place where you can effectively work in it; for example the working copy for a web project should usually be under the webroot of your local development web server.

Click Clone. You will be prompted for your passphrase if you used one. In a few minutes the Clone will finish, and you have the project available on your machine. I’ve had sporadic problems with this process hanging (growing pains at GitHub are the likely cause), so if you don’t see progress for a few minutes, stop it and start over.

Approach 2: Command Line

I find this easier. In Windows Explorer, right-click on the working directory you want and choose “Gui Bash Here”. Then enter a command like this:

git clone repoURL

Git might prompt you about an SSH key, the first time you do this with github (or any other new server). Answer “yes”.

It’s worth pointing out here, if you didn’t already understand from the various Git web sites, that Git is a distributed source control system. It will pull down the whole project history, so you can browse history and even commit changes without online access. Thus Git works very well if you have an intermittent or poor network connection.

Work Flow

As with all source control, work in the directory where you use source control. Do not copy files back and forth between here and some other working directory, that is a path to endless merge and update problems.

Once you have checked out the software, here is a summary of your work flow. For more details, please read the copies Git documentation online. I suggest reading both the official Git material, as well as other sites and articles about Git.

Getting Changes

Get changes from others with “git pull” (or using the GUI). By default this will pull from the repo from which you cloned, so if you cloned the upstream repo, that will get other peoples’ changes.

If you cloned from your own Github repo, you’ll need to use something like this:


git remote add upstream git@github.com:kylecordes/sample1.git
git pull kyle upstream

Sending Changes

Commit your changes locally with “git commit” (or using the GUI). Remember that Git generally wants you to explicitly say which files’ changes to include (“git add”), so make sure you read and understand enough about Git to do this properly; it is only a few commands or clicks in the GUI. The usual caveat applies, to only commit actual source files, not generated files or temp files.

Push your changes up to your GitHub repository with “git push”. This step will make it so others on your project can see your changes. Do this at least once per day, and ideally more often as you collaborate. Assuming that you cloned from the upstream repo, you’ll need to set up a reference to your own Github repo (the one you can push to), with something like this:


git remote add harry git@github.com:harrycordes/odtimetracker.git

As usual, use reasonable names and relevant URLs, not my sample names and URLs. Once you’ve added the remote reference, pushing is easy:


git push harry master

When you have a set of changes (one or more commits) that you think are ready to go in to the main-line of the project, use Github to issue a “pull request”. Your project lead (me, typically, at Oasis Digital) will review your commits and either pull them in to the main-line, or send feedback about changes needed before they can be pulled in.

A key thing to understand about Git is that it makes branching extremely easy and fast, so that very convenient to use branches. If you are accustomed to other source control systems where branching is a big, painful thing, it will be very different for you in Git. Once you learn to use branches, you’ll sometimes push up a branch you are working on instead of master.

I’ve only scratched the surface in this introduction. You now have Git up and running with project code in it, so pick up a Git tutorial or reference (such as the screencast videos at GitCasts) and start learning.

To Learn More

I heartily recommend the “Illustrated Guide to Git on Windows“. It doesn’t yet cover GitHub, but does cover many more details of using Git itself.

Also, a bit of Git can be very useful even in a project that uses SVN, especially when you need to rearrange a bunch of files in SVN.

Update: In a newer post, I list several other Git GUIs for Windows.

Git on Windows, it actually works now

I’ve been trying out various distributed source control tools, and used several of them for various very small projects. I’ve most mostly settled on git as the one I prefer, but I haven’t yet published any code with it. Also, I’ve been frustrated that git support for Windows has been very weak.

Msysgit appears to have solved the git-windows problem, at least well enough for small scale work. If you’ve been holding back on trying git because you use Windows, now is the time to jump in.

Update: I’ve posted details on how to get started with msysgit and GitHub as well as a comparison of Git software for Windows.

Distributed Version Control for the Other 80%

Ben Collins-Sussman, one of the key developers behind Subversion, argues in Version Control and the 80% that distributed version control will remain a niche interest, and will not move in to the mainstream (as his favorite tool certainly has). He has a number of good reasons to back up this thesis.

I think he’s wrong. The “other 80%” are not profoundly stupid imbeciles who could never grasp the point of DVCS. Rather they are, generally, working developers with important projects underway, for which they need tools to work well out of the box when used in the default way. DVCS tools can certainly do that. More specifically, the list of reasons he gives why DVCS won’t become broadly popular, should be read more as a to-do list of how to improve DVCS so they can become broadly popular. What the DVCS community needs is at least one DVCS which:

  • Installs easily on Windows, with a single installer, including diff/merge tool and GUI
  • Includes a very good standalone GUI
  • Secures client/server (peer-to-peer) communicate by default, without user setup of SSH, HTTPS, etc.
  • Integrates well with Eclipse
  • Integrates well with Visual Studio
  • Integrates well with Explorer (i.e. TortoiseBlah)
  • Integrates, begrudgingly, with Microsoft’s SCC API so as to support the many tools which can use an SCC API plugin
  • Includes permission controls for server repositories, including good tools for configuration thereof
  • Automates sharing of branches trivially (some already do this, some less so)Automates the common ways of using a DVCS, most importantly the usage model in which the DVCS is used as a better SVN with full offline capabilities
  • Guides users, if so configured, gently back toward a small number (one, in some cases) of main central branches, which is what most projects want
  • Communicates clearly what kind of project it can support well (most of them) and what kind it won’t support well (those with an enormous pile of huge files, of which most users only need a few)

(SVN itself is not without flaws. Ben lists some of them as areas in which improvements are coming, while others (such as, in my opinion, using the file namespace for branches and tags) are likely here to stay.)

In the next few years we will probably see one or more DVCS tools gain most or all of the features above. With addresses, an important truth will be more obvious: distributed source control is, in most ways, a superset of centralized source control, and the latter can be thought of as a special case of the former.

That said, though, I think the DVCS movement will lose a bit of steam when SVN ships better merge support, if that merge support is sufficiently good. The merge “features” are certainly the biggest issue we have here with SVN.

A Brief Introduction to Distributed Version Control

Last night at SLUUG, I have a talk on distributed source control tools. It was quite introductory, but the notes (below) may still be helpful. These notes were on a handout at the talk, as usual I didn’t use slides.

Unfortunately I didn’t get an audio recording of this talk, so no transcript either.

About 30 people were in attendance. Nearly 100% were familiar with CVS and SVN, and perhaps 20% with other tools (ClearCase, SourceSafe, and others). Only 4 had ever used branch/merge in any project or tool!

A Brief Introduction to Distributed Version Control

You Branch and Merge.

A fundamental truth: every time you edit a file you are branching, and every time you reconcile with another developer you are merging. In most tools you get one easy branch and merge locus: your local working directory. All other branching is a Big Deal.

It does not have to be this way.

A Short Tour of 3 tools

bzr: http://bazaar-vcs.org/

hg: http://www.selenic.com/mercurial/

git: http://git.or.cz/

Also, check your distro’s package system.

(At this point in the talk I demonstrated the basic features of each)

What is a DVCS? Why Use a DVCS?

Peer-to-peer design

Everyone gets all the features, rather than the interesting features limited to a high priest class.

Make use of the massive CPU and disk capacity on dev machines.

No central server needed, though many projects nominate a machine for this purpose.

Use a “dumb” storage location for a repository, if desired. Or a “smart server” for performance and security.

Work offline, with full history, branches, merges.

No central administrator is needed, potentially a cost savings.

Very cheap branching, in some cases immediate, even for large projects.

Very good merging, because you merge all the time.

Commit-then-merge, not merge-then-commit.

Repeated merge without havoc.

Merge keeps both sides of history, which is important because you merge a lot. This varies by tools, for example apparently bzr keeps this history less effectively than some others.

Depending on what you are coming from and which tool you choose, the speed gain can be so remarkable that it help every developer every day.

Some SVN Nitpicks

It is easy and tempting to pick on SVN. Linus does so vigorously in his online talk. I don’t hate it as much as he does but, these things bother me:

SVN is slow for large projects.

Branching in SVN sounds clever as you read its cheap copy story. It’s a great story. But actually using it is ridiculous; both in the URL-crazy syntax and utter lack of merge history.

We don’t need a better CVS, we need something much better than CVS.

.svn directories scattered all over are a pointless pain.

.svn directories are enormous, sometimes larger than the entire project history in git or hg!

Security

Security is a weak area in terms of out-of-the-box features and tutorials, because these tools come from the open source world where the default is for everyone to be able to see all code. However, with a little effort you can set up whatever security you like:

If you’re serving over HTTP, you can use Apache mod_whatever to control access.

Tunnel over SSH (in the box, in most cases) to avoid ever sending code in the clear.

Even a “dumb” storage location can be secured with SFTP.

Scripts can be used for per-branch and other fine grained access control, akin to what you can do with svn-access.conf. There are examples online.

I Can’t Use a DVCS Because:

SOX/HIPPA/CMM/etc. requires a centralized tool.

SOX/HIPPA/CMM/etc, is the standard reason why anyone can’t do anything. Some of these tools facilitate much stronger guarantees about the provenance of the source code than you get from a centralized tool, because they have credible and straightforward ways to verify that centralized store has not been compromized.

We are all in one place, therefore a DVCS makes no sense.

Actually many of the features in these tools are as useful in the same building as in a worldwide team.

Tool ZZZ is our corporate standard.

Then you should use it, don’t get fired. However, many people are using a DVCS in front of their corporate standard tool.

DVCS vs DVCS:

Many DVCS tools treat each repo/workspace as a branch and vice versa, so if you use many branches you will have many workspaces.

bzr can use shared files to reduce the bloat from this.

git handles many branches much better, with arbitrarily many per repo/workspace.

My feeling is that hg and git are more mature than bzr.

git does not work well on windows yet.

Other DVCS to Consider

Monotone – has some slick features, but does not appear to scale well to large projects. It stores information in a SQLite DB. Monotone, unlike any others I’ve seen, replicates all branches by default, which is nice.

An aside — there are fascinating thoughts on how to a data synchronization system can work, in the Monotone docs / presentations.

arch / tla — one of the early DVCSs that started all this. I have heard it is mystifying to use.

darcs — everyone loves its “cherry picking”, but I have not triedit.

SVK — optimized for being a better svn client, with offline history and merging that works. Stores merge history in SVN attribute.

More git Merging

Depending on time, I will show more of the branch / merge facilities in git, as well as its gitk GUI.

Other Resources

http://wiki.sourcemage.org/Git_Guide

http://www.adeal.eu/

Update: the following appeared a few days after my talk, it is very good, aside from slightly bogus listings in the “disadvantages” section:
http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/