I assisted one of my clients in migrating a Subversion (svn) repository to git (and gitlab). Over the next few posts I'm going to cover the experience. If you've never used git or a distributed VCS, you might want to read these posts. I'll cover why we moved to git, how to get git up and running quickly (even if it's just for your own professional development), and some major headaches you'll likely have with git if you are coming from svn or even TFS.
What is git?
I'm not going to cover the basics of git. I assume you know the basics already. If you don't you should read Pro Git, which is free.
Why learn git?
Let's say you have svn where you work and have no plans to migrate. You think it works great for your needs. The fact is, the rest of the world seems to be migrating towards git and other distributed VCSs. If you ever want to work on a CodePlex project in the future you really need to learn git. The same holds true for just about every open source project...git is the de facto standard. Even if you don't care about open source, learning git will help you be a more productive developer. In the same way, a cursory knowledge of Oracle helps any SQL Server guy understand Microsoft's offerings, and their shortcomings.
Why migrate to git?
My client had a huge Subversion (svn) repository and it just wasn't working anymore. I had used git before for things like codeplex and drupal, but had never actively done team development with it. It seemed "ok" to me, but Subversion always suited my needs and I felt no compelling reason to learn git past a cursory examination. Still, everyone raves about it. I am certainly not an expert, but if others are loving it and having success with it, it would probably be better for my client's needs as well. I proposed migrating one of the smaller repositories to git as a proof-of-concept.
This blog post and the next will list the problems and annoyances with svn that we were hoping to solve with git.
Truly Distributed Version Control
My client was off-shoring more work and it was a nightmare maintaining an svn repo in both India and the US. Git is truly distributed: every clone is a complete repository. Now, at some point you'll probably need to have a true "master" version of the truth. This is very easy to do. We instructed our India team to do a "git push" to the US repo regularly. The US repo would be the sole version of the truth and the source of all releases, builds, and release branches.
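A minimal sketch of that "push to the central repo" arrangement, run entirely in a throwaway temp directory. The names here (us-central.git, india-dev, calc.txt, the developer identity) are hypothetical stand-ins, not the client's real setup; in practice the central repo would live on a server such as gitlab rather than on the local disk.

```shell
set -e
work=$(mktemp -d)                         # throwaway sandbox

# "us-central.git" stands in for the US repo (hypothetical name).
# A bare repo stores history only, with no working files.
git init -q --bare "$work/us-central.git"

# The offshore developer's clone (cloning an empty repo prints a warning).
git clone -q "$work/us-central.git" "$work/india-dev" 2>/dev/null
cd "$work/india-dev"
git config user.name "India Dev"          # hypothetical identity
git config user.email "dev@example.com"

echo "add(a, b)" > calc.txt
git add calc.txt
git commit -q -m "Add calc stub"          # the commit is purely local so far

# Publish the local commits to the central repo.
branch=$(git rev-parse --abbrev-ref HEAD) # master or main, depending on git config
git push -q origin "$branch"

# The central repo now points at the same commit as the clone.
local_head=$(git rev-parse HEAD)
central_head=$(git --git-dir="$work/us-central.git" rev-parse "$branch")
echo "local:   $local_head"
echo "central: $central_head"
```

Until the push, the commit exists only in the developer's clone, which is why a regular push to the agreed-upon repo matters.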
Fast (especially branching)
Everybody says that branching is "cheap" in central VCSs like svn or TFS. Yes, the "branch" process is "cheap" (files are not actually copied; only a pointer to the original plus any "patch deltas" is persisted). But the real problem is the time it takes to switch a workspace from one branch to another. This involves copying all of the project's files into a second directory. Internally to svn the process is fast, but in reality, for the developer, this was averaging about 10 minutes for our largest project, plus a lot of local disk space. TFS works the same way. This means that developers resist branching when they should be branching. In git I have yet to see the branch process take longer than a second, even on really large repos.
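To make the difference concrete, here is a small sketch of creating and switching branches in git, using a throwaway repo. The file name, branch name, and identity are made up for illustration. Note that the switch happens inside the same working directory; there is no second copy of the project on disk.

```shell
set -e
repo=$(mktemp -d)                          # throwaway sandbox
cd "$repo"
git init -q
git config user.name "Example Dev"         # hypothetical identity
git config user.email "dev@example.com"

echo "v1" > calc.txt
git add calc.txt
git commit -q -m "Initial commit"
start=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git config

# Create a branch and switch to it in one step, in the same directory.
git checkout -q -b my-calc-branch
echo "v2" > calc.txt
git commit -q -a -m "Change on the branch"

# Switching back only rewrites the files that differ between the branches.
git checkout -q "$start"
on_start=$(cat calc.txt)
git checkout -q my-calc-branch
on_branch=$(cat calc.txt)

echo "$start has: $on_start"
echo "my-calc-branch has: $on_branch"
```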
Work Totally Disconnected
How many times have you done an "svn up" and then tried to do some development at the park or on a plane? It works fine until you need to compare your current version with some version from a couple of releases ago. You can't do that offline in svn, because the history lives on the server. You can in git. When you do a "git pull" (or fetch or clone) you get the whole history of everything locally. If you do off-shoring and have spotty WAN links, this is a great way to keep your team productive.
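As a rough illustration, the commands below compare the current code against an older release without any network access. The sketch builds its own tiny repo; the tag name v1.0 and the file calc.txt are assumptions made for the example.

```shell
set -e
repo=$(mktemp -d)                          # throwaway sandbox
cd "$repo"
git init -q
git config user.name "Example Dev"         # hypothetical identity
git config user.email "dev@example.com"

echo "release 1" > calc.txt
git add calc.txt
git commit -q -m "Release 1"
git tag v1.0                               # hypothetical release tag

echo "release 2" > calc.txt
git commit -q -a -m "Release 2"

# Everything below reads the local .git directory only; no server is involved.
git log --oneline                          # full history
old=$(git show v1.0:calc.txt)              # the file as it was at v1.0
changed=$(git diff --name-only v1.0 HEAD)  # files changed since v1.0

echo "calc.txt at v1.0: $old"
echo "changed since v1.0: $changed"
```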
Branches are logically branches, not subfolders
If you have a project or build process that recursively processes folders, svn branching gets in the way because the branch is really another folder on your filesystem. If you branch at the wrong place your recursive processing will break. For instance, there are two generally accepted branching strategies with Subversion..."branches under the project" or "projects under the branches". The graphic at right denotes the former, which is the far more prevalent branching strategy in my experience. This is depicting the "calc" project which contains the "trunk" code (and each organization uses "trunk" a little differently) as well as a branches folder with subfolders for each and every branch. If you were to do a "svn up" from the root "calc" folder, and you were branching properly, you'd have a LOT of files transferred locally, and you probably don't care about those older branches.
To the noob developer, merging from "my-calc-branch" to "trunk" looks like merging two different files. That's logically wrong. It's all the same file; we are simply merging branched versions of it. Subversion makes this conceptually muddy, which is why people aren't branching as much as they should be.
So, how does git do it? See the graphic at right from gitlab. Each folder is truly a subfolder. There are no folders that are really branches like in subversion.
Branching in git is handled "on a third plane", or a Z axis. That third axis is equivalent to the branch. When you look at your project's files in something like gitlab you'll even note that the branches are represented by a separate drop down that changes the file lists. Logically this makes so much more sense to me. A branch really isn't a folder and it shouldn't be represented as one.
When you do a "git fetch" (or pull or clone) you actually get every branch that is available and you can browse them using gitgui (or whatever tool you want to use) just like you would with gitlab.
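A quick sketch of what that looks like from the command line, assuming a throwaway "server" repo with a couple of made-up branches (my-calc-branch and release-1.0 are illustrative names). After the clone, every branch is available locally as a remote-tracking branch.

```shell
set -e
work=$(mktemp -d)                          # throwaway sandbox

# A stand-in for the shared repo, with two extra branches.
git init -q "$work/server"
cd "$work/server"
git config user.name "Example Dev"         # hypothetical identity
git config user.email "dev@example.com"
echo "v1" > calc.txt
git add calc.txt
git commit -q -m "Initial commit"
git branch my-calc-branch
git branch release-1.0

# Cloning brings down every branch, not just the default one.
git clone -q "$work/server" "$work/dev"
cd "$work/dev"
remote_branches=$(git branch -r)
echo "$remote_branches"

# Checking one out creates a local branch that tracks origin/my-calc-branch.
git checkout -q my-calc-branch
current=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $current"
```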
Real Shelving (Stashing)
Ever have this happen? You are diligently working on some task with a bunch of source code files checked out and in a state of disarray when someone approaches you and needs something critical fixed NOW. Yes, I know that a good kanban-er will simply reply, "add it to the backlog...I'm at my WIP limit", but the fact is life is never this easy. Basically the requirement is to "save" your WIP somewhere else, revert it, and go work on this new issue.
In TFS you have Shelf Sets (shelving). Awesome feature. The shelf process saves off your changes to the server, reverts your code, and when you are ready you simply unshelve your changes, merge if you have to, and continue working. You can even share shelf sets with others. The biggest problem is shelf sets don't travel across branches.
In svn this can be tricky. If you are doing your development in your own isolated branch that contains only your changes then you have no problem (and you really should be doing this...but again, branching seems to be under-utilized). You simply go to a new folder structure, do an "svn checkout" of the given branch, and do your work there.
But unfortunately, too many people don't take the time to branch their dev work when using svn because it takes too long (see above). Now you have a problem because you need to revert your changes, but you don't want to lose them. Your best bet in svn is a patch file, but the patch file is not saved to the server like a TFS shelf set. Don't lose it.
This whole process seems way too difficult. In git you have the "git stash" command. Conceptually it works just like a branch but gives you some extra commands. You could also simply use another local branch, because remember that in git branching really is cheap and quick.
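Here is a minimal sketch of the interrupt-driven scenario above using "git stash", in a throwaway repo. The file names, the hotfix branch name, and the identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                          # throwaway sandbox
cd "$repo"
git init -q
git config user.name "Example Dev"         # hypothetical identity
git config user.email "dev@example.com"

echo "stable" > calc.txt
git add calc.txt
git commit -q -m "Stable version"
start=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on git config

# You are mid-task, with uncommitted changes...
echo "half-finished work" >> calc.txt

# ...when the urgent request arrives. Save the WIP and revert the working tree.
git stash
during=$(cat calc.txt)                     # back to the committed content

# Do the urgent fix on its own branch, then return.
git checkout -q -b hotfix
echo "urgent fix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Urgent fix"
git checkout -q "$start"

# Restore the saved work and carry on.
git stash pop
after=$(tail -n 1 calc.txt)

echo "while stashed: $during"
echo "after pop, last line: $after"
```

Unlike an svn patch file, the stash lives inside the repository itself, though it stays local unless you turn it into a branch and push it.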
I'll cover more reasons we moved to git in the next post.