DaveWentzel.com            All Things Data

Git Gotchas for the Savvy Svn'r

In the fourth post in my git series I want to cover some things that caused me issues migrating from svn to git.  Sometimes you need to think differently with git if you want to accomplish tasks that you would've done differently with svn.  In some cases, not doing these things appropriately can really muck up your git repo.  This flexibility can be a blessing or a curse.  Git is much more flexible than svn so you should expect that if you do things incorrectly you might blow something up.  But since this is a VCS the blessing is that almost everything that has been committed can be undone if you know how.  

Git pull/fetch/clone/checkout

This caused me the most grief.  Let me cover the purpose of each command:

  • fetch updates your local copy of a remote branch (the "remote-tracking" branch, such as origin/master) but never changes any of your own branches or working copy.  You must have a "remote" already added (git remote add origin <url>).  If you use git fetch, remember to merge afterward.
  • pull is a fetch with a merge.  This is the equivalent of a "svn update."  You will see any conflicts that need to be resolved.  This requires a remote already configured.  
  • clone is roughly a combination of "git init", "git remote add origin <url>", and "git pull".  In one command it will do everything.  The remote will be called "origin".  This is the equivalent of "svn checkout".  
  • checkout is a git command that is nothing like "svn checkout".  git checkout switches your local files to a different branch in a repository you already have.  You are "checking out" a different codebase than the one you are working on.  
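
Here is a minimal sketch of the difference between fetch and pull, run entirely in a scratch directory.  The "server", "alice", and "bob" names are made up for the demo, and a local bare repo stands in for a real server.  

```shell
set -eu
# Scratch area; every path and name here is made up for the demo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repo stands in for the central server.
git init -q --bare "$demo/server.git"
git -C "$demo/server.git" symbolic-ref HEAD refs/heads/master

# Alice seeds the server with one commit.
git init -q "$demo/alice"
cd "$demo/alice"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$demo/server.git"
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
git push -q origin master

# clone: Bob gets the repo, a remote named origin, and a working copy.
git clone -q "$demo/server.git" "$demo/bob"

# Alice publishes a second commit.
echo two >> file.txt
git commit -q -am "second commit"
git push -q origin master

# fetch: origin/master moves, but Bob's master and files do not.
cd "$demo/bob"
before=$(git rev-parse master)
git fetch -q origin
after_fetch=$(git rev-parse master)
behind=$(git rev-list --count master..origin/master)
echo "after fetch, master is $behind commit(s) behind origin/master"

# pull: fetch plus merge, so now Bob's master and files catch up.
git pull -q --ff-only origin master
```

The --ff-only flag is just a safety net for the demo: it makes the pull refuse to do anything other than a simple fast-forward.  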

What is "origin" and "origin/master"?

As mentioned in the last section, "origin" is the remote origin of your local repo.  You don't need to use the word "origin" but that is the convention that most developers use and it indicates that this is the "source" of the local repo.  "origin/master" is simply the master branch of the origin.
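
To see that "origin" is only a name, here is a small sketch that clones a throwaway repo and then renames the remote.  The "upstream", "copy", and "server" names are hypothetical.  

```shell
set -eu
# Scratch area; all names below are invented for the demo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/upstream"
cd "$demo/upstream"
git symbolic-ref HEAD refs/heads/master
echo hello > readme.txt
git add readme.txt
git commit -q -m "initial"

# clone names the remote "origin" by convention.
git clone -q "$demo/upstream" "$demo/copy"
cd "$demo/copy"
git remote        # lists the configured remotes
git branch -r     # lists remote-tracking branches such as origin/master

# The name is not special: rename it and the tracking branches follow.
git remote rename origin server
git branch -r     # the same branches, now under server/
```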

Branches are more than just "folders"

In svn a branch and a tag are really implemented as "folders".  The process of branching and tagging is very fast in svn, but doing a "svn checkout" on an existing branch/tag can be time-consuming depending on size.  git branches (and tags) do not show as filesystem subdirectories.  They instead are displayed as a kind of "third axis" that I explained in a previous post.  I love this concept, but it causes grief for those who learned VCS on svn, tfs, or p4c.  This can be confusing and can lead to noobs creating branches by copying source files from a root folder to a new folder they just created.  They are trying to make their folder structure look like what they would expect in svn.  That won't work properly in git.  
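
A quick sketch of what that looks like in practice, using a throwaway repo with a hypothetical "feature" branch: the folder listing never changes, only the file contents do when you switch branches.  

```shell
set -eu
# Scratch area; repo, file, and branch names are invented for the demo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
echo "v1" > app.txt
git add app.txt
git commit -q -m "initial"

# Create a branch and switch to it. No new folder appears.
git checkout -q -b feature
echo "v2" > app.txt
git commit -q -am "change on feature"
ls            # still just app.txt
cat app.txt   # the feature branch's content

# Switch back: same folder, different snapshot.
git checkout -q master
cat app.txt   # master's content again

# Tags work the same way: a name for a commit, not a directory.
git tag v1.0
git branch    # lists both branches
```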

git subfolders do not function like svn subfolders.  Design your git repos accordingly

It's not uncommon for a large organization to have one HUGE svn repo where each root level subfolder is a separate project.  If you've ever looked at your svn folder structures with an older client, EVERY subfolder has its own .svn hidden folder (svn 1.7 and later keep a single .svn folder at the root of the working copy).  Either way, you can "svn checkout" at any subfolder in a HUGE tree and get just that folder recursively.  This saves a bit of time.  

You need to unlearn that behavior in git.  In git you get the entire repo...there is no "git clone" or "git fetch" of a subfolder.  

So, when migrating from svn to git you should also consider restructuring your repos if you have one big enterprise repo with lots of subprojects.  Skipping that step is what leads to the misperception that git can't handle large repos.  Not at all.  I use very large repos every day and I think they perform better than svn (anecdotal only).  I've worked with svn repos where a "svn checkout" at the root would take hours, so the svn solution is to only get those "subprojects" that you need.  In git you need to have separate repos that you clone.  

But my organization absolutely MUST have subfolders-per-project and one giant repo

That seems doubtful to me but you can still do it.  For instance, if you have shared code classes you probably do want to have that shared code in only one place.  In these cases the git feature you want to research is "git submodule".  git submodule allows a foreign repository to be embedded in a local repo.  In the case of the organization with one giant code repo you would have a master git project and then use git submodule for distinct sub-projects under that.  This performs better and is logically more consistent for many.  This allows your organization's "code portfolio" to exist in one location.  

There are other great uses for git submodule.  The most popular is when you want to integrate some open source code into your existing software.  If you use a submodule then you can maintain your codebase's version separate from the open source module's code.  You can "take on" new versions of the module when you want to.  
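
Here is a minimal sketch of that workflow, assuming a hypothetical "shared-lib" repo embedded in a "product" repo under vendor/.  Both repos are local throwaways; the protocol.file.allow override is only there because recent git versions refuse local-path submodule clones by default, and you wouldn't need it with a real URL.  

```shell
set -eu
# Scratch area; repo names and the vendor/ path are invented for the demo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The "foreign" repository (think: shared classes or open source code).
git init -q "$demo/shared-lib"
cd "$demo/shared-lib"
echo "lib v1" > lib.txt
git add lib.txt
git commit -q -m "lib v1"

# Your own product embeds it as a submodule.
git init -q "$demo/product"
cd "$demo/product"
echo "app" > app.txt
git add app.txt
git commit -q -m "app"
git -c protocol.file.allow=always submodule add "$demo/shared-lib" vendor/shared-lib
git commit -q -m "embed shared-lib as a submodule"
git submodule status   # shows the exact commit the product is pinned to

# The library moves on, but the product stays on the pinned commit.
cd "$demo/shared-lib"
echo "lib v2" > lib.txt
git commit -q -am "lib v2"
cd "$demo/product"
cat vendor/shared-lib/lib.txt   # still the old version

# "Take on" the new version only when you decide to.
git -C vendor/shared-lib pull -q --ff-only
git add vendor/shared-lib
git commit -q -m "take shared-lib v2"
```

The product repo only records which commit of the submodule it wants, so upgrading the embedded code is always an explicit, reviewable commit.  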

If you've ever had to participate in a "black duck analysis" then you'll recognize why some companies INSIST on one central software sourcecode repo/portfolio.  A black duck analysis of your codebase looks for inappropriate use of open source software in home-grown software.  Depending on which license (GPL, AGPL, or another copyleft license) is applied to the open source code your organization may be required to open source its software as well, which it may not want to do.  Black Duck looks for legal liabilities caused by the inappropriate introduction of open source software in your code.  A company I consulted for implemented MongoDB without appropriately understanding Mongo's GNU AGPL v3 licensing.  This caused a Black Duck audit issue and the company was forced to buy commercial licenses for Mongo to avoid open sourcing its product.  If your company needs that kind of central portfolio, git submodule is what you should be using.  

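If you did some research and didn't like "git submodule" there is another alternative: a "sparse checkout".  Essentially you turn on the core.sparseCheckout setting and list the trees you want in a filter file (.git/info/sparse-checkout); newer git versions wrap this in a "git sparse-checkout" command.  Note that this only limits which folders appear in your working copy.  The clone still downloads the history for the whole repo.  I've read that this can be dangerous when merging sparse trees so I've never tried it.  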
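
For the curious, here is a sketch of the classic config-file approach, assuming a hypothetical repo with two project folders, projA and projB, where you only want projA in your working copy.  Since I haven't used this on a real project, treat it as an illustration only.  

```shell
set -eu
# Scratch area; repo and folder names are invented for the demo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# One repo holding two "subprojects".
git init -q "$demo/big"
cd "$demo/big"
mkdir projA projB
echo a > projA/a.txt
echo b > projB/b.txt
git add .
git commit -q -m "two projects in one repo"

# Clone without populating the working copy yet.
git clone -q --no-checkout "$demo/big" "$demo/partial"
cd "$demo/partial"

# Turn on sparse checkout and list the trees you want.
git config core.sparseCheckout true
mkdir -p .git/info
echo "/projA/" > .git/info/sparse-checkout

# Populate the working copy, honoring the filter.
git read-tree -mu HEAD
ls   # only projA is materialized; projB's history is still in .git
```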

How do you revert a change in git?

The svn command I use most regularly is "svn revert" which essentially is the "undo button" for any changes you have made.  Your file(s) will be rolled back to what the HEAD is.  This is especially useful if you use SSIS or Visual Studio where it is necessary to check out a file even if you only want to VIEW its contents.  In git you must:

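git checkout -- <file>

This essentially is telling git to re-checkout the given file and replace your current changes.  The "--" keeps git from mistaking the file name for a branch name.  If you already staged the change with "git add", use "git checkout HEAD -- <file>" instead, otherwise you only get back the staged version.  Newer versions of git (2.23 and later) also offer "git restore <file>" for the same job.  Don't confuse any of this with "git revert", which creates a new commit that undoes an earlier commit.  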
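
A small sketch of both cases in a throwaway repo (the notes.txt file is made up for the demo):  

```shell
set -eu
# Scratch area; repo and file names are invented for the demo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/repo"
cd "$demo/repo"
echo "original" > notes.txt
git add notes.txt
git commit -q -m "initial"

# Case 1: an unwanted edit that has not been staged.
echo "oops" >> notes.txt
git checkout -- notes.txt
cat notes.txt   # back to the committed content

# Case 2: an unwanted edit that was already staged with "git add".
echo "oops again" >> notes.txt
git add notes.txt
git checkout HEAD -- notes.txt   # restores both the index and the file
git status --short               # prints nothing: the tree is clean
```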

Handling file moves/renames/copies/deletes

It seems like every VCS handles file renames and moves less-than-optimally.  At a minimum every VCS I've used has lost the file history of a moved or renamed file unless the move/rename is done from the VCS tool and not the OS or the IDE.  Git is no better here.  In fact, since git is less concerned with "files" than with "snapshots of the current repo" you don't even get an option in git to denote that a given change is a rename/move.  There is a "git mv" command, but it is only a shortcut for moving the file and staging the result; nothing about the rename is recorded in the commit.  This causes some SCM people to freak out that their history is lost.  It really isn't, you just have to "remember" that the older history is associated with the older file name, or ask git to trace it for you with "git log --follow <file>".  

I never rely on my VCS to handle moves/renames nicely.  Instead I always do a move/rename/delete as its own change with a commit message that clearly denotes what I'm moving/renaming/deleting and where the destination is.  This will help the next person who is trying to find the older history.  

If you are using the git command line and not a GUI you will sometimes see a message after your git commit that indicates that git determined a file was a rewrite or rename.  Git has a heuristics engine that determines this for you by comparing file contents, but it isn't perfect.  If you rename a file at the same time you switch all CRLFs to LFs, the contents no longer look similar and the rename won't be detected.  It does work flawlessly with exact file copies: git stores identical content only once, so the copy doesn't waste space in the repo.  However, I can't think of many times when files are copied and maintained in a VCS.  
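
Here is a minimal sketch of a rename done as its own commit, and of how the history looks afterward.  The file names are hypothetical.  

```shell
set -eu
# Scratch area; repo and file names are invented for the demo.
demo=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/repo"
cd "$demo/repo"
printf 'line 1\nline 2\nline 3\n' > old_name.txt
git add old_name.txt
git commit -q -m "add old_name.txt"

# Rename as its own commit, with a message that says where the file went.
git mv old_name.txt new_name.txt
git commit -q -m "rename old_name.txt to new_name.txt"

# Nothing marks the commit as a rename; git infers it by comparing content.
change=$(git diff --name-status -M HEAD~1 HEAD)
echo "$change"

# A plain log stops at the rename; --follow continues across it.
plain=$(git log --oneline -- new_name.txt | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- new_name.txt | wc -l | tr -d ' ')
echo "commits without --follow: $plain, with --follow: $followed"
```

Because the rename commit changes nothing else, the content matches exactly and the detection can't be thrown off by unrelated edits.  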

You need a git utility that looks just like TortoiseSVN

TortoiseSVN seems to be the svn gui that everyone uses.  The closest thing is Git Extensions with shell extensions turned on.  I prefer just using git gui and the git bash shell that comes with msysgit, but if you need something closer to TortoiseSVN, Git Extensions is what you want.  

In the next post I'll cover one last git gotcha for git noobs that can have some serious ramifications.