Git Tips

I love git, it’s such a good tool. Granted, it’s the only VCS I’ve ever used in anger, but it is so ubiquitous now that it’s hard to see seriously using something else. (Though I think theoretically I prefer something like darcs, I’m usually pushed back to git. I’m hopeful that Pijul will improve upon darcs.)

It also has a fun birth story, and in 2016 an added twist of irony. Seriously, just read a bit about it and BitKeeper.

As great a tool as it is, its interface can leave much to be desired. This has improved over the years, but it is still not an easy tool to pick up.

Thankfully you can get pretty far learning just a few commands. Using it effectively, though, takes time. I’ll talk about how I tend to use it and things I’ve learned, in hopes it’s useful.

Config

My git config has some behavior-changing options set for my usual workflow (like rerere), but it’s mostly useful for the aliases.
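For reference, a minimal sketch of setting the two kinds of things mentioned here: rerere, plus aliases matching the git rc and git ra shorthands used later in this post. It runs against a throwaway repo’s local config (made with mktemp, purely for illustration) so it doesn’t touch your real setup; use --global to apply the settings everywhere. The rest of my aliases aren’t reproduced here.

```shell
set -eu
# Throwaway repo so nothing touches your real config.
cd "$(mktemp -d)"
git init -q

# Remember how conflicts were resolved and reuse those resolutions next time.
git config rerere.enabled true

# Shorthands for continuing/aborting a rebase.
git config alias.rc 'rebase --continue'
git config alias.ra 'rebase --abort'

# List what was just set in this repo's config.
git config --local --get-regexp '^(rerere|alias)\.'
```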

I use Oh My Zsh and have its git plugin enabled. I tend to use my own git aliases, but some of the plugin’s shell aliases are nice; I use gst (git status) a lot.

You might find some of the tools in Git Extras useful.

The Three Trees

This is a basic mental model of how a git repository works; once you grok it, it will help clarify what is happening and what commands are doing.

The full article in the git book on this has a more complete walkthrough, with images. It also breaks down the reset and checkout commands as an illustration. I’ll give a short description of the mental model here.

The first thing to note is that the git repository really lives in the .git/ directory. (This is the content of a “bare” repository, such as one created with git init --bare, which has no work tree.) This is the full content, all history and files, stored in git’s internal data format.

The first two trees live there.

The HEAD tree points to the last commit.

The Index tree represents the next commit. This is the staging area: what you git add files into and what then gets committed on a git commit.

The “repo” that you work in day-to-day, with the actual files you edit, is called the “working directory” or work tree. When you change branches or otherwise move HEAD, git resolves its internal data for the given commit into real files in the work tree so you can, well, work with them.

Most of the commands you use regularly are around manipulating the state of these three trees. Adding changes from the work tree into the index, then committing the index into HEAD. Or conversely, checking out a different branch, which moves HEAD to point to the other branch, and updates the work tree based on the new HEAD.
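A minimal sketch of the three trees in action, assuming a throwaway repo and a hypothetical file.txt: git diff compares the work tree to the index, git diff --cached compares the index to HEAD, and committing turns the index into the new HEAD.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Add file"      # HEAD, index, and work tree all match

echo "v2" > file.txt             # change the work tree only
git diff --name-only             # work tree vs index: file.txt
git diff --cached --name-only    # index vs HEAD: nothing yet

git add file.txt                 # copy the work tree version into the index
git diff --cached --name-only    # index vs HEAD: file.txt

git commit -q -m "Update file"   # the index becomes the new commit HEAD points at
git show HEAD:file.txt           # v2
```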

For more, read through the full article.

You may also find this interactive example helpful, though it uses slightly different terminology:

  • HEAD tree → Local Repository
  • Index tree → Index (this one is the same)
  • Work tree → Workspace

And it also shows how some commands interact with the stash and remote copy of the repository, going a little beyond the “core” three trees.

pathspec

A less commonly learned but useful bit of info to read and keep as a reference is the pathspec doc. A pattern I use sometimes is :^*/package-lock.json, for seeing info about everything but the npm package lock files (which are just noise, but tracked), e.g., git diff master ":^*/package-lock.json".
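A minimal sketch, assuming a throwaway repo with hypothetical app/index.js and app/package-lock.json files that both change on a feature branch. Quoting the pathspec keeps the shell from expanding the *, and the -- just makes it explicit that what follows are paths.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever init.defaultBranch says
git config user.name "Example"
git config user.email "example@example.com"

mkdir app
echo '{}' > app/package-lock.json
echo 'console.log(1)' > app/index.js
git add app && git commit -q -m "Initial"

git checkout -q -b feature
echo '{"lockfileVersion": 3}' > app/package-lock.json
echo 'console.log(2)' > app/index.js
git commit -q -am "Change code and lock file"

# ":^" is the short form of ":(exclude)".
git diff --name-only master -- ':^*/package-lock.json'   # app/index.js
```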

Searching History

git log is your friend. If you look at my config you’ll see a few different aliases for git log with various flags, all providing slightly different views of the history. These are nice high-level views.

But we can do more than just look at a list of all commits, of course. We can search! The -S and -G flags search the changes in commits for a term: git log -S hello lists commits that add or remove occurrences of hello, while git log -G hello lists commits with added or removed lines matching the regex hello. (This is called pickaxe, so if you see or hear the term “git pickaxe”, this searching is what they are referring to.) Of course there are a huge number of other flags you can dive into, but note you can combine these flags with aliases, so if you have views you like aliased, you can just throw on a -S to limit the view to the changes you are interested in (e.g., for me git ll -S <term>, git lll -S <term>, etc.).
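A minimal sketch of the difference between the two flags, assuming a throwaway repo with a hypothetical greeting.txt. Rewording the line keeps the number of occurrences of hello the same, so -S skips that commit, while -G still sees hello on a changed line.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "hello world" > greeting.txt
git add greeting.txt && git commit -q -m "Add greeting"
echo "hello there" > greeting.txt
git commit -q -am "Reword greeting"

# -S: commits that change how many times "hello" appears.
git log -S hello --format=%s     # Add greeting
# -G: commits whose diff adds or removes a line matching the regex "hello".
git log -G hello --format=%s     # Reword greeting, then Add greeting
```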

You can even trace the history of just a specific function (:func_name:<file>) or range of lines (10,+15:<file> or 15,45:<file>) in a file with -L, though that’s getting a little advanced.
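A minimal sketch of the line-range form, assuming a throwaway repo with a hypothetical notes.txt. Only commits that touched line 2 show up, each followed by the diff for that range; the commit that only changed line 1 is left out.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'one\ntwo\nthree\n' > notes.txt
git add notes.txt && git commit -q -m "Add notes"
printf 'one\nTWO\nthree\n' > notes.txt && git commit -q -am "Shout line two"
printf 'ONE\nTWO\nthree\n' > notes.txt && git commit -q -am "Shout line one"

# Trace just line 2 of notes.txt through history.
git log -L 2,2:notes.txt --format=%s
```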

git blame <file> can also be useful in navigating through the history of a line, though using blame in a GUI makes it a lot nicer.

Sometimes you aren’t interested in searching the changes of a commit but the commit message, e.g., looking for all commits mentioning a specific bug number, which you can do with git log --grep=<terms>.
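A minimal sketch, assuming a throwaway repo with hypothetical empty commits and a made-up bug number. --grep searches the whole message, so a reference in the body is found even though the subject doesn’t mention it.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Passing -m twice makes the second one the body of the message.
git commit -q --allow-empty -m "Fix crash on startup" -m "Fixes bug #123"
git commit -q --allow-empty -m "Add settings page"

git log --grep='#123' --format=%s   # Fix crash on startup
```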

See more in the official docs.

Commit messages

The first line should be the “subject” or title of the change. It should be a brief but clear summary of what is changing. Try to keep it close to 80 characters, but being accurate and helpful is more important than line length.

Write in the imperative present tense, that is, as if the commit does an action. Enable foo is good, Enabled foo or I enabled foo are to be avoided. A commit didn’t do something, it does something.

The next lines are the summary section and should provide any further clarification on the work (esp. anything non-obvious), particularly the why and the big-picture what; we can see the how and the specifics by looking at the code.

Many others have written on this; How to Write a Git Commit Message by Chris Beams is detailed and links to many of the other big posts in this area. I particularly like (and agree with) what the folks who make Phabricator have to say on commit messages.

Rebase/rewriting history

I rebase a lot in my usual workflow: rewriting history, squashing commits together, breaking them apart, etc. Some folks are scared of this. Others hate it. There is an age-old philosophical divide.

It’s clear where I fall, but I’m not going to get into it here; I’ll just give some tips on how I tend to use this stuff.

First, “rebase”, the word itself. Your work, as represented by a sequence of commits, comes after another commit; it is “based” on that commit, stems from it. So to “re-base” means to choose a new “base” for your work: literally pick up a sequence of commits and move them on top of a different starting commit.

When you want to rebase a branch on another that has some new commits, a plain git rebase <branch> can be enough in simple situations. The --onto flag to git rebase comes into play in more complicated situations or when you want to be explicit.

In simplified form, it looks like git rebase --onto <branch> <commit before the first one you want>. Think of it as “pick up all commits after <commit> and apply them on top of (i.e., onto) <branch>.”

This comes into play a bunch when you are working on a feature branch (feat-b) that depends on another feature branch (feat-a). When feat-a continues development (maybe even rebasing on master itself) and you want to catch your branch up, a plain git rebase feat-a from feat-b will likely not do what you want. A git rebase --onto feat-a <commit before your work in feat-b> can get it sorted. Let’s look at an example from the rebase help page:

First let’s assume your topic is based on branch next. For example, a feature developed in topic depends on some functionality which is found in next.

        o---o---o---o---o  master
             \
              o---o---o---o---o  next
                               \
                                o---o---o  topic

We want to make topic forked from branch master; for example, because the functionality on which topic depends was merged into the more stable master branch. We want our tree to look like this:

        o---o---o---o---o  master
            |            \
            |             o'--o'--o'  topic
             \
              o---o---o---o---o  next

We can get this using the following command:

    git rebase --onto master next topic
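A scaled-down, runnable version of that scenario, as a minimal sketch assuming a throwaway repo; the commit names and the c helper function are hypothetical, and each commit just adds a file named after itself so nothing conflicts.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever init.defaultBranch says
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit adding a file named after the commit.
c() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

c m1; c m2
git checkout -q -b next;  c n1; c n2
git checkout -q -b topic; c t1; c t2; c t3
git checkout -q master;   c m3

# Replay only the commits in next..topic (t1, t2, t3) on top of master.
git rebase -q --onto master next topic
git log --oneline --graph --all
```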

Next let’s talk about interactive rebases, git rebase -i. A supremely helpful feature, it lets us manipulate each commit involved in a rebase, from simply using it as is or rewording its commit message, to combining commits or throwing them out altogether.

Let’s say you have some work on a branch new-wizbang-feature and you want to clean up its history. You can run git rebase -i master on the branch; this will bring up your editor with a list of commits with pick in front of each. Your options will be explained in the buffer as well, but you can change that pick to any of the other options to take that action on a commit.

When you stop to perform some action on a commit, issue git rebase --continue (my alias git rc) to move on to the next action. If you ever get confused or a rebase goes wrong, just run git rebase --abort (my alias git ra) and everything will be undone and you can start again. This should also be explained in the git status message.
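One common cleanup can even be set up ahead of time. A minimal sketch, assuming a throwaway repo with hypothetical files: git commit --fixup records a follow-up fix as “fixup! <subject>”, and git rebase -i --autosquash moves it under the commit it fixes and marks it fixup. Setting GIT_SEQUENCE_EDITOR=true here just accepts the generated todo list so the example runs without an editor; in real use you’d normally let the editor open and review it.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Base"

git checkout -q -b new-wizbang-feature
echo "v1" > feature.txt; git add feature.txt; git commit -q -m "Add feature"
echo "x" > other.txt;    git add other.txt;   git commit -q -m "Add other"
echo "v2" > feature.txt
git commit -q -a --fixup=HEAD~1     # creates "fixup! Add feature"

# The fixup gets folded into "Add feature"; the branch ends up with two commits.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master
git log --format=%s master..HEAD
```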

And don’t be too scared of breaking things; almost everything you do is recoverable/reversible. git reflog is your friend.
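A minimal sketch of a typical rescue, assuming a throwaway repo with a hypothetical work.txt: a hard reset “loses” a commit, and the reflog still remembers where HEAD was before it.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial"

echo "important" > work.txt
git add work.txt && git commit -q -m "Important work"
git reset -q --hard HEAD~1          # oops: the commit is no longer on the branch

git reflog -n 2                     # HEAD@{1} is where HEAD was before the reset
git reset -q --hard 'HEAD@{1}'      # move the branch back to it
git log -1 --format=%s              # Important work
```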

My preferred workflow

My general approach to a git project is simple: there is one branch, called master (or whatever you want), and this is “the project”; it represents the latest version of the thing.

Development happens in feature branches; when a feature is ready, it is rebased and merged to master. I like a clean linear history, but if you like merge commits, that’s fine. I only suggest making the merge commit message useful, so when viewing the history with --first-parent you can easily track what is actually happening to the codebase.

Note that when I say “feature branch” I just mean a branch where the changes to support the feature live while being developed and reviewed. Controlling the “feature” in production should be done with feature flags on the application, that is, the code is merged to master and there is a run-time system to enable or disable the new code for testing/validation/experimentation. The “feature branch” is just a temporary workspace for writing code, not a deployment mechanism.

Take GitHub’s default template for merge commit messages, for example: Merge pull request #xxx from <user>/<branch>. It isn’t the worst if the branch name is reasonable, but even so, the information I most care about, what this commit does, is pushed to the end of the line after a bunch of gobbledygook. When looking through the history, it would be nicer to see Do thing, with the Merge pull ... details in the summary section, easily available but out of the way.

And I’m against using merges as an excuse to let a messy, meandering history into the master branch. Some folks like including every part of the dev process in their commit history including the false starts, dead ends, and intermediate steps. I’m most interested in the final result of the work and how it impacts the code. So while developing (or before landing if it comes to it), squash things together logically so the history reads well.

If you went down an obvious path that didn’t work out, then just write that in the commit message, “The obvious approach would be x but due to y it doesn’t quite work out”, “Tried a but went with b because…”

If I want the full back and forth on something, I’ll pull up the code review system with the full conversation and iteration the code went through.

So if you must do merges, instead of:

* Merge some stuff from Joan
|\
| * Fix
| * Oops, redo
| * crap
| * Try this?
| * WIP
| * WIP
| * WIP
| * Start thing
|/
* Another thing
* Merge other stuff from Bob
|\
| * Feedback fixes
| * Try this other thing
| * Feedback fixes
| * Totally new approach
| * WIP
| * WIP
|/

Aim for something like:

* Do thing
|\
| * New thing UI
| * Rework big subsystem
| * New thing API endpoints
|/
* Another thing
* New feature
|\
| * Utilize updated vendor to do thing
| * Update vendor component
|/

But again, I prefer to work on things at the scale they can be logically grouped and squashed up into a linear history.

* Do thing
* Another thing
* New feature utilizing updated vendor thing

Basically, regardless of whether you are rebasing or merging, make the history easy to follow and useful for people reading it. I generally subscribe to the one-idea-is-one-commit approach.

Versions get tagged (with an annotated tag, not a lightweight one). If fixes or features need to be backported, branch from the tag, apply the necessary changes, and tag a new release on that branch. If you don’t do “versions”/releases, but just regularly want to promote work to some “stable” state, then have a stable branch that you update when appropriate.
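A minimal sketch of that release flow, assuming a throwaway repo with hypothetical file names and a hypothetical release-1.0 branch name. git cherry-pick is one way to apply the fix; -x records the original commit ID in the message, which is handy for backports. git cat-file -t shows the difference between the tag kinds: an annotated tag is its own object of type tag, while a lightweight tag is just a name pointing at a commit.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "1.0" > version.txt
git add version.txt && git commit -q -m "Initial release"
git tag -a v1.0.0 -m "Release 1.0.0"    # annotated: has its own tagger, date, message

echo "new" > feature.txt
git add feature.txt && git commit -q -m "New feature"
echo "fixed" > bugfix.txt
git add bugfix.txt && git commit -q -m "Fix bug"

# Backport only the fix onto a branch made from the old release tag.
git checkout -q -b release-1.0 v1.0.0
git cherry-pick -x master >/dev/null    # master currently points at "Fix bug"
git tag -a v1.0.1 -m "Release 1.0.1"

git cat-file -t v1.0.1                  # tag
```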

Every commit that hits master should be “buildable” (the definition of which could vary based on the project). For the rare cases where a large amount of inter-related, simultaneous work from multiple people needs to land, make an “integration” branch or something similar to pull together the required work for testing. But often, just pick an order to land things and base each feature branch on the appropriate other feature branches, rebasing often.

I guess really, just get comfortable rebasing. Frequent small rebases are often ultimately easier and safer than one big one at the end. You’ll probably want to turn on rerere to make rebasing often less painful.

Lots of folks like git-flow. I think it’s over-complicated and I’ve never been part of a project where it was helpful. But if it works for you, then cool.

Rather than having a ton of branches interweaving with merges, I prefer to have one trunk (master) with many (mostly temporary) branches from it for specific purposes.

Your needs may be different.

Worktrees

There are occasions when it’s useful to have two separate checkouts backed by the same repository, say when you have to support an old version of an application, possibly many years old, with a radically different file structure.

Or maybe your test suite takes a long time to run: you can kick it off on the codebase in a separate worktree while continuing work on a different branch in the meantime. The same goes for compiling a big thing.

Or you just need to switch between some branches a bunch and tire of worrying about stashing changes every time you switch.

Git supports this with worktrees, which allow you to check out more than one branch from a single git repository.

You can of course do this by cloning the repo multiple times (or copying .git/ to a new dir, same thing), which is fine, but this has a couple of downsides. One, it multiplies the amount of disk space used; for large repos this could be an issue. Two, it adds another layer to manage: you now need to pull updates for each of the clones, which is inconvenient.

When you have one repo and you just want branches checked-out simultaneously, worktrees are what you want.
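A minimal sketch, assuming a throwaway repo and a hypothetical old-version branch: the second checkout lives in its own directory but shares the original repo’s .git, so there is nothing extra to pull.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "v1" > app.txt
git add app.txt && git commit -q -m "Initial"
git branch old-version              # hypothetical long-lived branch

# Check old-version out into a second directory backed by the same repository.
wt="$(mktemp -d)/old-version"
git worktree add "$wt" old-version
git worktree list                   # both checkouts, one repo
```

When you’re done, git worktree remove <path> cleans up the extra checkout. By default git also refuses to check out a branch that is already checked out in another worktree, which keeps the two from stepping on each other.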

Repo archives

Sometimes it’s useful to have a snapshot of your repo (or a part of it) at a certain commit, for sharing or just as an artifact to store somewhere.

git archive is your friend. It will package up the files tracked at a given commit into a zip or tarball, so ignored and untracked files are left out (and it honors export-ignore attributes in .gitattributes).

Make an archive for your version v0.1.0:

git archive -o v0.1.0.tar.gz v0.1.0

You can also limit it to just certain paths:

git archive -o v0.1.0.tar.gz v0.1.0 assets/

See more examples in the docs linked above.
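A minimal sketch, assuming a throwaway repo with hypothetical assets/logo.svg and README files, that makes the two archives above and lists what ended up in the full tarball. The untracked notes.txt is left out because git archive only packages what’s committed at that ref.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir assets
echo "logo" > assets/logo.svg
echo "readme" > README
git add assets README && git commit -q -m "Initial"
git tag -a v0.1.0 -m "v0.1.0"

echo "scratch" > notes.txt                           # untracked, never archived

git archive -o v0.1.0.tar.gz v0.1.0                  # format inferred from the extension
git archive -o assets-v0.1.0.tar.gz v0.1.0 assets/   # just the assets/ directory
tar -tzf v0.1.0.tar.gz
```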

You can also use it to retrieve an archive from a remote repo if you don’t need the history, just the files at a certain commit (the remote has to allow this; GitHub, for one, does not). For example, say you just want the code for version v1.2.3 from a remote repo:

git archive --remote=<repo address> -o v1.2.3.tar.gz v1.2.3

GitHub also provides a URL you can hit for an archive of the repo in the format of:

https://github.com/<user>/<repo>/archive/<reference>.tar.gz

For example:

https://github.com/Foo/Bar/archive/6c4f88684b79486cb8c0842d7297d384e070bbac.tar.gz

Or

https://github.com/Foo/Bar/archive/v1.2.3.tar.gz

You can replace the .tar.gz with .zip if you want a zip over a tarball. The link for this is also provided in the GitHub UI under the “Clone or download” button.

gitignore

So any repo or directory in a repo can have a .gitignore file and git will, uh, ignore the files that match the patterns. “Project” ignores.

You can also have a global ignore file set to have really common stuff ignored in all your projects. By default this is ~/.config/git/ignore, but the location can be overridden by the core.excludesFile configuration value (mine uses ~/.gitignore). These of course are not shared with the project. “Personal” ignores.

You can also have “Personal Project” ignores by specifying patterns in .git/info/exclude in a project (that’s the info/exclude file in the project’s .git directory). These are applied to the project but not committed to the repo, and so are not shared with others who check it out. When you have some files lying around in a repo that you don’t want or need to share (maybe personal notes or todos for the project, your scratch space, some hacky scripts, etc.) but also don’t want showing up as untracked constantly, you can put them there.
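A minimal sketch, assuming a throwaway repo and a hypothetical NOTES.md scratch file. git check-ignore -v is a handy way to confirm which ignore file and pattern matched.

```shell
set -eu
cd "$(mktemp -d)"
git init -q

echo "my scratch notes" > NOTES.md
mkdir -p .git/info                  # normally already created by git init
echo "NOTES.md" >> .git/info/exclude

git status --porcelain              # NOTES.md no longer shows up as untracked
git check-ignore -v NOTES.md        # reports which ignore file and line matched
```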

https://git-scm.com/docs/gitignore

Checkout file from other branch

Sometimes you need to check out the current state of a file from a different branch.

It’s just:

git checkout <branch> -- <file path>
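A minimal sketch, assuming a throwaway repo with a hypothetical config.txt that differs on a branch named other. Note that this form of checkout updates both the work tree and the index, so the file comes back already staged.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > config.txt
git add config.txt && git commit -q -m "Initial"
git checkout -q -b other
echo "v2" > config.txt
git commit -q -am "Update config on other"
git checkout -q master

git checkout other -- config.txt   # overwrites the work tree copy and the index entry
cat config.txt                     # v2
git diff --cached --name-only      # config.txt (already staged)
```

On git 2.23 or newer, git restore --source=<branch> -- <file path> does the same thing but by default only touches the work tree, leaving the index alone.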

Misc. Stuff