I love git, it’s such a good tool. Granted, it’s the only VCS I’ve ever used in anger, but it is so ubiquitous now that it’s hard to see seriously using something else. (Though I think theoretically I prefer something like darcs, I’m usually pushed back to git. I’m hopeful for Pijul to improve upon darcs.)
It also has a fun birth story, and in 2016 an added twist of irony. Seriously, just read a bit about it and BitKeeper.
As great a tool as it is, its interface can leave much to be desired. This has improved over the years, but it is still not an easy tool to pick up.
Thankfully you can get pretty far learning just a few commands. Using it effectively, though, takes some time to develop. I’ll talk about how I tend to use it and things I’ve learned in hopes it’s useful.
Config
My git config has some behavior-changing options set for my usual workflow (like rerere), but it is mostly useful for the aliases.
I use Oh My Zsh and have its git plugin enabled. I tend to use my git aliases, but some of the shell aliases are nice; I use gst a lot.
You might find some of the tools in Git Extras useful.
The Three Trees
This is a basic way to think about how a git repository works and will help clarify what is happening, what commands are doing, once you grok it.
The full article in the git book on this has a more complete walkthrough, with images. It also gives a breakdown between the usage of the reset and checkout commands as an illustration. I’ll give a short description of the mental model here.
The first thing to note is that the git repository really lives in the .git/ directory (this is the content of a “bare” repository, such as the one created by git init --bare, which has no work tree). This is the full content, all history and files, stored in git’s internal data format.
The first two trees live there.
The HEAD tree points to the last commit.
The Index tree points to the next commit. This is the staging area: what you git add files into, and then what gets committed on a git commit.
The “repo” that you work in day-to-day, the actual files you edit, is called the “working directory” or work tree. When you change branches or anything, git resolves its internal data for the given commit into real files in the work tree so you can, well, work with them.
Most of the commands you use regularly are around manipulating the state of these three trees. Adding changes from the work tree into the index, then committing the index into HEAD. Or conversely, checking out a different branch, which moves HEAD to point to the other branch, and updates the work tree based on the new HEAD.
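Here is a minimal sketch of the three trees in action, using a throwaway repository. The file name notes.txt and the identity settings are made up for the example. git diff compares the work tree to the index, and git diff --cached compares the index to HEAD:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "one" > notes.txt
git add notes.txt                 # work tree -> index
git commit -q -m "Add notes"      # index -> HEAD

echo "two" >> notes.txt           # change only the work tree
worktree_vs_index=$(git diff --name-only)        # lists notes.txt
git add notes.txt                 # stage it: the index now differs from HEAD
index_vs_head=$(git diff --cached --name-only)   # lists notes.txt
worktree_after_add=$(git diff --name-only)       # empty: work tree matches index
echo "$worktree_vs_index / $index_vs_head / [$worktree_after_add]"
```

After the final git commit (not run here) all three trees would agree again.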
For more, read through the full article.
You may also find this interactive example helpful, though it uses slightly different terminology:
- HEAD tree → Local Repository
- Index tree → Index (this one is the same)
- Work tree → Workspace
And it also shows how some commands interact with the stash and remote copy of the repository, going a little beyond the “core” three trees.
pathspec
A less commonly learned, but useful bit of info to read and have as a reference is the pathspec doc.
A useful pattern I use sometimes is :^*/package-lock.json for seeing info about everything but the npm package lock files (which are just noise, but tracked), e.g., git diff master ":^*/package-lock.json".
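A small sketch of that exclude pattern against a throwaway repository. The web/ directory and file contents are invented for the example, and it assumes a reasonably recent git where an exclude-only pathspec is applied to the full result set:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

mkdir web
echo "{}" > web/package-lock.json
echo "v1" > web/app.js
git add . && git commit -q -m "Add web app"

# Change both the real code and the lock file.
echo "v2" > web/app.js
echo '{"lockfileVersion": 3}' > web/package-lock.json

all_changes=$(git diff --name-only)
# ":^" marks an exclude pathspec; quote it so the shell leaves "*" alone.
filtered=$(git diff --name-only ":^*/package-lock.json")
echo "$filtered"
```

Only web/app.js should remain in the filtered list, while the unfiltered diff reports both files.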
Searching History
git log is your friend. If you look at my config you’ll see a few different aliases for git log with various flags, all providing slightly different views of the history. These are nice high-level views.
But we can do more than just look at a list of all commits of course. We can search! The -S and -G flags will search commits for changes including the search term, e.g., git log -S hello will list all commits that include changes with hello in them. (More precisely, -S finds commits that change the number of occurrences of the string, while -G finds commits whose diff has added or removed lines matching a regex.) This is called pickaxe, so if you see/hear the term “git pickaxe”, this searching is what they are referring to. Of course there are a huge number of other flags you can dive into, but note you can combine these flags with aliases, so if you have views you like aliased, you can just throw on a -S to limit the view to the changes you are interested in (e.g., for me git ll -S <term>, git lll -S <term>, etc.).
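To make the pickaxe behavior concrete, here is a sketch in a scratch repository. The file greet.py and the commit subjects are invented for illustration:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo 'print("hello")' > greet.py
git add greet.py && git commit -q -m "Add greeting"
echo '# a comment' >> greet.py
git commit -q -am "Add comment"
echo 'print("goodbye")' > greet.py
git commit -q -am "Replace greeting"

# Only commits that add or remove an occurrence of "hello" are listed;
# "Add comment" touches the same file but leaves the count unchanged.
pickaxe=$(git log --format=%s -S hello)
echo "$pickaxe"
```

The middle commit is skipped even though it edits the file, which is what makes -S handy for finding where something was introduced or removed.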
You can even limit the search to just the history of specific functions (:func_name:<file>) or lines (10,+15:<file> or 15,45:<file>) of a file with -L, though that’s getting a little advanced.
git blame <file> can also be useful in navigating through the history of a line, though using blame in a GUI makes it a lot nicer.
Sometimes you aren’t interested in searching the changes of a commit, but the commit message, e.g., looking for all commits that mention a specific bug number, which you can do with git log --grep=<terms>.
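A quick sketch of searching commit messages. The bug id BUG-123 and the commit subjects are hypothetical:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

# Empty commits are enough here since only the messages matter.
git commit -q --allow-empty -m "Fix login redirect" -m "Refs BUG-123"
git commit -q --allow-empty -m "Update README"
git commit -q --allow-empty -m "Handle empty password (BUG-123)"

# --grep looks at the whole message, so a mention in the body counts too.
matches=$(git log --format=%s --grep="BUG-123")
echo "$matches"
```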
See more in the official docs.
Commit messages
The first line should be the “subject” or title of the change. It should be a brief, but clear summary of what is changing. Try to keep it close to 80 characters, but being accurate and helpful is more important than line length.
Write in the imperative present tense, that is, as if the commit does an action. Enable foo is good, Enabled foo or I enabled foo are to be avoided. A commit didn’t do something, it does something.
The next lines are the summary section and should provide any further clarification on the work (esp. anything non-obvious), particularly the why and the big-picture what. We can see the how and the specifics by looking at the code.
Many others have written on this. How to Write a Git Commit Message by Chris Beams is detailed and links to many of the other big posts in this area. I particularly like/agree with what the folks who make Phabricator have to say on commit messages.
Rebase/rewriting history
I rebase a lot in my usual workflow, rewriting history, squashing commits together, breaking them apart, etc. Some folks are scared of this. Others hate it. There is an age-old philosophical divide.
It’s clear where I fall, and I’m not going to get into it here; I’ll just give some tips on how I tend to use this stuff.
First, “rebase”, the word itself. Your work, as represented by a sequence of commits, comes after another commit. Those commits are “based” on that commit, they stem from it. So to “re-base” means to choose a new “base” for your work: literally pick up a sequence of commits and move them on top of a different starting commit.
When you want to rebase a branch on another that has some new commits, a plain git rebase <branch> can be enough in simple situations. The --onto flag to git rebase comes into play in more complicated situations or when you want to be explicit.
A simplified look at it is the form git rebase --onto <branch> <commit before the first one you want>. Think of it as “pick up all commits after <commit> and apply them on top of (i.e., onto) <branch>.”
This comes into play a bunch when you are working on a feature branch (feat-b) that depends on another feature branch (feat-a). As feat-a continues development, and maybe itself rebases on master, and you want to catch your branch up, just a git rebase feat-a from feat-b will likely not do what you want. A git rebase --onto feat-a <commit before your work in feat-b> can get it sorted. Let’s look at an example from the rebase help page:
First let’s assume your topic is based on branch next. For example, a feature developed in topic depends on some functionality which is found in next.
o---o---o---o---o  master
     \
      o---o---o---o---o  next
                       \
                        o---o---o  topic
We want to make topic forked from branch master; for example, because the functionality on which topic depends was merged into the more stable master branch. We want our tree to look like this:
o---o---o---o---o  master
    |            \
    |             o'--o'--o'  topic
     \
      o---o---o---o---o  next
We can get this using the following command:
git rebase --onto master next topic
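If you want to try this out, here is a sketch that builds a small version of that master/next/topic layout in a temp directory and runs the same command. The commit helper and the m1/n1/t1 names are just scaffolding I made up for the example:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git config user.name "Example" && git config user.email "example@example.com"

# Helper: create a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit m1
git checkout -q -b next;  commit n1
git checkout -q -b topic; commit t1; commit t2
git checkout -q master;   commit m2

# Pick up the commits on topic that are not on next and replay them onto master.
git rebase -q --onto master next topic
history=$(git log --format=%s topic)
echo "$history"
```

Afterwards topic sits directly on top of master, and the n1 commit from next is no longer part of its history.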
Next let’s talk about interactive rebases, git rebase -i. A supremely helpful feature, it lets us manipulate each commit involved in a rebase, from simply using it as is, to rewording its commit message, to combining commits or throwing them out altogether.
Let’s say you have some work on a branch new-wizbang-feature and you want to clean up its history. You can run git rebase -i master on the branch; this will bring up your editor with a list of commits with pick in front of each. Your options will be explained in the buffer as well, but you can change that pick to any of the other options to take that action on a commit.
When you stop to perform some action on a commit, issue git rebase --continue (my alias git rc) to move on to the next action. If you ever get confused or a rebase goes wrong, just run git rebase --abort (my alias git ra) and everything will be undone and you can start again. This should also be explained in the git status message.
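An interactive rebase is hard to show in a snippet since it opens an editor, so this sketch swaps in a shortcut: it records a fixup commit and lets --autosquash arrange the todo list, with GIT_SEQUENCE_EDITOR=true accepting that list unchanged. The end result is the same as changing a pick to fixup by hand. The file names and commit subjects are invented:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Initial commit"
git checkout -q -b new-wizbang-feature
echo "draft" > feature.txt && git add feature.txt && git commit -q -m "Add wizbang"
echo "final" > feature.txt
git commit -q -a --fixup=HEAD     # recorded as "fixup! Add wizbang"

# --autosquash marks the fixup commit to be folded into its target;
# GIT_SEQUENCE_EDITOR=true accepts the todo list without opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master
history=$(git log --format=%s master..HEAD)
echo "$history"
```

The branch ends up with a single "Add wizbang" commit containing the final content.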
And don’t be too scared of breaking things, almost everything you do is recoverable/reversible. git reflog is your friend.
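As a small illustration of recovering with the reflog, this sketch throws away a commit with a hard reset and then brings it back. The file and commit names are made up, and HEAD@{1} is simply where HEAD was one move ago:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo "one" > file.txt && git add file.txt && git commit -q -m "First"
echo "two" > file.txt && git commit -q -am "Second"

git reset -q --hard HEAD~1        # oops: "Second" is gone from the branch
git reflog                        # ...but it is still listed here
git reset -q --hard "HEAD@{1}"    # move back to where HEAD was before the reset
recovered=$(git log -1 --format=%s)
echo "$recovered"
```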
More:
- Read the Git book chapter on rebasing
My preferred workflow
My general approach to a git project is simple: there is one branch, called master (or whatever you want). This is “the project”; it represents the latest version of the thing.
Development happens in feature branches. When the feature is ready, it is rebased and merged to master. I like a clean linear history, but if you like merge commits, that’s fine, I only suggest making the merge commit message useful so when viewing the history with --first-parent you can easily track what is actually happening to the codebase.
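A sketch of what --first-parent buys you when the merge commit has a useful message. The branch name thing and the commit subjects are placeholders:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "Initial commit"
git checkout -q -b thing
echo "api" > api.txt && git add api.txt && git commit -q -m "New thing API endpoints"
echo "ui" > ui.txt && git add ui.txt && git commit -q -m "New thing UI"
git checkout -q master

# Force a merge commit and give it a subject that says what it does.
git merge -q --no-ff -m "Do thing" thing

mainline=$(git log --first-parent --format=%s)   # only what landed on master
full=$(git log --format=%s)                      # includes the branch commits
echo "$mainline"
```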
Note that when I say “feature branch” I just mean a branch where the changes to support the feature live while being developed and reviewed. Controlling the “feature” in production should be done with feature flags on the application, that is, the code is merged to master and there is a run-time system to enable or disable the new code for testing/validation/experimentation. The “feature branch” is just a temporary workspace for writing code, not a deployment mechanism.
Take the default GitHub template for merge commit messages, for example: Merge pull request #xxx from <user>/<branch>. It isn’t the worst if the branch name is reasonable, but even so, the information I most care about, what this commit does, is pushed to the end of the line after a bunch of gobbledygook. When looking through the history, it would be nicer to see Do thing with the Merge pull ... details in the summary section, easily available, but out of the way.
And I’m against using merges as an excuse to let a messy, meandering history into the master branch. Some folks like including every part of the dev process in their commit history including the false starts, dead ends, and intermediate steps. I’m most interested in the final result of the work and how it impacts the code. So while developing (or before landing if it comes to it), squash things together logically so the history reads well.
If you went down an obvious path that didn’t work out, then just write that in the commit message, “The obvious approach would be x but due to y it doesn’t quite work out”, “Tried a but went with b because…”
If I want the full back and forth on something, I’ll pull up the code review system with the full conversation and iteration the code went through.
So if you must do merges, instead of:
* Merge some stuff from Joan
|\
| * Fix
| * Oops, redo
| * crap
| * Try this?
| * WIP
| * WIP
| * WIP
| * Start thing
|/
* Another thing
* Merge other stuff from Bob
|\
| * Feedback fixes
| * Try this other thing
| * Feedback fixes
| * Totally new approach
| * WIP
| * WIP
|/
Aim for something like:
* Do thing
|\
| * New thing UI
| * Rework big subsystem
| * New thing API endpoints
|/
* Another thing
* New feature
|\
| * Utilize updated vendor to do thing
| * Update vendor component
|/
But again, I prefer to work on things at the scale they can be logically grouped and squashed up into a linear history.
* Do thing
* Another thing
* New feature utilizing updated vendor thing
Basically, regardless of whether you are rebasing or merging, make the history easy to follow so it is useful for the people reading it. I generally subscribe to the one idea is one commit approach.
Versions get tagged (with an annotated tag, preferably one that is signed, not a lightweight one). If fixes or features need to be backported, branch from the tag, apply the necessary changes, and tag a new release on that branch. If you don’t do “versions”/releases, but just regularly want to promote work to some “stable” state, then have a stable branch that you update when appropriate.
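Roughly, the tag-then-backport flow could look like this. The version numbers, the release-0.1 branch name, and the files are all hypothetical, and the tags here are annotated but unsigned (swap -a for -s if you have signing set up):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "1" > app.txt && git add app.txt && git commit -q -m "Release work"
git tag -a v0.1.0 -m "Version 0.1.0"      # annotated tag
echo "2" > app.txt && git commit -q -am "New feature"

# Backport: branch from the tag, apply the fix, tag the new release there.
git checkout -q -b release-0.1 v0.1.0
echo "fix" > fix.txt && git add fix.txt && git commit -q -m "Fix crash on start"
git tag -a v0.1.1 -m "Version 0.1.1"

tag_type=$(git cat-file -t v0.1.0)   # "tag" for annotated, "commit" for lightweight
described=$(git describe)
echo "$tag_type $described"
```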
Every commit that hits master should be “buildable” (the definition of which could vary based on the project). For the rare cases where some large amount of inter-related simultaneous work from multiple people needs to land, make an “integration” branch or something to pull together the required work for testing. But often, just pick an order to land things and work on the features based on the other appropriate feature branches, rebasing often.
I guess really, just get comfortable rebasing. Frequent small rebases are often ultimately easier and safer than one big one at the end. You’ll probably want to turn on rerere to make rebasing often less painful.
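Turning rerere on is a single config setting. This sketch sets it for one scratch repository only; adding --global to the git config call would apply it everywhere instead:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

# Record conflict resolutions so git can replay them on later rebases.
git config rerere.enabled true
rerere_setting=$(git config rerere.enabled)
echo "$rerere_setting"
```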
Lots of folks like git-flow. I think it’s over-complicated and I’ve never been a part of a project where it was helpful. But if it works for you then cool.
Rather than having a ton of branches interweaving with merges, I prefer to have one trunk (master) with many (mostly temporary) branches from it for specific purposes.
Your needs may be different.
Worktrees
There are occasions when it’s useful to have two separate checkouts backed by the same repository, say when you have to support an old version of an application, possibly many years old, with a radically different file structure.
Or possibly your test suite takes a long time to run. You can kick it off on a codebase in a separate worktree while continuing work on a different one in the meantime. Similar for compiling a big thing.
Or you just need to switch between some branches a bunch and tire of worrying about stashing changes every time you switch.
This is supported by git with worktrees, which allow you to check out more than one branch from a single git repository.
You can of course do this by cloning the repo multiple times (or copying .git/ to a new dir, same thing), which is fine, but this has a couple of downsides. One, it multiplies the amount of disk space used, which could be an issue for large repos. Two, it adds another layer to manage: you now need to pull updates for each of the clones, which is inconvenient.
When you have one repo and you just want branches checked out simultaneously, worktrees are what you want.
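A minimal sketch of adding a second worktree. The legacy branch and the sibling directory name are invented for the example:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "v2" > app.txt && git add app.txt && git commit -q -m "Current version"
git branch legacy

# Check out the legacy branch in a sibling directory backed by the same repo.
legacy_dir="$repo-legacy"
git worktree add "$legacy_dir" legacy
git worktree list

branch_there=$(git -C "$legacy_dir" rev-parse --abbrev-ref HEAD)
branch_here=$(git rev-parse --abbrev-ref HEAD)
echo "$branch_here / $branch_there"
```

When you are done with it, git worktree remove followed by the path cleans up the extra checkout.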
Repo archives
Sometimes it’s useful to have a snapshot of your repo (or a part of the repo) at a certain commit, for sharing or just as an artifact to store somewhere.
git archive is your friend. It will package up the repo in a zip or tarball, respecting your gitignore and such.
Make an archive for your version v0.1.0:
git archive -o v0.1.0.tar.gz v0.1.0
You can also limit it to just certain paths:
git archive -o v0.1.0.tar.gz v0.1.0 assets/
See more examples in the docs linked above.
You can also use it to retrieve an archive from a remote repo, if you don’t need the history, just the files at a certain commit. For example, say you just want the code for version v1.2.3 from a remote repo:
git archive --remote=<repo address> v1.2.3
GitHub also provides a URL you can hit for an archive of the repo in the format of:
https://github.com/<user>/<repo>/archive/<reference>.tar.gz
For example:
https://github.com/Foo/Bar/archive/6c4f88684b79486cb8c0842d7297d384e070bbac.tar.gz
Or
https://github.com/Foo/Bar/archive/v1.2.3.tar.gz
You can replace the .tar.gz with .zip if you want a zip over a tarball. The link for this is also provided in the GitHub UI under the “Clone or download” button.
gitignore
So any repo or directory in a repo can have a .gitignore file and git will, uh, ignore the files that match the patterns. “Project” ignores.
You can also have a global ignore file set to have really common stuff ignored in all your projects. By default this is ~/.config/git/ignore, but the location can be overridden by the core.excludesFile configuration value (mine uses ~/.gitignore). These of course are not shared with the project. “Personal” ignores.
You can also have “Personal Project” ignores by specifying patterns in .git/info/exclude in a project (that’s the info/exclude file in the project’s .git directory). These are applied to the project, but not committed to the repo and so not shared with others who check it out. When you have some files lying around in a repo that you don’t want/need to share (maybe personal notes or todos for the project, your scratch space, some hacky scripts, etc.), but also don’t want showing up as untracked constantly, you can put them there.
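A short sketch of a “Personal Project” ignore. The file names scratch.md and main.txt are placeholders:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q

# Add a pattern that applies to this clone only and is never committed.
mkdir -p .git/info
echo "scratch.md" >> .git/info/exclude

echo "my notes" > scratch.md
echo "code" > main.txt

untracked=$(git status --porcelain)   # scratch.md should not show up
git check-ignore -v scratch.md        # shows which file and pattern matched
echo "$untracked"
```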
Checkout file from other branch
Sometimes you need to check out the current state of a file from a different branch. It’s just:
git checkout <branch> -- <file path>
Misc. Stuff
- ndpsoftware.com/git-cheatsheet.html
- Knowledge is Power: Getting out of trouble by understanding Git by Steve Smith (basic intro)
- h2.jaguarpaw.co.uk/posts/git-survival-guide/ (very opinionated use of git)
- When you need to nuke things: stackoverflow.com/a/64966
- When you need to fix things: sethrobertson.github.io/GitFixUm/fixup.html
- makeareadme.com
- Git Koans by Steve Losh
- legends2k.github.io/note/git_concepts/
- legends2k.github.io/note/git_nuances/