When to rewrite Git history?
Ok, so let's talk about Git and this "rule" that people often get attached to, "Don't rewrite history once it is shared!"
There is definitely some wisdom in this "rule". However, following this "rule" religiously also has some side effects. And it is worth understanding those in more depth, as well as understanding any alternatives and what their pros & cons might be.
When following this rule, we have to ask: how do the branches we are working on get updated with changes other developers have already integrated into main?
Well, there are a couple of approaches we can use depending on our situation.
To explore these, let's assume we start with a Git tree that looks as follows.
* 9cabcd2 - (HEAD -> main, origin/main) commit C
* 6327c81 - commit B
* b68a192 - commit A
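If you want to follow along, a tree like this can be reproduced in a throwaway repository. This is only a sketch: it assumes Git 2.28 or newer (for `git init -b`), the `commit` helper and the log format string are my own choices to mimic the layout above, and your SHAs will differ. No remote is configured here, so the origin/main decoration will not appear.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A; commit B; commit C

# Print the history in the same "SHA - (refs) subject" layout used above.
tree=$(git log --graph --format='%h -%d %s')
echo "$tree"
```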
Now let's assume we have created some changes in a local branch that we have not pushed up to the remote yet, and that someone else has integrated another change into main.
* a0aca6f - (main, origin/main) commit G
| * fe06e90 - (HEAD -> feature/x) commit F
| * 81312b5 - commit E
| * f09acea - commit D
|/
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A
One option for including the new change from main in our feature/x branch is to use git rebase. This takes the commits from our branch and replays them on top of the commit that was added to main. Our tree would look as follows after doing this.
* ae46190 - (HEAD -> feature/x) commit F'
* 512b2ba - commit E'
* d09efea - commit D'
* a0aca6f - (main, origin/main) commit G
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A
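The rebase step itself can be sketched as follows. The repository is a throwaway one, the `commit` helper is hypothetical, commit G is created locally to stand in for the change someone else integrated, and Git 2.28 or newer is assumed.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A; commit B; commit C
git checkout -q -b feature/x
commit D; commit E; commit F
git checkout -q main
commit G                         # stands in for the change someone else integrated

git checkout -q feature/x
old_tip=$(git rev-parse HEAD)    # commit F, before the rebase
git rebase -q main               # replay D, E, F on top of G
new_tip=$(git rev-parse HEAD)    # commit F', after the rebase

git log --graph --format='%h -%d %s'
```

After this runs, main is a direct ancestor of feature/x, so the log prints as a single straight line.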
This works great! It keeps our tree clean, simple, linear, and easy to understand. However, commit D, commit E, and commit F all have different SHAs now, and are effectively new commits: commit D', commit E', and commit F'. A commit's SHA is computed from its contents, including its parent, so replaying a commit onto a new parent always produces a new SHA. This is totally fine right now because we hadn't shared commit D, commit E, and commit F with anyone yet.
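One way to see that a rebased commit is "the same change under a new SHA" is to compare patch IDs, which hash the diff rather than the commit. The sketch below uses a smaller, hypothetical history (A, D, G) in a throwaway repository and assumes Git 2.28 or newer.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A
git checkout -q -b feature/x
commit D
git checkout -q main
commit G

git checkout -q feature/x
old_sha=$(git rev-parse HEAD)    # commit D
git rebase -q main
new_sha=$(git rev-parse HEAD)    # commit D'

# git patch-id hashes the diff itself, not the commit it is attached to.
old_patch=$(git show "$old_sha" | git patch-id --stable | cut -d' ' -f1)
new_patch=$(git show "$new_sha" | git patch-id --stable | cut -d' ' -f1)

echo "commit SHA: $old_sha -> $new_sha"
echo "patch ID:   $old_patch -> $new_patch"
```

The commit SHAs differ while the patch IDs match, which is why D and D' are treated as unrelated commits by anyone who already has D.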
However, if we had already shared commits D, E, and F by pushing our branch up to the remote, we would have technically shared that history, and our Git tree would look as follows.
* a0aca6f - (main, origin/main) commit G
| * fe06e90 - (HEAD -> feature/x, origin/feature/x) commit F
| * 81312b5 - commit E
| * f09acea - commit D
|/
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A
Therefore, when following this rule we can no longer use git rebase to base our branch on top of the latest main, because it would accomplish that by rewriting the commits in the branch. Rewriting the commits could create complications for other developers whose work is based on the previous versions of those commits, because their history and the rewritten history have now diverged.
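To make the divergence concrete, here is a sketch that rebases a branch after it has been pushed. The bare repository stands in for the shared remote, the paths and the `commit` helper are hypothetical, and Git 2.28 or newer is assumed.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare -b main "$work/remote.git"   # stands in for the shared remote
git init -q -b main "$work/local"
cd "$work/local"
git remote add origin "$work/remote.git"

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A; commit B; commit C
git push -q origin main
git checkout -q -b feature/x
commit D; commit E; commit F
git push -q origin feature/x                    # D, E, F are now shared
git checkout -q main
commit G
git push -q origin main

git checkout -q feature/x
git rebase -q main                              # rewrites D, E, F locally

# Count the commits each side has that the other lacks.
only_remote=$(git rev-list --count feature/x..origin/feature/x)
only_local=$(git rev-list --count origin/feature/x..feature/x)
echo "only on remote: $only_remote, only local: $only_local"

# A plain push is refused because it is no longer a fast-forward.
if git push -q origin feature/x 2>/dev/null; then pushed=accepted; else pushed=rejected; fi
echo "plain push: $pushed"
```

The remote still holds the original D, E, and F, while the local branch holds G plus the three rewritten commits, so the two sides can only be reconciled by overwriting one of them.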
Instead, we need to bring the new commits in main into our feature/x branch in a way that doesn't rewrite history. git merge is exactly the command we are looking for, as it creates a new merge commit that has multiple parent commits. If we ran git merge main, it would look something like the following.
* f1cca4d - (HEAD -> feature/x) commit H
|\
| * fe06e90 - (origin/feature/x) commit F
| * 81312b5 - commit E
| * f09acea - commit D
* | a0aca6f - (main, origin/main) commit G
|/
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A
In the above Git tree we can see that feature/x contains the same commits with the same SHAs as before, but also contains a new commit that merges main into feature/x.
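The merge variant can be sketched the same way. Again this uses a throwaway repository, a hypothetical `commit` helper, and a locally created commit G; the merge message "commit H" is chosen only to match the tree above.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A; commit B; commit C
git checkout -q -b feature/x
commit D; commit E; commit F
git checkout -q main
commit G

git checkout -q feature/x
tip_before=$(git rev-parse HEAD)        # commit F
git merge -q -m "commit H" main         # new merge commit with two parents

first_parent=$(git rev-parse HEAD^1)    # the old feature/x tip, untouched
second_parent=$(git rev-parse HEAD^2)   # the tip of main

git log --graph --format='%h -%d %s'
```

Because the first parent of the merge commit is the original commit F, anyone who already has D, E, and F can simply fast-forward to the new tip.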
This is great, as it lets you easily collaborate with one or more developers on the feature/x branch. However, it also made the Git tree more complicated to understand. It might not be too bad with one branch, but we haven't yet gotten to the step of merging our branch back into main at some point in the future. Nor have we accounted for the reality that you often need to merge main into your feature branch multiple times as new changes are introduced, or for multiple branches doing the same thing at the same time. Generally, you will end up with a very complicated tree of interleaved merge commits.
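Having these forward and backward merges also makes it harder for Git to find a single merge base. When two branches have each merged the other, they can end up with more than one candidate merge base (a criss-cross merge), which complicates later merges and limits how much Git can do for you.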
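One way such an ambiguity can arise is sketched below: main is merged into feature/x and, from the same starting points, feature/x is merged into main. The history is a deliberately small, hypothetical one (A, D, G) in a throwaway repository, and Git 2.28 or newer is assumed.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A
git checkout -q -b feature/x
commit D
git checkout -q main
commit G

d=$(git rev-parse feature/x)
g=$(git rev-parse main)

# Forward merge: bring main (G) into feature/x.
git checkout -q feature/x
git merge -q -m "merge main into feature/x" "$g"

# Backward merge: bring the earlier feature/x commit (D) into main.
git checkout -q main
git merge -q -m "merge feature/x into main" "$d"

# Both D and G are now "best" common ancestors of the two branch tips.
bases=$(git merge-base --all main feature/x | grep -c .)
echo "merge bases: $bases"
git log --graph --all --format='%h -%d %s'
```

With two equally good merge bases, Git has to reconcile them before it can perform the next merge between these branches.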
What is the Alternative?
The alternative is to simply use git rebase on your feature branches. However, we know that this can create problems for those collaborating on a feature branch.
So we simply agree on how we are going to treat feature branches by default. For example, we can say that by default feature branches are owned and controlled by their author. Anyone pulling down that branch to test or review it should understand that the history is owned by the author, who will likely change it. Generally, after fetching, the reviewing developer uses something like git reset --hard origin/feature/x to update their local state to match that of the remote. Note that this discards any local changes on that branch.
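The author and reviewer roles can be sketched with two clones of one bare repository. The directory names and the `commit` helper are hypothetical, and `--force-with-lease` is my choice for the author's push rather than something the workflow above prescribes; a plain `--force` would also overwrite the branch.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare -b main "$work/remote.git"

# Hypothetical helper: commit a one-line file inside the given clone.
commit() { ( cd "$1" && echo "$2" > "$2.txt" && git add "$2.txt" && git commit -q -m "commit $2" ); }

# The author publishes main and feature/x.
git init -q -b main "$work/author"
git -C "$work/author" remote add origin "$work/remote.git"
commit "$work/author" A
git -C "$work/author" push -q origin main
git -C "$work/author" checkout -q -b feature/x
commit "$work/author" D
git -C "$work/author" push -q origin feature/x

# The reviewer checks the branch out to test it.
git clone -q "$work/remote.git" "$work/reviewer"
git -C "$work/reviewer" checkout -q -b feature/x origin/feature/x

# The author rebases onto a newer main and overwrites the remote branch.
git -C "$work/author" checkout -q main
commit "$work/author" G
git -C "$work/author" push -q origin main
git -C "$work/author" checkout -q feature/x
git -C "$work/author" rebase -q main
git -C "$work/author" push -q --force-with-lease origin feature/x

# The reviewer discards their copy and adopts the rewritten branch.
git -C "$work/reviewer" fetch -q origin
git -C "$work/reviewer" reset -q --hard origin/feature/x
```

The reviewer never merges or rebases anything; they just move their local branch to wherever the author's branch now points.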
Now, if you communicate out of band with the developer who owns the feature branch and let them know that you are going to push up a commit, they can integrate it into their feature branch in their local Git repo and still use rebase. It just requires some out-of-band coordination. This situation is relatively rare, so it isn't a big deal.
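One possible shape for that coordinated flow is sketched below. The contributor pushes commit X after giving the author a heads-up, and the author, who also has an unpushed commit E, replays their work on top of it before rebasing onto main. The clone names, the `commit` helper, and the use of `--force-with-lease` are assumptions made for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare -b main "$work/remote.git"

# Hypothetical helper: commit a one-line file inside the given clone.
commit() { ( cd "$1" && echo "$2" > "$2.txt" && git add "$2.txt" && git commit -q -m "commit $2" ); }

# The author publishes main and feature/x.
git init -q -b main "$work/author"
git -C "$work/author" remote add origin "$work/remote.git"
commit "$work/author" A
git -C "$work/author" push -q origin main
git -C "$work/author" checkout -q -b feature/x
commit "$work/author" D
git -C "$work/author" push -q origin feature/x

# The contributor adds commit X to the shared feature branch.
git clone -q "$work/remote.git" "$work/contributor"
git -C "$work/contributor" checkout -q -b feature/x origin/feature/x
commit "$work/contributor" X
git -C "$work/contributor" push -q origin feature/x

# Meanwhile the author has local work (E), and main has moved on (G).
commit "$work/author" E
git -C "$work/author" checkout -q main
commit "$work/author" G
git -C "$work/author" push -q origin main
git -C "$work/author" checkout -q feature/x

# The author picks up X, then rebases everything onto the latest main.
git -C "$work/author" fetch -q origin
git -C "$work/author" rebase -q origin/feature/x   # E is replayed on top of X
git -C "$work/author" rebase -q main               # D, X, E are replayed on top of G
git -C "$work/author" push -q --force-with-lease origin feature/x

subjects=$(git -C "$work/author" log --format=%s main..feature/x | tr '\n' ' ')
echo "$subjects"
```

Afterwards the contributor would update their clone the same way a reviewer does, by fetching and resetting to origin/feature/x.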
It is also worth noting that taking this approach allows you to use the "Rebase and merge" strategy in GitHub, which results in a linear Git history that is much easier to understand and manage over time.
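GitHub performs that step on the server, so it cannot be shown directly here. A rough local approximation, under the assumption that rebasing the branch and then fast-forwarding main produces an equivalent shape, looks like this (throwaway repository, hypothetical `commit` helper):

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q -b main

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A; commit B; commit C
git checkout -q -b feature/x
commit D; commit E; commit F
git checkout -q main
commit G

# Rebase the feature branch, then move main forward without a merge commit.
git checkout -q feature/x
git rebase -q main
git checkout -q main
git merge -q --ff-only feature/x

merges=$(git rev-list --count --merges main)
total=$(git rev-list --count main)
echo "commits on main: $total, merge commits: $merges"
git log --graph --format='%h -%d %s'
```

The `--ff-only` flag makes the merge fail rather than create a merge commit if the branch was not rebased first, which keeps main strictly linear.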
Conclusion
I have used both of these strategies and many others throughout my career. The ease of understanding and managing a linear Git tree is far more valuable than avoiding a tiny bit of out-of-band communication in the rare scenario where you need to contribute directly to someone else's feature branch.
The key with this alternative is that you don't rewrite history for main, as it is a central point of collaboration where you don't want to have to deal with the overhead of out-of-band communication and coordination. But because the need to contribute directly to other people's feature branches is so rare, the overhead of the out-of-band communication there is more than an acceptable trade.
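If you host the repository yourself, one way to back this convention with tooling is a server-side update hook that refuses rewrites of main while leaving feature branches alone. The hook below is a hypothetical sketch, not part of the workflow described above; hosted services such as GitHub offer branch protection settings for the same purpose.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare -b main "$work/remote.git"

# Hypothetical update hook: refuse any push that rewrites or deletes main.
mkdir -p "$work/remote.git/hooks"
cat > "$work/remote.git/hooks/update" <<'EOF'
#!/bin/sh
refname=$1 old=$2 new=$3
[ "$refname" = refs/heads/main ] || exit 0      # other branches may be rewritten
case "$old" in *[!0]*) ;; *) exit 0 ;; esac      # all zeros: main is being created
if ! git merge-base --is-ancestor "$old" "$new" 2>/dev/null; then
    echo "refusing to rewrite history on main" >&2
    exit 1
fi
EOF
chmod +x "$work/remote.git/hooks/update"

git init -q -b main "$work/author"
cd "$work/author"
git remote add origin "$work/remote.git"

# Hypothetical helper: each commit adds one small file named after it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "commit $1"; }

commit A
git push -q origin main
git checkout -q -b feature/x
commit D
git push -q origin feature/x

# Rewriting a feature branch is allowed...
git commit -q --amend -m "commit D (reworded)"
if git push -q --force origin feature/x 2>/dev/null; then feature_push=accepted; else feature_push=rejected; fi

# ...but rewriting main is refused by the hook.
git checkout -q main
git commit -q --amend -m "commit A (reworded)"
if git push -q --force origin main 2>/dev/null; then main_push=accepted; else main_push=rejected; fi

echo "feature/x: $feature_push, main: $main_push"
```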