When to rewrite Git history?

Ok, so let's talk about Git and this "rule" that people often get attached to, "Don't rewrite history once it is shared!"

There is definitely some wisdom in this "rule". However, following it religiously also has some side effects. It is worth understanding those in more depth, along with the alternatives and their pros & cons.

When we follow this rule, we have to ask: how do the branches we are working on get updated with changes other developers have already integrated into main? Well, there are a couple of approaches we can use depending on our situation. To explore these, let's assume we start with a Git tree that looks as follows.

* 9cabcd2 - (HEAD -> main, origin/main) commit C
* 6327c81 - commit B
* b68a192 - commit A

Now let's assume we have created some changes in a local branch that we have not pushed up to the remote yet, and that someone else has integrated another change into main.

* a0aca6f - (main, origin/main) commit G
| * fe06e90 - (HEAD -> feature/x) commit F
| * 81312b5 - commit E
| * f09acea - commit D
|/ 
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A

One option we have for including the new change from main in our feature/x branch is git rebase. It takes the commits from our branch and replays them on top of the commit that was added to main. Our tree would look as follows after doing this.

* ae46190 - (HEAD -> feature/x) commit F'
* 512b2ba - commit E'
* d09efea - commit D'
* a0aca6f - (main, origin/main) commit G
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A

This works great! It keeps our tree clean, simple, linear, and easy to understand. However, commit D, commit E, and commit F all have different SHAs now, and are effectively commit D', commit E', and commit F'. This is totally fine right now because we hadn't shared commit D, commit E and commit F with anyone yet.
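To see the SHA change for ourselves, here is a minimal sketch that rebuilds the example in a throwaway repository and rebases feature/x onto main. The commit helper, the placeholder identity, and the temporary directory are assumptions made for illustration; the SHAs it prints will not match the ones shown above.

```shell
#!/bin/sh
set -eu

# Build a throwaway repository shaped like the example:
# A-B-C on main, D-E-F on feature/x, then G added to main.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" on any Git version
git config user.name "Example Dev"       # placeholder identity, local to this repo
git config user.email "dev@example.com"

commit() {  # hypothetical helper: one new file per commit, so nothing can conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "commit $1"
}

for c in A B C; do commit "$c"; done
git checkout -q -b feature/x
for c in D E F; do commit "$c"; done
git checkout -q main
commit G

git checkout -q feature/x
before=$(git rev-parse HEAD)    # SHA of commit F before the rebase
git rebase -q main              # replay D, E, F on top of G
after=$(git rev-parse HEAD)     # SHA of commit F' after the rebase

git log --oneline --graph
echo "F before rebase: $before"
echo "F after rebase:  $after"
```

The two SHAs printed at the end differ, even though the content of the commit is the same, because the replayed commits have a new parent.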

However, if we had already shared commits D, E, and F by pushing our branch up to the remote, we would have technically shared that history, and our Git tree would look as follows.

* a0aca6f - (main, origin/main) commit G
| * fe06e90 - (HEAD -> feature/x, origin/feature/x) commit F
| * 81312b5 - commit E
| * f09acea - commit D
|/ 
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A

Therefore, when following this rule we can no longer use git rebase to base our branch on top of the latest main, because it would accomplish that by rewriting the commits in the branch. Rewriting the commits could create complications for other developers who have based work on the previous commits, since the rewritten commits have now diverged from them.

Instead, we need to bring the new commits in main into our feature/x branch in a way that doesn't rewrite history. git merge is exactly the command we are looking for, as it creates a new merge commit that has multiple parent commits. If we ran git merge main, the tree would look something like the following.

* f1cca4d - (HEAD -> feature/x) commit H
|\
| * fe06e90 - (origin/feature/x) commit F
| * 81312b5 - commit E
| * f09acea - commit D
* | a0aca6f - (main, origin/main) commit G
|/
* 9cabcd2 - commit C
* 6327c81 - commit B
* b68a192 - commit A

In the above Git tree we can see that feature/x contains the same commits with the same SHAs as before, but also contains a new commit that merges main into feature/x.
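Here is a minimal sketch of that merge in a throwaway repository. The commit helper, the placeholder identity, and the temporary directory are assumptions made for illustration, and the remote is left out to keep it short.

```shell
#!/bin/sh
set -eu

# Rebuild the example: A-B-C on main, D-E-F on feature/x, then G on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" on any Git version
git config user.name "Example Dev"       # placeholder identity, local to this repo
git config user.email "dev@example.com"

commit() {  # hypothetical helper: one new file per commit, so nothing can conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "commit $1"
}

for c in A B C; do commit "$c"; done
git checkout -q -b feature/x
for c in D E F; do commit "$c"; done
git checkout -q main
commit G

git checkout -q feature/x
before=$(git rev-parse HEAD)     # SHA of commit F before the merge
git merge -q --no-edit main      # creates a merge commit; D, E, F are untouched

first_parent=$(git rev-parse HEAD^1)    # the old tip of feature/x (commit F)
second_parent=$(git rev-parse HEAD^2)   # the tip of main (commit G)

git log --oneline --graph
echo "F before merge:       $before"
echo "merge's first parent: $first_parent"
```

The first parent of the merge commit is the very same commit F we had before, which is why anyone who already has the branch is not disrupted.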

This is great because it lets you easily collaborate with one or more developers on the feature/x branch. However, it also made the Git tree more complicated to understand. It might not be too bad with one branch. But we haven't yet gotten to merging our branch back into main, or to the reality that you often need to merge main into your feature branch multiple times as new changes are introduced, or to the reality of multiple branches doing the same thing at the same time. Generally, you will end up with a very complicated tree that looks something like the following.

image of extremely complicated git tree

Having these forward & backward merges happening can also eventually confuse Git when it tries to find merge bases: criss-cross merges can leave two branches with more than one best common ancestor, which in turn limits what Git can do for you.
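One way this can show up is a criss-cross history, where each branch has merged an earlier state of the other. The sketch below is only an illustration under assumed names (the commit helper, the placeholder identity, and the temporary directory are hypothetical); it builds the smallest such history and asks Git for every merge base.

```shell
#!/bin/sh
set -eu

# Start from A-B-C on main, with D on feature/x and G on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" on any Git version
git config user.name "Example Dev"       # placeholder identity, local to this repo
git config user.email "dev@example.com"

commit() {  # hypothetical helper: one new file per commit, so nothing can conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "commit $1"
}

for c in A B C; do commit "$c"; done
git checkout -q -b feature/x
commit D
git checkout -q main
commit G

d=$(git rev-parse feature/x)   # commit D
g=$(git rev-parse main)        # commit G

# Backward merge: bring G into feature/x.
git checkout -q feature/x
git merge -q --no-edit "$g"

# Forward merge: bring D (not the merge commit above) into main.
git checkout -q main
git merge -q --no-edit "$d"

# Neither D nor G is an ancestor of the other, so both are "best" merge bases.
bases=$(git merge-base --all main feature/x)
echo "$bases"
```

Instead of a single common ancestor, git merge-base --all reports both D and G here.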

What is the Alternative?

The alternative is to simply use git rebase on your feature branches. However, we know that this can create problems for those collaborating on a feature branch.

So we simply agree on how we are going to treat feature branches by default. For example, we can say that by default feature branches are owned and controlled by their author. Anyone pulling down that branch to test it out or review it should understand that the history is owned by the author, and that the author will likely change it. Generally, the reviewing developer fetches and then uses something like git reset --hard origin/feature/x to update their local state to match that of the remote state.
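As a rough sketch of that workflow, the script below sets up a local bare repository as the remote, an author clone, and a reviewer clone. The directory names, the commit helper, and the placeholder identity are assumptions for illustration, as is the author's use of git push --force-with-lease to publish the rebased branch.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

commit() {  # hypothetical helper: one new file per commit, so nothing can conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "commit $1"
}

# The author publishes main and a feature branch.
git clone -q remote.git author
cd author
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"    # placeholder identity, local to this clone
git config user.email "author@example.com"
for c in A B C; do commit "$c"; done
git push -q origin main
git checkout -q -b feature/x
commit D
git push -q origin feature/x
cd ..

# The reviewer checks out the feature branch to look at it.
git clone -q -b feature/x remote.git reviewer
stale=$(git -C reviewer rev-parse HEAD)   # commit D as first published

# Meanwhile main moves on, and the author rebases and republishes.
cd author
git checkout -q main
commit G
git push -q origin main
git checkout -q feature/x
git rebase -q main
git push -q --force-with-lease origin feature/x
cd ..

# The reviewer discards the old local state and matches the remote.
cd reviewer
git fetch -q origin
git reset -q --hard origin/feature/x
echo "reviewer was at: $stale"
echo "reviewer now at: $(git rev-parse HEAD)"
```

Keep in mind that git reset --hard throws away local changes on that branch, which is fine for a reviewer who treats the branch as the author's.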

Now, if you communicate out of band with the developer who owns the feature branch and let them know that you are going to push up a commit, they can integrate it into their feature branch in their local Git repo and still use rebase. It just requires some out-of-band coordination. In practice this situation is relatively rare, so it isn't a big deal.

It is also worth noting that taking this approach allows you to use the "Rebase and merge" strategy in GitHub, which results in a linear Git history that is much easier to understand and manage over time.
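GitHub does this on its side when you press the button, so the following is only a local approximation of the end result: rebase the feature branch onto main, then fast-forward main to it. The commit helper, the placeholder identity, and the temporary directory are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Rebuild the example: A-B-C on main, D-E-F on feature/x, then G on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" on any Git version
git config user.name "Example Dev"       # placeholder identity, local to this repo
git config user.email "dev@example.com"

commit() {  # hypothetical helper: one new file per commit, so nothing can conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "commit $1"
}

for c in A B C; do commit "$c"; done
git checkout -q -b feature/x
for c in D E F; do commit "$c"; done
git checkout -q main
commit G

# Replay the feature commits on top of main...
git checkout -q feature/x
git rebase -q main

# ...then move main forward without creating a merge commit.
git checkout -q main
git merge -q --ff-only feature/x

git log --oneline --graph
merges=$(git rev-list --count --merges main)
echo "merge commits on main: $merges"
```

The resulting graph is a single straight line of seven commits with no merge commits in it.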

Conclusion

I have used both of these strategies and many others throughout my career. But the ease of understanding & managing a linear Git tree is far more valuable than not needing to worry about a tiny bit of out-of-band communication in the rare scenario that you need to directly contribute to someone else's feature branch.

The key with this alternative is that you don't rewrite history for main, as it is a central point of collaboration where you don't want to have to deal with the overhead of out-of-band communication and coordination. But because the need to directly contribute to other people's feature branches is so rare, the overhead of the out-of-band communication is more than an acceptable trade.