Git rebase loses history, then why rebase?
I've been looking into rebasing with Git over the past couple days. Most of the arguments for rebasing say that it cleans up the history and makes it more linear. If you do plain merges (for example), you get a history that shows when the history diverged and when it was brought back together. As far as I can tell, rebasing removes all that history. Question is this: why wouldn't you want the repo history to reflect all the ways the code developed, including where and how it diverged?
As far as I can tell, rebasing removes all that history.
That's not correct. Rebasing, as the name suggests, changes the base of commits: each commit is replayed on top of a new base. Usually no change is lost in that process (the replayed commits get new hashes, and you don't get a merge commit). Your argument for keeping the whole development process in the history exactly as it happened is valid, but very often that leads to confusing histories.
This is especially true when several people each work on their own branch while depending on changes from the others (for example, A asks B to implement something so that A can use that feature in his own development). That leads to many merges, for example like this:
      #--#--#--#--*-----*-----------------*---#---          Branch B
     /           /     /                 /
---#-----#-----#-----#-----#-----#-----#-----#-----*        Branch A
In this example we have a branch that is developed separately for a time but constantly pulls in changes from the original branch (# are original commits, * are merges).
Now if we rebase Branch B before merging it back in, we could get the following:
                             #--#--#--#--#---          Branch B
                            /
---#---#---#---#---#---#---#---#---------------*       Branch A
This represents the same actual changes, but B was rebased onto some older commit on A, so all the merges that were done on B before are no longer needed (because those changes are already there in that older commit). The only commits that are missing now are the merges, which usually do not contain any information about the development process. (Note that in this example you could also rebase that last commit on A later on to get a straight line, effectively removing any hint of the second branch.)
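To check the "no commit is lost" claim yourself, here is a minimal sketch that builds a throwaway repository, lets two branches diverge, and rebases one onto the other. The branch names (main, feature), file names and commit messages are made up for illustration; only standard git commands are used.

```shell
#!/bin/sh
# Sketch: rebase a diverged branch and confirm that its commits survive.
# Branch names, file names and messages are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # commit <file> <message>: append one line to a file and commit it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit a.txt "A1"
git checkout -q -b feature
commit b.txt "B1"
commit b.txt "B2"
git checkout -q main
commit a.txt "A2"                            # main moves on: the branches have diverged

git checkout -q feature
before=$(git log --format=%s main..feature)  # subjects of the commits unique to feature
old_tip=$(git rev-parse feature)

git rebase -q main                           # replay B1 and B2 on top of A2

after=$(git log --format=%s main..feature)
new_tip=$(git rev-parse feature)
merges=$(git rev-list --merges --count feature)

echo "commits before rebase: $before"
echo "commits after rebase:  $after"
echo "merge commits on feature: $merges"
git log --oneline --graph feature
```

The commit subjects are the same before and after; what changes is the commit hashes, because the commits were recreated on a new base. The pre-rebase commits are not deleted immediately either, and can still be found through `git reflog` for a while.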
Imagine you are working on a Secret Project of World Domination. There are three masterminds in this conspiracy: the Genius, the General, and the Computer Hacker.
They all agree to meet at their secret base in one week, each with one detailed plan.
The Computer Hacker, being a pragmatic programmer, suggested that they use Git to store all the files of the plans. Each one will fork the initial project repo, and they will merge everything in one week.
They all agree and in the following days the story goes like this:
The Genius
He made a total of 70 commits, 10 each day.
The General
He spied on his comrades' repos and devised a strategy to beat them. He made 3 commits, all on the last day.
The Computer Hacker
This pragmatic programmer used branches. He made 4 different plans, each one on a branch. Each branch was rebased to be just one commit.
Seven days passed and the group met again to merge all the plans into one masterpiece. All of them were eager to start, so each of them tried to merge everything on his own.
Here goes the story:
The Genius
He merged all the changes from the General's repo and then those from the Computer Hacker's. Then, being a lover of logic, he took a look at the log. He expected to see the logical evolution of an idea, where each commit built upon the previous ones.
But what the log showed was a myriad of commits from different ideas, all mixed together in the timeline. A reader could not really understand the evolution or the reasoning behind the commits just by reading the timeline.
So he ended up with a mess that even a genius couldn't understand.
The General
The General thought: Divide and conquer!
And so he merged the Genius's repo into his own. He looked at the log and saw a bunch of commits from the Genius's idea, which followed an understandable progression until the last day. On the last day, the ideas of the General and the Genius were mixed.
He had been spying on the Computer Hacker and knew about the rebase solution. So he rebased his own idea and tried the merge again.
Now the log showed a logical progression every day.
The Computer Hacker
This pragmatic programmer created an integration branch for the Genius's idea, another one for the General's idea, and another one for his own ideas. He rebased each branch, and then he merged them all into master.
And all of his teammates saw that his log was great. It was simple. It was understandable at first sight.
If an idea introduced a problem, it was clear in which commit it was introduced, for there was just one.
They ended up conquering the whole world, and they banished the use of Subversion.
And all were happy.
You do a rebase mainly to replay your local commits (the ones you haven't pushed yet) on top of a remote branch (which you have just fetched), in order to resolve any conflicts locally (i.e. before you push them back to the upstream repo).
See "git workflow and rebase vs merge questions" and, in more detail, "git rebase vs git merge".
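As a rough illustration of that fetch-then-rebase workflow, the sketch below simulates two developers sharing a bare repository on the local disk. All repository names (seed, origin.git, alice, bob), the branch name main and the commit messages are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: replay unpushed local commits on top of a freshly fetched remote branch.
# Repository names, the branch "main" and all messages are hypothetical.
set -e
cd "$(mktemp -d)"

identify() {  # identify <dir>: give a repository a throwaway local identity
    git -C "$1" config user.name "$1"
    git -C "$1" config user.email "$1@example.com"
}
commit() {  # commit <dir> <file> <message>: append a line to a file and commit it
    echo "$3" >> "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# A seed repository with one commit becomes the shared bare "origin".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
identify seed
commit seed base.txt "initial"
git clone -q --bare seed origin.git
git clone -q origin.git alice && identify alice
git clone -q origin.git bob && identify bob

# Alice pushes first; Bob has two local commits he has not pushed yet.
commit alice a.txt "alice: pushed change"
git -C alice push -q origin main
commit bob b.txt "bob: local change 1"
commit bob b.txt "bob: local change 2"

cd bob
git fetch -q origin
behind_before=$(git rev-list --count main..origin/main)  # remote commits Bob lacks
git rebase -q origin/main            # any conflicts would surface here, locally
behind_after=$(git rev-list --count main..origin/main)
ahead_after=$(git rev-list --count origin/main..main)    # Bob's replayed commits
git push -q origin main              # now a plain fast-forward, no merge commit

remote_tip=$(git --git-dir=../origin.git rev-parse main)
local_tip=$(git rev-parse main)
echo "behind before rebase: $behind_before, after: $behind_after, ahead: $ahead_after"
```

Because only unpushed commits are rewritten, nobody else has to deal with the changed hashes. Rebasing commits that others have already pulled is a different matter and is usually avoided.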
But rebase isn't limited to that scenario, and combined with "--interactive", it allows for some local re-ordering and cleaning of your history. See also "Trimming GIT Checkins/Squashing GIT History".
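For the clean-up case, here is a small sketch that squashes three local commits into one with an interactive rebase. Normally `git rebase -i` opens the todo list in your editor; to let the example run unattended, a one-line sed command stands in for the editor via the `GIT_SEQUENCE_EDITOR` environment variable. The branch name plan-a, the file name and the commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: squash a branch's commits into one using a scripted interactive rebase.
# Branch names, file names and messages are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

commit() {  # commit <file> <message>: append one line to a file and commit it
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit plan.txt "base"
git checkout -q -b plan-a
commit plan.txt "draft"
commit plan.txt "fix typo"
commit plan.txt "more detail"

tree_before=$(git rev-parse 'HEAD^{tree}')   # snapshot of the files before the rebase

# By hand you would change "pick" to "fixup" (or "squash") in the todo list.
# Here sed does it: every todo line except the first becomes "fixup".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -q -i main

tree_after=$(git rev-parse 'HEAD^{tree}')
count=$(git rev-list --count main..plan-a)   # commits left on the branch
subject=$(git log -1 --format=%s)

echo "commits on plan-a after squashing: $count ($subject)"
```

The file contents at the tip of the branch are identical before and after; only the intermediate steps are folded into the first commit. Using "squash" instead of "fixup" would additionally let you edit the combined commit message.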
why wouldn't you want the repo history to reflect all the ways the code developed, including where and how it diverged