How to "undo" rebasing commits
There are a bunch of somewhat tricky concepts all rolled into one tightly coiled ball of hair here. Let's tease them apart, starting with the "true name" of a commit. Each commit has just one of these,1 and that is its hash ID, which is one of those big ugly 40-character things like 238e487ea943f80734cc6dad665e7238b8cbc7ff
.
1Git's eventual transition from SHA-1 to something with more bits in it may result in invalidating this: commits will, at least temporarily, have two true names, which becomes awkward in the unlikely-but-necessarily-possible event that one of these new bigger-hash commits gets a collision in its smaller SHA-1 hash. But let's not worry about that here. :-)
Hash IDs are unique
Given a hash ID, Git can find the commit (or other object) and extract its contents. Given some contents, Git can compute the hash ID. So there's a one-to-one mapping between these: a hash key represents exactly one value, and that one particular value is always represented by that same single hash key. This is what allows Git to transfer commits (and other objects) between repositories via git fetch
and git push
.
A commit's hash ID includes the author and message and a time stamp
Let's look at one of these commits:
$ git cat-file -p HEAD | sed 's/@/ /'
tree e97e9653eed972b4521e7f562e40f61f74eeb76c
parent 6e6ba65a7c8f8f9556ec42678f661794d47f7f98
author Junio C Hamano <gitster pobox.com> 1503813601 -0700
committer Junio C Hamano <gitster pobox.com> 1503813601 -0700
The fifth batch post 2.14
Signed-off-by: Junio C Hamano <gitster pobox.com>
This is the entire contents of commit 238e487ea943f80734cc6dad665e7238b8cbc7ff, and computing an SHA-1 checksum of the header commit 293\0 (the word commit, a space, the length of the text, which is 293 here, and a NUL byte) followed by the original text results in that hash:
$ python
...
>>> import hashlib
>>> import subprocess
>>> p = subprocess.Popen('git cat-file -p HEAD', stdout=subprocess.PIPE, shell=True)
>>> text = p.stdout.read()
>>> len(text)
293
>>> s = 'commit {}\0'.format(len(text)).encode('utf8')
>>> s += text
>>> hashlib.sha1(s).hexdigest()
'238e487ea943f80734cc6dad665e7238b8cbc7ff'
(the above should work in py2k and py3k but was patched up slightly on the fly, so might have a glitch).
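The same header-plus-body rule applies to every object type, not just commits. Here is a minimal sketch that needs no repository at all; the function name git_object_id is made up for illustration, and the two inputs are the empty blob and the empty tree, whose IDs are well known:

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 ID Git would assign to an object of this kind and body.

    git_object_id is a hypothetical helper name, not part of Git.
    """
    # Git hashes the header "<kind> <size>" plus a NUL byte, then the raw body.
    header = "{} {}".format(kind, len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The empty blob and the empty tree both have zero-length bodies,
# so only the header differs between them.
print(git_object_id("blob", b""))
print(git_object_id("tree", b""))
```

Running git hash-object -t blob /dev/null in any repository should print the same ID as the first line.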
Anyway, note in particular the parent
line and the author
and committer
lines. The parent
line gives the hash ID of the parent of this commit. The other two lines have a name, an email address, a long decimal number, and a weird -0700
that is actually a time zone offset (7 hours west of GMT/Zulu time, in this case). The big decimal number plus this time zone offset is the time stamp of the commit.
The tree
line gives the Git hash ID of the tree
object that contains the source that goes with this commit. The rest of the text is, obviously, just the commit message itself. Having time stamps means that two otherwise identical commits, made by the same person, using the same source tree and same commit message, will generally result in two different commits because no one makes more than one commit per second.2
2Scripts can easily violate this rule and can produce surprises.
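To see the time stamp effect in isolation, here is a rough sketch that builds commit text by hand and hashes it the same way as above. The commit_id helper, the author identity, and the message are all invented for illustration; only the tree and parent IDs are borrowed from the example commit:

```python
import hashlib


def commit_id(tree, parent, who, when, message):
    """Hash hand-built commit text the way Git hashes a commit object.

    This is an illustrative helper, not a Git API.
    """
    body = (
        "tree {}\nparent {}\nauthor {} {}\ncommitter {} {}\n\n{}\n"
        .format(tree, parent, who, when, who, when, message)
    ).encode("utf8")
    header = "commit {}".format(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


tree = "e97e9653eed972b4521e7f562e40f61f74eeb76c"
parent = "6e6ba65a7c8f8f9556ec42678f661794d47f7f98"
who = "A U Thor <author@example.com>"  # hypothetical identity

first = commit_id(tree, parent, who, "1503813601 -0700", "same message")
second = commit_id(tree, parent, who, "1503813602 -0700", "same message")
again = commit_id(tree, parent, who, "1503813601 -0700", "same message")

print(first != second)  # one second later: a different commit
print(first == again)   # identical in every byte: the same commit
```

The last line is footnote 2 in action: a script that makes two byte-identical commits within one second gets the same hash ID both times.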
Branch names simply point to commits, as do other commits
Since each commit has, as part of its core data, the hash ID of its parent commit, it suffices to store a single Git hash ID in a branch name like master
or develop
. This name maps to the hash ID, which identifies or "points to" the tip commit of the branch. That particular commit then has inside it the hash ID of its parent commit: the tip commit points to its parent. That parent commit points back to its own parent. It's this chain of backwards pointers, starting from branch tip commits as identified by branch names, that makes up a Git branch:
A <-B <-C <-- master
Here, in this tiny 3-commit repository, the name master
identifies commit C
; C
points back to B
; and B
points back to A
. Since A
is the very first commit ever made, it points nowhere at all. The technical term for this is a root commit, and when we (or Git) work with commits, we generally follow the backwards pointers until they run out at the root.
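As a toy model of this (single letters standing in for real hash IDs, and plain dictionaries standing in for Git's object and reference storage, both of which are simplifying assumptions), following the backwards pointers looks roughly like this:

```python
# Each commit maps to its parent; the root commit has no parent.
parents = {"A": None, "B": "A", "C": "B"}

# A branch name stores exactly one commit ID: the tip.
branches = {"master": "C"}


def history(branch):
    """Walk from the branch tip back to the root, one parent at a time."""
    chain = []
    commit = branches[branch]
    while commit is not None:
        chain.append(commit)
        commit = parents[commit]
    return chain


print(history("master"))  # ['C', 'B', 'A']
```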
All of this means that no commit (nor any Git object) can ever change
We're given the claim that the hash ID of any Git object—commit, tree, annotated tag, or "blob" (file)—is unique, and that it strictly depends on the data inside the object. This claim is true; Git enforces it by refusing to add a new object that, by some chance or wicked purpose, has the same hash as some existing object. In practice, changing or adding or removing just one character inside a commit produces a whole new, different hash; and even just copying a commit tends to produce a new, different hash due to the time stamps.
This makes rebase impossible, in one sense. And yet, git rebase
exists, so it must be possible somehow. The trick lies in the how.
The purpose of rebasing
There are several reasons one might use git rebase
, but the most common is simply to do just that: "re-base" some commit(s). Let's draw another graph like the minimal repository, but add a branch:
A--B--C   <-- master
       \
        D--E   <-- develop
The arrows inside these commits all point backwards (by definition) and ASCII makes it hard to draw in the individual arrows well, so I've left them out here. But let's continue to emphasize that the name master
points to commit C
, and the name develop
points to commit E
, because we're about to make a new commit on master
:
A--B--C--F   <-- master
       \
        D--E   <-- develop
Now we have a situation ripe for doing git rebase
: we might like to have commits D
and E
come after commit F
.
We've already seen, though, that we can't change anything about a commit. If we try, we get a new, different commit. But let's do that anyway: let's copy commit D
to a new, different commit D'
, whose parent is commit F
and whose message is the same as D
's:
           D'   <-- [temporary]
          /
A--B--C--F   <-- master
       \
        D--E   <-- develop
To make this really work, we'll start with F
's source tree too, and make whatever changes we made earlier, to that tree. We'll do this by having Git compare commit D
to its parent commit C
:
git diff develop^ develop
then apply that set of changes to commit F
, and then make this new copy D'
using git commit
with the same message as the original D
.
There is a Git command that does this kind of copying: git cherry-pick
. If we check out commit F
by its hash ID (as a detached HEAD), and cherry-pick commit D
, we get commit D'
. What changes are the tree
and the parent
lines, and almost certainly the time stamp. But commit D'
is "just as good" as commit D
, or maybe even better, if we just also copy commit E
to E'
:
           D'--E'   <-- HEAD
          /
A--B--C--F   <-- master
       \
        D--E   <-- develop
Now that we've copied the two commits we care about, we can tell Git to rip the label develop
away from commit E
and make it point, instead, to our last copy, E'
:
           D'--E'   <-- develop
          /
A--B--C--F   <-- master
       \
        D--E   <-- [abandoned]
This is what git rebase
does, in general: it's an automated series of git cherry-pick
copy operations, followed by a label-move.
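The copy-then-move-the-label idea can be sketched on the same kind of toy graph. This is only a model: letters stand in for hash IDs, cherry_pick and rebase are invented function names, and a real cherry-pick also has to apply a diff, which is skipped here:

```python
# Toy graph from the drawing above: D and E branch off from C, F is new on master.
parents = {"A": None, "B": "A", "C": "B", "F": "C", "D": "C", "E": "D"}
branches = {"master": "F", "develop": "E"}


def cherry_pick(commit, onto):
    """Make a copy of `commit` whose parent is `onto`; the original is untouched."""
    copy = commit + "'"
    parents[copy] = onto
    return copy


def rebase(branch, to_copy, target_branch):
    """Copy each commit (oldest first) after the target tip, then move the label."""
    tip = branches[target_branch]
    for commit in to_copy:
        tip = cherry_pick(commit, tip)
    branches[branch] = tip


rebase("develop", ["D", "E"], "master")
print(branches["develop"], parents["E'"], parents["D'"])  # E' D' F
print(parents["E"], parents["D"])  # D C  (the abandoned originals still exist)
```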
Choosing what to copy, to where, and other refinements
There's a very tricky bit here, disguised by the way we've been drawing these commit graphs. How does Git know which commits to copy, and where to put the copies?
The usual answer, in Git, is taken from the (single) argument to git rebase. If we run git rebase master while on develop, we are telling Git:
copy the commits that are on the current branch (develop) and not on master;
put the copies after the tip of master.
If you look at the graph, it's obvious that the commits that are on develop are DE. But this is wrong! The commits that are on develop are actually ABCDE. The commits that are on master are ABCF. Three of these commits, ABC, are on both branches.
This is why the phrase above is "commits that are on the current branch, and not on the other one." Since ABC
are on both, that knocks them out of the list, leaving just DE
to be copied.
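In set terms, "on a branch" means "reachable from the branch tip", and the list to copy is a set difference. A small sketch on the toy graph (letters instead of hash IDs; the reachable helper is invented for illustration and only handles single-parent commits):

```python
parents = {"A": None, "B": "A", "C": "B", "F": "C", "D": "C", "E": "D"}
branches = {"master": "F", "develop": "E"}


def reachable(branch):
    """Return every commit reachable from the branch tip by following parents."""
    seen = set()
    commit = branches[branch]
    while commit is not None:
        seen.add(commit)
        commit = parents[commit]
    return seen


on_develop = reachable("develop")
on_master = reachable("master")

print(sorted(on_develop))              # ['A', 'B', 'C', 'D', 'E']
print(sorted(on_master))               # ['A', 'B', 'C', 'F']
print(sorted(on_develop & on_master))  # ['A', 'B', 'C']  on both branches
print(sorted(on_develop - on_master))  # ['D', 'E']       what rebase copies
```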
Note that our single argument, master
, is used both as "what not to copy" and "where to copy". The rebase command has a way to split these apart—"don't copy based on commit S-for-stop" and "put the copies after T-for-target"—but you still only get one "stop" point. The default is that you name both S and T with one name. The --onto
flag, git rebase --onto T S
, is what lets you split them up.
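Here is a sketch of what splitting S from T buys you, again on the toy graph. The rebase_onto function and its parameter names are invented for illustration; it loosely models running git rebase --onto master <hash-of-D> while on develop, so that only E gets copied:

```python
parents = {"A": None, "B": "A", "C": "B", "F": "C", "D": "C", "E": "D"}
branches = {"master": "F", "develop": "E"}


def rebase_onto(branch, target, stop):
    """Copy the commits after `stop` on `branch` so that they come after `target`."""
    # Everything reachable from the stop commit is excluded from copying.
    excluded = set()
    commit = stop
    while commit is not None:
        excluded.add(commit)
        commit = parents[commit]

    # Collect what remains, walking backwards from the branch tip.
    chain = []
    commit = branches[branch]
    while commit is not None and commit not in excluded:
        chain.append(commit)
        commit = parents[commit]
    chain.reverse()  # oldest first

    # Copy each commit after the target, then move the branch label.
    tip = target
    for commit in chain:
        copy = commit + "'"
        parents[copy] = tip
        tip = copy
    branches[branch] = tip
    return chain


copied = rebase_onto("develop", target="F", stop="D")
print(copied, branches["develop"], parents["E'"])  # ['E'] E' F
```

Passing stop="F" and target="F" instead reproduces the default one-argument behavior, copying both D and E.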
Besides just copying commits, you can use a special variety of rebase (the "interactive" one) to let you make changes just before3 it makes the new copy of an existing commit. That is, you can think of this as: "Copy commit D as if via cherry-pick, but let me make some minor changes just before committing the new D'."
3In fact, these changes are usually made using git commit --amend
, which means that you wind up making two copies: one in the new place, and then the amended copy, shoving the first copy aside, to really use. But this all happens behind the scenes and is more efficient than it sounds anyway, so it doesn't really hurt to just pretend it's "just before", at least for learning purposes.
Merges make everything trickier
Now let's look at merges. A merge commit—this is an actual thing, separate from the process by which we make the merge commit, but both are called "merge"—is any commit with at least two parent commits. We draw them by having the merge "point back" to each of its parents:
...--H--I--J---M   <-- br1
         \    /
          K--L   <-- br2
Here merge commit M
has two parents, J
and L
. We probably made it by doing git checkout br1; git merge br2
. (This means that M
's first parent is J
. This does not matter right here, but it's useful later on. The first parent of any merge is the commit that was HEAD
at the time you ran git merge
. This often does not get drawn in graphs, which don't generally care about the order. Git mostly doesn't care either, except for this first-vs-second thing, and then only if you use --first-parent
.)
Let's add a few more commits beyond M
, all on br1
(which will be our current branch; let's label that too, by adding (HEAD)
):
...--H--I--J---M--N--O   <-- br1 (HEAD)
         \    /
          K--L   <-- br2
Now let's imagine we are trying to use git rebase
to copy, say, JMNO
.
We can tell Git to stop copying at (and before) L
. But then the copies go at the wrong place, i.e., just after L
.
We can tell Git to stop copying at (and before) I
. But then Git insists on copying K
and L
.
The merge, in other words, throws a monkey wrench into the idea of using just one "stop point" unless we pick I
; and then we copy someone else's commits.
It also adds one really big monkey wrench: Git cannot copy a merge. The cherry-pick
command insists that you pick one "side" of the merge, and copies the commit into a new non-merge commit that does what that "side" did, rather than actually merging. Worse, the rebase
command, by default, simply skips merges entirely!
Here's where things get particularly tricky. Git will sometimes re-use an existing commit in place, especially doing an interactive rebase; and git rebase -p
claims to attempt to preserve merges, which it doesn't, really, because it can't. But it will re-perform a merge, i.e., run git merge
again.
Hence, given the above graph, we can try running:
git rebase -i -p <hash-of-I>
Git will, we hope, re-use K
and L
in place, and maybe even re-use J
as well if we don't propose to change it at all. Of course, we do intend to change J
(by using reword
or edit
on it). So now Git will copy J
, let us tweak J'
, and then re-run the merge command to make a new merge, M'
, between J'
and L
, which we hope it re-used in place.
Git will then have to go on to copy N
and O
. The new M'
has a different hash ID than the original M
, so even if N
itself needs no other changes, its parent
line has to change. Since N
changed to become N'
, O
likewise must change to become O'
pointing back to N'
.
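The ripple effect can be imitated with a deliberately simplified hash: each toy commit ID below depends only on its parents' IDs and a one-word content string. The commit_id and build helpers are invented, and this is not Git's real commit format; the point is only which IDs change when J changes:

```python
import hashlib


def commit_id(parent_ids, content):
    """Toy commit ID: a hash over the parent IDs plus the commit's own content."""
    data = " ".join(parent_ids) + "\n" + content
    return hashlib.sha1(data.encode("utf8")).hexdigest()


def build(j_content):
    """Rebuild the graph I--J--M--N--O with the side branch I--K--L merged at M."""
    i = commit_id([], "I")
    k = commit_id([i], "K")
    l = commit_id([k], "L")
    j = commit_id([i], j_content)
    m = commit_id([j, l], "M")  # merge: first parent J, second parent L
    n = commit_id([m], "N")
    o = commit_id([n], "O")
    return {"K": k, "L": l, "J": j, "M": m, "N": n, "O": o}


before = build("J")
after = build("J, reworded")
changed = sorted(name for name in before if before[name] != after[name])
print(changed)  # ['J', 'M', 'N', 'O']
```

K and L keep their IDs because nothing they depend on changed, which is the "re-used in place" case described above.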
Whether all of this works depends on whether Git preserves the original K
and L
commits. If Git chooses to copy them, you'll become the committer (the author generally stays the same) and the time stamps will change, and hence you will copy K
and L
to K'
and L'
. The existing branch will continue to point to the originals, not to the copies.
If the copying is too complicated for Git, you can do it manually
Suppose that, for whatever reason, git rebase -i -p <hash-of-I>
does not do what we want. We undo the rebase immediately afterward using git reset --hard ORIG_HEAD
or similar, so that we are back to this graph:
...--H--I--J---M--N--O   <-- br1 (HEAD)
         \    /
          K--L   <-- br2
We now wish to make a new commit J'
that is like J
but different, so we can do this manually. Everything is all clean—there are no changes to worry about staging or whatever at this point—so we just run:
$ git checkout -b newbr1 <hash-of-I>
$ git cherry-pick -n <hash-of-J>
The -n
(or --no-commit
) tells Git that, yes, we're copying J
here, but don't commit the copy just yet. Now we can fiddle as much as we like with commit contents (edit files and git add
them), and then run git commit
to make the new commit and edit the commit message. (If you don't need to change the tree any, you can leave out the -n
and just edit the message.)
Now we have this:
          J'   <-- newbr1 (HEAD)
         /
...--H--I--J---M--N--O   <-- br1
         \    /
          K--L   <-- br2
We're now ready to merge commit L
:
$ git merge br2
This produces commit M'
. We're now ready to cherry-pick N
:
$ git cherry-pick -n <hash-of-N>
which we can tweak as much as we like before running git commit, and:
$ git cherry-pick -n br1
to copy O, again finishing with git commit since -n leaves the copy uncommitted
(we don't need to know or find its hash, because the name br1
points to O
).
Once we're all done we just have to force the name br1
to point to the new O'
copy we made, for which we can use any of several Git commands, such as:
git branch -f br1 newbr1
as long as we're still on branch newbr1
.