How to "undo" rebasing commits

2018-06-07 17:09:08

There are a bunch of somewhat tricky concepts all rolled into one tightly coiled ball of hair here. Let's tease them apart, starting with the "true name" of a commit. Each commit has just one of these,[1] and that is its hash ID, which is one of those big ugly 40-character things like 238e487ea943f80734cc6dad665e7238b8cbc7ff.

[1] Git's eventual transition from SHA-1 to something with more bits in it may result in invalidating this: commits will, at least temporarily, have two true names, which becomes awkward in the unlikely-but-necessarily-possible event that one of these new bigger-hash commits gets a collision in its smaller SHA-1 hash. But let's not worry about that here. :-)

Hash IDs are unique

Given a hash ID, Git can find the commit (or other object) and extract its contents. Given some contents, Git can compute the hash ID. So there's a one-to-one mapping between these: a hash key represents exactly one value, and that one particular value is always represented by that same single hash key. This is what allows Git to transfer commits (and other objects) between repositories via git fetch and git push.

A commit's hash ID includes the author and message and a time stamp

Let's look at one of these commits:

$ git cat-file -p HEAD | sed 's/@/ /'
tree e97e9653eed972b4521e7f562e40f61f74eeb76c
parent 6e6ba65a7c8f8f9556ec42678f661794d47f7f98
author Junio C Hamano <gitster pobox.com> 1503813601 -0700
committer Junio C Hamano <gitster pobox.com> 1503813601 -0700

The fifth batch post 2.14

Signed-off-by: Junio C Hamano <gitster pobox.com>

This is the entire contents of commit 238e487ea943f80734cc6dad665e7238b8cbc7ff, and computing the SHA-1 checksum of the header commit 293 (293 is the length of the text in bytes), followed by a NUL byte and then the original text, results in that hash:

$ python
...
>>> import hashlib
>>> import subprocess
>>> p = subprocess.Popen('git cat-file -p HEAD', stdout=subprocess.PIPE, shell=True)
>>> text = p.stdout.read()
>>> len(text)
293
>>> s = 'commit {}\0'.format(len(text)).encode('utf8')
>>> s += text
>>> hashlib.sha1(s).hexdigest()
'238e487ea943f80734cc6dad665e7238b8cbc7ff'

(the above should work in py2k and py3k but was patched up slightly on the fly, so might have a glitch).
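The same calculation can be tried without a repository at hand. This is only a sketch, assuming the usual object framing (type name, a space, the decimal length, one NUL byte, then the contents); git_object_id is a name made up for this example, not something Git provides.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <length>' + NUL + body with SHA-1, as Git does for its objects."""
    header = "{} {}".format(kind, len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# A blob holding the single line "hello world" gets the same ID in every
# repository, because the ID depends only on the bytes.
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Feeding the same bytes to git hash-object --stdin should print the same ID.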

Anyway, note in particular the parent line and the author and committer lines. The parent line gives the hash ID of the parent of this commit. The other two lines have a name, an email address, a long decimal number, and a weird -0700 that is actually a time zone offset (7 hours west of GMT/Zulu time, in this case). The big decimal number plus this time zone offset is the time stamp of the commit.

The tree line gives the Git hash ID of the tree object that contains the source that goes with this commit. The rest of the text is, obviously, just the commit message itself. Having time stamps means that two otherwise identical commits, made by the same person, using the same source tree and same commit message, will generally result in two different commits because no one makes more than one commit per second.[2]

[2] Scripts can easily violate this rule and can produce surprises.
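To see the time stamp effect in isolation, here is a rough sketch that builds the text of a one-parent commit by hand and hashes it. The commit_id helper and the author identity are invented for the illustration; the tree and parent IDs are just the ones from the commit shown earlier.

```python
import hashlib

def commit_id(tree, parent, who, when, tz, message):
    """Build the text of a one-parent commit and hash it as a 'commit' object."""
    body = (
        "tree {}\n"
        "parent {}\n"
        "author {} {} {}\n"
        "committer {} {} {}\n"
        "\n"
        "{}\n"
    ).format(tree, parent, who, when, tz, who, when, tz, message).encode("utf8")
    header = "commit {}".format(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

TREE = "e97e9653eed972b4521e7f562e40f61f74eeb76c"
PARENT = "6e6ba65a7c8f8f9556ec42678f661794d47f7f98"
WHO = "A U Thor <author@example.com>"  # made-up identity

first = commit_id(TREE, PARENT, WHO, 1503813601, "-0700", "same message")
again = commit_id(TREE, PARENT, WHO, 1503813601, "-0700", "same message")
later = commit_id(TREE, PARENT, WHO, 1503813602, "-0700", "same message")

print(first == again)  # True: identical bytes give the identical ID
print(first == later)  # False: one second later is a different commit
```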

Branch names simply point to commits, as do other commits

Since each commit has, as part of its core data, the hash ID of its parent commit, it suffices to store a single Git hash ID in a branch name like master or develop. This name maps to the hash ID, which identifies or "points to" the tip commit of the branch. That particular commit then has inside it the hash ID of its parent commit: the tip commit points to its parent. That parent commit points back to its own parent. It's this chain of backwards pointers, starting from branch tip commits as identified by branch names, that makes up a Git branch:

A <-B <-C   <-- master

Here, in this tiny 3-commit repository, the name master identifies commit C; C points back to B; and B points back to A. Since A is the very first commit ever made, it points nowhere at all. The technical term for this is a root commit, and when we (or Git) work with commits, we generally follow the backwards pointers until they run out at the root.
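As a toy model of that picture (plain dictionaries, not how Git stores anything on disk), following the backwards pointers from a branch name might look like this; the history function is a name invented here.

```python
# Toy model: each commit records only its parent's ID (None for the root).
parents = {"A": None, "B": "A", "C": "B"}
# A branch name stores exactly one commit ID: the tip.
branches = {"master": "C"}

def history(branch):
    """Follow the backwards pointers from the branch tip until they run out."""
    commit = branches[branch]
    seen = []
    while commit is not None:
        seen.append(commit)
        commit = parents[commit]
    return seen

print(history("master"))  # ['C', 'B', 'A']
```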

All of this means that no commit (nor any Git object) can ever change

We're given the claim that the hash ID of any Git object—commit, tree, annotated tag, or "blob" (file)—is unique, and that it strictly depends on the data inside the object. This claim is true; Git enforces it by refusing to add a new object that, by some chance or wicked purpose, has the same hash as some existing object. In practice, changing or adding or removing just one character inside a commit produces a whole new, different hash; and even just copying a commit tends to produce a new, different hash due to the time stamps.

This makes rebase impossible, in one sense. And yet, git rebase exists, so it must be possible somehow. The trick lies in the how.

The purpose of rebasing

There are several reasons one might use git rebase, but the most common is simply to do just that: "re-base" some commit(s). Let's draw another graph like the minimal repository, but add a branch:

A--B--C   <-- master
       \
        D--E   <-- develop

The arrows inside these commits all point backwards (by definition) and ASCII makes it hard to draw in the individual arrows well, so I've left them out here. But let's continue to emphasize that the name master points to commit C, and the name develop points to commit E, because we're about to make a new commit on master:

A--B--C--F   <-- master
       \
        D--E   <-- develop

Now we have a situation ripe for doing git rebase: we might like to have commits D and E come after commit F.

We've already seen, though, that we can't change anything about a commit. If we try, we get a new, different commit. But let's do that anyway: let's copy commit D to a new, different commit D', whose parent is commit F and whose message is the same as D's:

           D'  <-- [temporary]
          /
A--B--C--F   <-- master
       \
        D--E   <-- develop

To make this really work, we'll start with F's source tree too, and make whatever changes we made earlier, to that tree. We'll do this by having Git compare commit D to its parent commit C (the name develop points to E, so develop~1 is D and develop~2 is C):

git diff develop~2 develop~1

then apply that set of changes to commit F, and then make this new copy D' using git commit with the same message as the original D.

There is a Git command that does this kind of copying: git cherry-pick. If we check out commit F by its hash ID (as a detached HEAD), and cherry-pick commit D, we get commit D'. What changes are the tree and the parent lines, and almost certainly the committer time stamp (cherry-pick keeps the author line, including its time stamp). But commit D' is "just as good" as commit D, or maybe even better, if we just also copy commit E to E':

           D'--E'  <-- HEAD
          /
A--B--C--F   <-- master
       \
        D--E   <-- develop

Now that we've copied the two commits we care about, we can tell Git to rip the label develop away from commit E and make it point, instead, to our last copy, E':

           D'--E'  <-- develop
          /
A--B--C--F   <-- master
       \
        D--E   <-- [abandoned]

This is what git rebase does, in general: it's an automated series of git cherry-pick copy operations, followed by a label-move.
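The copy-then-move-the-label idea can be sketched with a toy model. Everything here is made up for the illustration: commits are dictionary entries holding a parent, a whole-file snapshot, and a message; cherry_pick and rebase are hypothetical helpers; and there is no conflict handling at all, which the real commands of course need.

```python
store = {}

def commit(cid, parent, tree, message):
    """Record a commit under a hand-picked ID (real IDs are hashes)."""
    store[cid] = {"parent": parent, "tree": dict(tree), "message": message}
    return cid

def cherry_pick(cid, onto, new_id):
    """Copy `cid` on top of `onto`: same message, same changes, new parent."""
    old = store[cid]
    base = store[old["parent"]]["tree"]
    tree = dict(store[onto]["tree"])
    # Re-apply what `cid` changed relative to its own parent.
    for path, text in old["tree"].items():
        if base.get(path) != text:
            tree[path] = text
    for path in base:
        if path not in old["tree"]:
            tree.pop(path, None)
    return commit(new_id, onto, tree, old["message"])

# The graph from the drawings, with C standing in as the root of the toy.
commit("C", None, {"readme": "v1"}, "C")
commit("F", "C", {"readme": "v1", "f.txt": "from F"}, "F")
commit("D", "C", {"readme": "v1", "d.txt": "from D"}, "D")
commit("E", "D", {"readme": "v1", "d.txt": "from D", "e.txt": "from E"}, "E")
branches = {"master": "F", "develop": "E"}

def rebase(branch, upstream, to_copy):
    """Cherry-pick each listed commit (oldest first), then move the label."""
    tip = branches[upstream]
    for cid in to_copy:
        tip = cherry_pick(cid, tip, cid + "'")
    branches[branch] = tip

rebase("develop", "master", ["D", "E"])
print(branches["develop"])                         # E'
print(store["E'"]["parent"], store["D'"]["parent"])  # D' F
print(sorted(store["E'"]["tree"]))  # ['d.txt', 'e.txt', 'f.txt', 'readme']
print(store["E"]["parent"])         # D (the originals are still there)
```

Note that the originals D and E are never modified or deleted in this model; they simply lose their label.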

Choosing what to copy, to where, and other refinements

There's a very tricky bit here, disguised by the way we've been drawing these commit graphs. How does Git know which commits to copy, and where to put the copies?

The usual answer, in Git, is taken from the (single) argument to git rebase. If we run git rebase master, we are telling Git:

copy commits that are on the current branch (develop) and not on master;

copy them to the point that comes after the tip of master.

If you look at the graph, it's obvious that the commits that are on develop are D-E. But this is wrong! The commits that are on develop are actually A-B-C-D-E. The commits that are on master are A-B-C-F. Three of these commits, A-B-C, are on both branches.

This is why the phrase above is "commits that are on the current branch, and not on the other one." Since A-B-C are on both, that knocks them out of the list, leaving just D-E to be copied.
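In set terms this is just a difference of two reachability sets. A small sketch, using a hand-written parent table for the graph above (the reachable helper is invented here):

```python
# Parent links for the graph above; A is the root.
parents = {"A": [], "B": ["A"], "C": ["B"], "F": ["C"], "D": ["C"], "E": ["D"]}
branches = {"master": "F", "develop": "E"}

def reachable(tip):
    """Every commit you can get to from `tip` by following parent links."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen

on_develop = reachable(branches["develop"])
on_master = reachable(branches["master"])

print(sorted(on_develop))              # ['A', 'B', 'C', 'D', 'E']
print(sorted(on_develop & on_master))  # ['A', 'B', 'C']
print(sorted(on_develop - on_master))  # ['D', 'E']
```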

Note that our single argument, master, is used both as "what not to copy" and "where to copy". The rebase command has a way to split these apart ("don't copy based on commit S-for-stop" and "put the copies after T-for-target"), but you still only get one "stop" point. The default is that you name both S and T with one name. The --onto flag, git rebase --onto T S, is what lets you split them up.

Besides just copying commits, you can use a special variety of rebase, the "interactive" one, to let you make changes just before[3] it makes the new copy of an existing commit. That is, you can think of this as: copy commit D as if via cherry-pick, but let me make some minor changes just before committing the new D'.

[3] In fact, these changes are usually made using git commit --amend, which means that you wind up making two copies: one in the new place, and then the amended copy, shoving the first copy aside, to really use. But this all happens behind the scenes and is more efficient than it sounds anyway, so it doesn't really hurt to just pretend it's "just before", at least for learning purposes.

Merges make everything trickier

Now let's look at merges. A merge commit—this is an actual thing, separate from the process by which we make the merge commit, but both are called "merge"—is any commit with at least two parent commits. We draw them by having the merge "point back" to each of its parents:

...--H--I--J---M   <-- br1
         \    /
          K--L   <-- br2

Here merge commit M has two parents, J and L. We probably made it by doing git checkout br1; git merge br2. (This means that M's first parent is J. This does not matter right here, but it's useful later on. The first parent of any merge is the commit that was HEAD at the time you ran git merge. This often does not get drawn in graphs, which don't generally care about the order. Git mostly doesn't care either, except for this first-vs-second thing, and then only if you use --first-parent.)
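In a toy parent table, a merge is simply an entry with more than one parent, and "first parent" is the first item in that list. The sketch below treats H as the root (the real graph continues to the left) and uses two invented helpers:

```python
# Parent lists, first parent first.
parents = {
    "H": [], "I": ["H"], "J": ["I"],
    "K": ["I"], "L": ["K"],
    "M": ["J", "L"],   # made while on br1 (tip J), merging br2 (tip L)
}

def is_merge(commit):
    """A merge commit is any commit with at least two parents."""
    return len(parents[commit]) >= 2

def first_parent_chain(tip):
    """Walk back from `tip`, always taking the first parent."""
    chain = [tip]
    while parents[chain[-1]]:
        chain.append(parents[chain[-1]][0])
    return chain

print(is_merge("M"), is_merge("J"))  # True False
print(first_parent_chain("M"))       # ['M', 'J', 'I', 'H']
```

The walk never visits K or L, which is the sort of view --first-parent gives.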

Let's add a few more commits beyond M, all on br1 (which will be our current branch; let's label that too, by adding (HEAD)):

...--H--I--J---M--N--O   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

Now let's imagine we are trying to use git rebase to copy, say, J-M-N-O.

We can tell Git to stop copying at (and before) L. But then the copies go at the wrong place, i.e., just after L.

We can tell Git to stop copying at (and before) I. But then Git insists on copying K and L.

The merge, in other words, throws a monkey wrench into the idea of using just one "stop point" unless we pick I; and then we copy someone else's commits.
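Both choices can be checked against a toy parent table for this graph. As before, this is only a model: H is treated as the root, and reachable and to_copy are names invented for the sketch.

```python
# Parent lists for the graph with the merge, first parent first.
parents = {
    "H": [], "I": ["H"], "J": ["I"],
    "K": ["I"], "L": ["K"],
    "M": ["J", "L"],
    "N": ["M"], "O": ["N"],
}

def reachable(tip):
    """Every commit you can get to from `tip` by following parent links."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen

def to_copy(tip, stop):
    """Commits on `tip`'s history that are not on `stop`'s history."""
    return sorted(reachable(tip) - reachable(stop))

print(to_copy("O", "L"))  # ['J', 'M', 'N', 'O']  (right list, wrong destination)
print(to_copy("O", "I"))  # ['J', 'K', 'L', 'M', 'N', 'O']  (drags in K and L)
```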

It also adds one really big monkey wrench: Git cannot copy a merge. The cherry-pick command insists that you pick one "side" of the merge, and copies the commit into a new non-merge commit that does what that "side" did, rather than actually merging. Worse, the rebase command, by default, simply skips merges entirely!

Here's where things get particularly tricky. Git will sometimes re-use an existing commit in place, especially doing an interactive rebase; and git rebase -p claims to attempt to preserve merges, which it doesn't, really, because it can't. But it will re-perform a merge, i.e., run git merge again.

Hence, given the above graph, we can try running:

git rebase -i -p <hash-of-I>

Git will, we hope, re-use K and L in place, and maybe even re-use J as well if we don't propose to change it at all. Of course, we do intend to change J (by using reword or edit on it). So now Git will copy J, let us tweak J', and then re-run the merge command to make a new merge, M', between J' and L, which we hope it re-used in place.

Git will then have to go on to copy N and O. The new M' has a different hash ID than the original M, so even if N itself needs no other changes, its parent line has to change. Since N changed to become N', O likewise must change to become O' pointing back to N'.
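This ripple effect falls straight out of the fact that a commit's ID covers its parent IDs. Here is a rough sketch with a simplified ID (a SHA-1 over just the parent IDs and the message, which is less than a real commit hashes); make_id and build are invented names.

```python
import hashlib

def make_id(parent_ids, message):
    """Toy ID: hash the parent IDs plus the message."""
    text = "\n".join(list(parent_ids) + [message]).encode("utf8")
    return hashlib.sha1(text).hexdigest()

def build(j_message):
    """Rebuild the whole graph, varying only J's message."""
    i = make_id([], "I")
    k = make_id([i], "K")
    l = make_id([k], "L")
    j = make_id([i], j_message)
    m = make_id([j, l], "M")
    n = make_id([m], "N")
    o = make_id([n], "O")
    return {"J": j, "K": k, "L": l, "M": m, "N": n, "O": o}

before = build("J")
after = build("J, reworded")
changed = sorted(name for name in before if before[name] != after[name])
print(changed)  # ['J', 'M', 'N', 'O']
```

K and L keep their IDs because nothing they depend on changed, which is why they can be re-used in place.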

Whether all of this works depends on whether Git preserves the original K and L commits. If Git chooses to copy them, you'll become the committer (the author generally stays the same) and the time stamps will change, and hence you will copy K and L to K' and L'. The existing branch will continue to point to the originals, not to the copies.

If the copying is too complicated for Git, you can do it manually

Suppose that, for whatever reason, git rebase -i -p <hash-of-I> does not do what we want. We undo the rebase immediately afterward using git reset --hard ORIG_HEAD or similar, so that we are back to this graph:

...--H--I--J---M--N--O   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

We now wish to make a new commit J' that is like J but different, so we can do this manually. Everything is all clean—there are no changes to worry about staging or whatever at this point—so we just run:

$ git checkout -b newbr1 <hash-of-I>
$ git cherry-pick -n <hash-of-J>

The -n (or --no-commit) tells Git that, yes, we're copying J here, but don't commit the copy just yet. Now we can fiddle as much as we like with commit contents (edit files and git add them), and then run git commit to make the new commit and edit the commit message. (If you don't need to change the tree any, you can leave out the -n and just edit the message.)

Now we have this:

          J'   <-- newbr1 (HEAD)
         /
...--H--I--J---M--N--O   <-- br1
         \    /
          K--L   <-- br2

We're now ready to merge commit L:

$ git merge br2

This produces commit M'. We're now ready to cherry-pick N:

$ git cherry-pick -n <hash-of-N>

which we can tweak as much as we like and then commit with git commit, and:

$ git cherry-pick -n br1

to copy O (we don't need to know or find its hash, because the name br1 points to O), again finishing with git commit.

Once we're all done we just have to force the name br1 to point to the new O' copy we made, for which we can use any of several Git commands, such as:

git branch -f br1 newbr1

as long as we're still on branch newbr1.
