squash default on a merge?
We use separate branches for non-trivial bug fixes and features. The branches are kept in sync with master by performing frequent git checkout <x>; git merge master.
I noticed that when merging, git pollutes the log with multiple, non-relevant messages. For example, rather than a single "Merge <X> into Master" or "Merge Master into <X>", git will add all the commit messages. It's a problem with governance (processes and procedures) on Master because the bugs that may have been present in a branch during development are not and never were present in the Master branch.
Worse, the behaviors are different between branches and master. When merging master into branches, there is a log entry generated similar to "Merge Master into <X>". However, when merging a branch into Master, there is no "Merge <X> into Master". According to the logs, it's as if the development branch never existed and the merge never occurred.
I learned I had to do something special to make git behave as expected; namely How to use git merge --squash? (It's classic git modus operandi: take something simple and make it difficult).
My question is, how do I make --squash
the default action during a merge?
(Note: I got here by following the link from a more recent question of yours. I'm not sure how much you still care about this, but might as well answer it.)
You can't make Git do a squash "merge" by default for all branches, but you can make it do a squash "merge" by default for some branches. Since you are particularly interested in making this happen only for master
, that may be just what you want.
Let's do a quick[1] review of what git merge
really does since, in the usual Git fashion, Git complicates everything. And, this:
We use separate branches for non-trivial bug fixes and features. The branches are kept in-sync with master by performing frequent git checkout <x>; git merge master.
is reversed from what many people believe to be the "correct" work-flow in Git. I have some doubts as to whether any Git work-flow can be called "correct" :-) , but some are more successful than others, and this is definitely the reverse of one of the more successful ones. (I do think it can work well, as noted in the extended discussion below.)
[1] Well, I tried to keep it short. :-) Feel free to skim, although there's a bunch of important material here. If TL;DR, just jump straight to the end.
The commit graph
As you know, but others may not, in Git, much is controlled by the commit graph. Every[1] commit has some parent commit, or in the case of a merge commit, two or more parents. To make a new commit, we get on some branch:
$ git checkout funkybranch
and do some work in the work-tree, git add
some files, and finally git commit
the result to branch funkybranch
:
... work work work ...
$ git commit -m 'do a thing'
The current commit is the (one, single) commit to which the name funkybranch
points. Git finds this by reading HEAD
: HEAD
normally contains the name of the branch, and the branch contains the raw SHA-1 hash ID of the commit.
To make the new commit, Git reads the ID of the current commit from the branch we're on, saves the index/staging-area into the repository,[2] writes the new commit with the current commit's ID as the new commit's parent, and, last, writes the new commit's ID to the branch information file.
This is how a branch grows: from one commit, we make a new one, and then move the branch name to point to the new commit. When we do this as a linear chain, we get a nice linear history:
... <- C <- D <- E <-- funkybranch
Commit E
(which might actually be e35d9f...
or whatever) is the current commit. It points back to D
because D
was the current commit when we made E
; D
points back to C
because C
was current at that point; and so on.
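The chain described above (HEAD names a branch, the branch names a commit, the commit names its parent) can be inspected directly. This is only a minimal sketch in a throwaway repository; the temporary directory, the demo identity, and the empty commits labelled D and E are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: HEAD -> branch name -> commit ID -> parent commit ID.
set -e
# Hypothetical identity so that git commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)                              # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/funkybranch   # name the initial branch explicitly

git commit -q --allow-empty -m 'D'
d=$(git rev-parse HEAD)
git commit -q --allow-empty -m 'E'
e=$(git rev-parse HEAD)

head_target=$(git symbolic-ref HEAD)      # what HEAD contains: a branch name
branch_tip=$(git rev-parse funkybranch)   # what the branch contains: a commit ID
parent_of_e=$(git rev-parse 'HEAD^')      # what commit E points back to

echo "HEAD        -> $head_target"
echo "funkybranch -> $branch_tip"
echo "parent of E -> $parent_of_e"
```

The branch tip should be the ID of E, and the parent of E should be the ID of D.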
When we make new branches with, eg, git checkout -b
, all we are doing is telling Git to make a new name, pointing to some existing commit. Usually this is just the current commit. So if we are on funkybranch
and funkybranch
points to commit E
and we run:
$ git checkout -b newbranch
then we get this:
... <- C <- D <- E <-- funkybranch, newbranch
That is, both names point to commit E
. Git knows that we're on newbranch
now because HEAD
says newbranch
. I like to include that in this kind of drawing too:
... <- C <- D <- E <-- funkybranch, HEAD -> newbranch
I also like to draw my graphs in a bit more compact fashion. We know that commits always point "backwards in time" to their parents, because it's impossible to make new commit E
before we've made commit D
. So these arrows always point leftward and we can just draw one or two dashes:
...--C--D--E <-- funkybranch, HEAD -> newbranch
(and then if we don't need to know which commit is which, we can just draw a round o
node for each one, but for now I will stick to single uppercase letters here).
If we make a new commit now—commit F
—it causes newbranch
to advance (because, as we can see from HEAD
, we're on newbranch
). So let's draw that:
...--C--D--E   <-- funkybranch
            \
             F   <-- HEAD -> newbranch
Now let's git checkout funkybranch
again, and do some work there and commit it, making new commit G
:
...--C--D--E--G   <-- HEAD -> funkybranch
            \
             F   <-- newbranch
(and HEAD
is now pointing to funkybranch
). Now we have something we can merge.
[1] Well, every commit except for root commits. In most Git repositories there is just one root commit, which is the very first commit. Obviously it cannot have a parent commit, since the parent of each new commit is whatever commit was current when we made the new commit. With no commits at all, there is no current commit yet when we make the first commit. So it becomes a root commit, and then all later commits are its children, grandchildren, great-grand-children, and so on.
[2] Most of the "save" work actually happens at each git add
. The index/staging-area contains hash IDs, rather than actual file contents: the file contents were saved away when you ran git add
. This is because Git's graph is not just of commit objects, but of every object in the repository. This is part of what makes Git so fast as compared to, eg, Mercurial (which saves the files away at commit time rather than add time). Fortunately this, unlike the commit graph itself, is something users need not know or care about.
Git merge
As before, we have to be on some branch.[1] We're on funkybranch
, so we are all good to go:
$ git merge newbranch
At this point, most people seem to think that Magic Happens. It's not magic at all though. Git now finds the merge base between our current commit and the one we named, and then runs two git diff
commands.
The merge base is simply[2] the first commit "in common" on the two branches: the first commit that is on both branches. We are on funkybranch
, which points to G
. We gave Git the branch name newbranch
, which points to commit F
. So we're merging commits G
and F
, and Git follows both of their parent pointers until it reaches a commit node that is on both branches. In this case, that's commit E
: commit E
is the merge base.
Now Git runs those two git diff
commands. One compares the merge base against the current commit: git diff <id-of-E> <id-of-G>
. The second diff compares the merge base against the other commit: git diff <id-of-E> <id-of-F>
.
Finally, Git attempts to combine the two sets of changes, writing the result to our current work-tree. If the changes seem independent, Git takes both of them. If they seem to collide, Git stops with a "merge conflict" and makes us clean it up. If they seem to be the same changes, Git takes just one copy of the changes.
All of this "seems" stuff is done on a purely textual basis. Git has no understanding of code. It just sees things like "delete a line reading ++x;
" and "add a line reading y *= 2;
". Those look different, so as long as they seem to be in different areas, it does the one delete and the one add, to the files in the merge-base, putting the result in the work-tree.
Last, assuming all goes well and the merge does not stop with a conflict, Git makes a new commit. The new commit is a merge commit, which means it has two[3] parents. The first parent (the order matters) is the current commit, just as with regular, non-merge commits. The second parent is the other commit. Once the commit is safely written to the repository, Git writes the new commit's ID into the branch name, as usual. So, assuming the merge works, we get this:
...--C--D--E--G--H   <-- HEAD -> funkybranch
            \   /
              F   <-- newbranch
Note that newbranch
has not moved: it still points to commit F
. HEAD
has not changed either: it still contains the name funkybranch
. Only funkybranch
has changed: it now points to the new merge commit H
, and H
points back to G
, and also to F
.
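The whole sequence above (find the merge base, run the two diffs, make a commit with two parents) can be reproduced by hand. The following is a sketch under stated assumptions: a throwaway repository, a demo identity, and made-up file names (shared.txt, f.txt, g.txt) whose changes are independent, so no conflict arises.

```shell
#!/bin/sh
# Sketch: merge base, the two diffs, and the resulting two-parent merge commit.
set -e
# Hypothetical identity so that git commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/funkybranch

echo base > shared.txt
git add shared.txt
git commit -q -m 'E'
e=$(git rev-parse HEAD)

git checkout -q -b newbranch
echo 'from newbranch' > f.txt
git add f.txt
git commit -q -m 'F'
f=$(git rev-parse HEAD)

git checkout -q funkybranch
echo 'from funkybranch' > g.txt
git add g.txt
git commit -q -m 'G'
g=$(git rev-parse HEAD)

base=$(git merge-base funkybranch newbranch)   # should be commit E

git diff --stat "$base" "$g"   # merge base vs current commit
git diff --stat "$base" "$f"   # merge base vs other commit

git merge -q -m 'H' newbranch  # makes merge commit H on funkybranch
first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
newbranch_tip=$(git rev-parse newbranch)
```

After the merge, the first parent of H should be G, the second parent should be F, and newbranch should still point to F.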
[1] Git is a bit inconsistent about this. If we git checkout
a raw SHA-1, or anything else that is not a branch name, it goes into a state it calls "detached HEAD". Internally, this works by shoving the SHA-1 hash directly into the HEAD
file, so that HEAD
gives the commit ID, rather than the name of the branch. But the way Git does everything else makes it work as though we're on a special branch whose name is just the empty string. It's the (single) anonymous branch—or, equivalently, it's the branch named HEAD
. So in one sense, we're always on a branch: even if Git says that we're not on any branch, Git also says that we're on the special anonymous branch.
This causes a lot of confusion, and it might be more sensible if it weren't allowed, but Git uses it internally during git rebase
, so it's actually pretty important. If the rebase goes wrong, this detail leaks out, and you wind up having to know what "detached HEAD" means, and is.
[2] I am deliberately ignoring a hard case here, which occurs when there are multiple possible merge base commits. Mercurial and Git use different solutions here: Mercurial picks one at (what seems to be) random, while Git gives you options. These cases are rare though, and ideally, even when they do occur, Mercurial's simpler method works anyway.
[3] Two or more, really: Git supports the concept of an octopus merge. But there's no need to go there. :-)
Merge changes the graph from a tree to a DAG
Merges—true merges: commits with two or more parents—have a bunch of important—critical, even—side effects. The main one is that the presence of a merge causes the commit graph data structure to change from a tree, where branches simply fork off and grow on their own, into a DAG: a Directed Acyclic Graph.
When Git walks the graph, as it does for so many operations, it usually follows all paths back. Since a merge has two parents, git log
, which walks the graph, shows both parent commits. Hence this is considered a Feature:
For example, rather than a single "Merge <X> into Master" or "Merge Master into <X>", git will add all the commit messages.
Git is following, and hence logging, both the original commit sequence—commits H
, G
, E
, D
, and so on—and the merged-in commit sequence F
, E
, D
, and so on. Of course, it shows each commit only once; and by default, it sorts these commits by their date-stamps, intermingling the two branches if each one has many commits with dates that overlap.
If you don't want to see the commits that came in via the "other side" of a merge, Git has a way to do that: --first-parent
tells every Git command that walks the graph[1] to follow only the first parent of each merge. The other side is still there in the graph, and it still affects how Git computes things like the merge base, but git log --first-parent
won't show it.
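To see the difference, here is a small sketch in a throwaway repository. The branch name feature, the commit messages, and the use of empty commits are assumptions chosen to keep the example short.

```shell
#!/bin/sh
# Sketch: a full graph walk versus a --first-parent walk over the same history.
set -e
# Hypothetical identity so that git commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m 'A'
git checkout -q -b feature
git commit -q --allow-empty -m 'feature work 1'
git commit -q --allow-empty -m 'feature work 2'
git checkout -q master
git commit -q --allow-empty -m 'B'
git merge -q --no-ff -m 'Merge feature into master' feature

all_count=$(git rev-list --count HEAD)                          # follows both parents
first_parent_count=$(git rev-list --count --first-parent HEAD)  # master's own line only
first_parent_subjects=$(git log --first-parent --format=%s)

git log --first-parent --oneline
```

The full walk sees all five commits, while the first-parent walk sees only the merge, B, and A, which is roughly the single "Merge <X> into Master" entry the question asks for.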
[1] This is quite a lot of Git commands. They use, or in the case of git log
itself, are, variants of git rev-list
, which is Git's general purpose graph-walk program. This code is central to push, fetch, bisect, log, blame, rebase, and numerous others. Its documentation has a dizzying array of options. The key ones to know as a casual user are --first-parent
(just discussed here); --no-walk
(suppresses graph walking entirely); --ancestry-path
(simplifies history for source tree related work); --simplify-by-decoration
(simplifies history for git log
output); --branches
, --remotes
, and --tags
(selects starting points for graph walking by branch, remote, or tag name); --merges
and --no-merges
(include or exclude merge commits); --since
and --until
(limit commits by date ranges); and the basic ..
and ...
(two and three dot) graph subsetting operations.
Benefits of merges
Having the merge in place means that development on a branch can continue on that branch, and a later git merge
finds a newer—and hence less complicated—merge base. Consider this graph, where only a few commits have single-letter names:
  o--o--o--o--H--o--o--I   <-- feature2
 /             \        \
A--o--B---C-----D--E-----F--G   <-- master
 \        /        /        /
  o--o--J--o--o--K--o--o--L   <-- feature1
Here, except for two early commits done on master
after the root commit A
, all development has taken place on side branches feature1
and feature2
. Commits C
, D
, E
, F
, and G
are all merges (in this case, strictly into master
), bringing the feature-work into master
when it was ready.
Note that when we made commit C
on master
, we did:
$ git checkout master; git merge feature1
which found A
as the merge base and B
and J
as the two tip commits to merge. When we made D
:
$ git checkout master; git merge feature2
we had A
as the merge base and C
and H
as the two tip commits. So far, this is nothing special. But when we made E
, we had this much so far (the final o
s, and even I
, on feature2
may or may not have been in place—they have no effect):
  o--o--o--o--H--o--o   <-- feature2
 /             \
A--o--B---C-----D   <-- master
 \        /
  o--o--J--o--o--K   <-- feature1
The merge base of master
and feature1
is the first commit that is on both branches, which is commit J
, which is the one we merged in to make C
. So to do this merge, Git compares J
vs D
—the code we brought in from feature2
—and J
vs K
: the new code (and only the new code) on feature1
. If all goes well, or once we fix merge conflicts, this makes commit E
and we now have:
  o--o--o--o--H--o--o--I   <-- feature2
 /             \
A--o--B---C-----D--E   <-- master
 \        /        /
  o--o--J--o--o--K--o--o   <-- feature1
when we go to merge feature2
again. This time the merge base is commit H
: moving straight back from feature2
soon hits H
, and moving from E
to D
and then up to H
from master
also hits H
. So now Git compares H
vs E
, which is what we brought in from feature1
, and H
vs I
, which is the new stuff we added to feature2
, and merges just those.
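The way a real merge moves the merge base forward can be checked with git merge-base. This is a sketch only: it assumes a throwaway repository, a demo identity, and empty commits standing in for A, B, J, and K from the graphs above.

```shell
#!/bin/sh
# Sketch: after a real merge, the merge base advances to the merged-in commit.
set -e
# Hypothetical identity so that git commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m 'A'
a=$(git rev-parse HEAD)

git checkout -q -b feature1
git commit -q --allow-empty -m 'J'
j=$(git rev-parse HEAD)

git checkout -q master
git commit -q --allow-empty -m 'B'
base_before=$(git merge-base master feature1)   # A: where the branches forked

git merge -q -m 'C: merge feature1' feature1    # real merge, two parents

git checkout -q feature1
git commit -q --allow-empty -m 'K'              # development continues on the branch
git checkout -q master
base_after=$(git merge-base master feature1)    # J: the commit merged in at C

echo "merge base before: $base_before"
echo "merge base after:  $base_after"
```

The next git merge feature1 would therefore only have to consider what changed since J.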
Drawbacks of merges
Trees have some very nice graph-theoretic properties, such as a guarantee of a single simple merge-base. Arbitrary DAGs may lose these properties. In particular, doing merges both ways—merging master
into branch
and merging branch
into master
—results in "criss cross merges" that can give you multiple merge bases.
Merges also make the graph ( git log
) very hard to follow. Using --first-parent
or --simplify-by-decoration
helps, especially if you practice good merging, but these graphs just naturally get messy.
Squash merges
Squash merges avoid the problems, but do so by paying a fairly heavy price: they are not merges at all. (Soon, we'll see how to deal with this.)
When you run git merge --squash
, Git goes through the same motions as before in terms of finding a merge base, and making two diffs: merge-base vs current-commit, and merge-base vs other-commit. It then combines the changes in exactly the same way as for a regular merge. But then it makes an ordinary commit.[1] The new commit has just a single parent, taken from the current branch.
Let's see that in action for the same sequence with feature1
and feature2
:
  o--o--o   <-- feature2
 /
A--o--B   <-- master
 \
  o--o--J   <-- feature1
We do git checkout master; git merge --squash feature1
to make new commit C
. Git compares A
vs B
to see what we did on master
, and A
vs J
to see what they (we) did on feature1
. Git combines those changes and we get commit C
, but with only one parent:
  o--o--o   <-- feature2
 /
A--o--B---C   <-- master
 \
  o--o--J   <-- feature1
Now we'll make D
as a squash from feature2
:
  o--o--o--o--H   <-- feature2
 /
A--o--B---C   <-- master
 \
  o--o--J--o--o   <-- feature1
Git compares A
vs C
, and A
vs H
, same as last time. We now get D
. So far it's much the same, except that there are no points where the branches rejoin. But now it is time to make E
:
  o--o--o--o--H--o--o   <-- feature2
 /
A--o--B---C-----D   <-- master
 \
  o--o--J--o--o--K   <-- feature1
We run git checkout master; git merge --squash feature1
as before.
Last time, Git compared J
-vs- D
and J
-vs- K
, as commit J
was our merge base.
This time, commit A
is (still) our merge base. Git compares A
vs D
, and A
vs K
. If there were conflicts we solved at C
last time, we probably have to solve them again. This is bad—but we're not lost yet.
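Both properties of a squash "merge" (a single parent, and a merge base that does not move) can be observed directly. This is a minimal sketch, assuming a throwaway repository, a demo identity, and made-up file names a.txt, b.txt, and j.txt.

```shell
#!/bin/sh
# Sketch: a squash "merge" makes a one-parent commit and leaves the merge base alone.
set -e
# Hypothetical identity so that git commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo a > a.txt
git add a.txt
git commit -q -m 'A'
a=$(git rev-parse HEAD)

git checkout -q -b feature1
echo j > j.txt
git add j.txt
git commit -q -m 'J'

git checkout -q master
echo b > b.txt
git add b.txt
git commit -q -m 'B'
b=$(git rev-parse HEAD)

git merge -q --squash feature1            # stages the combined result, makes no commit
git commit -q -m 'C: squash of feature1'  # the ordinary commit we must make ourselves

only_parent=$(git rev-parse 'HEAD^1')
if git rev-parse -q --verify 'HEAD^2' >/dev/null; then
    has_second_parent=yes
else
    has_second_parent=no
fi
base_after=$(git merge-base master feature1)   # still A, not J

echo "second parent: $has_second_parent"
echo "merge base:    $base_after"
```

Commit C has only B as its parent, and the merge base of master and feature1 is still A, which is why the next squash from feature1 repeats the old comparison.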
[1] Ordinary, as opposed to merge. As such, a squash merge is not a merge at all: it's a "get me the work done" commit, but it's not a merge commit. We need a real merge commit in addition; we will get to this in the next section.
Git actually stops here and forces you to run git commit
to make the squash commit. Why? Who knows, it's Git. :-)
Squash merges can work
To solve the above, we just need to re-merge (with a non-squash "real merge") from master
back to the feature
branches. That is, instead of simply merging from whichever feature branch into master
, and then continuing to work on the feature branch, we do this:
  o--o--o--o--H--*-o--o   <-- feature2
 /              /
A--o--B---C----D   <-- master
 \         \
  o--o--J---*--o--o--K   <-- feature1
These new commits, marked *
, are (non-squash) merges from master, into feature1
and feature2
. We made squash merge C
to pick up changes made from A
to J
. So we then make a real merge into feature1
, preferably using the tree straight from master
[1] (which has whatever goodies were in o--B--
as well). (We also made the *
on feature2
, just as general preparation, after making D
on master
to bring in everything from A
to H
. Like the *
on feature1
we probably just want the source tree straight from master
.)
Now that we're ready to bring in more work from feature1
, we can just do another (squash) merge. The merge-base of master
and feature1
is commit C
, and the two tips are D
and K
, which is just what we want. Git's merge code will come up with a reasonably close result; we fix up any conflicts, test, fix any breakage, and commit; and then we do another "prep work" merge from master
back into feature1
as before.
This work-flow is a bit more complicated than the "merge into master" one, but should give good results.
[1] Git does not make this totally trivial: we want a merge with a -s theirs
strategy, which Git simply doesn't have. There is an easy way to get the desired effect using "plumbing" commands, but I'll leave that out of this answer, which is already crazy-long.
So, if that all works, how about the mechanics?
Note that what we want is merge --squash
when merging into master, but regular (non-squash) merge when merging from master. In other words:
$ git checkout master && git merge foo
should use --squash
, but:
$ git checkout foo && git merge master
should not use --squash
. (The tree copying from the footnote in the previous section might be nice, but should be unnecessary: the merge result should basically always be the tree straight out of master
.)
When git merge
runs, it looks at the current branch (as it always must). If that branch has a name (that is, if we're not in "detached HEAD" mode), Git then looks at your configuration, for a value stored under that branch's own key, branch.<name>.mergeOptions
. Any string value here is scanned as if it were part of the git merge
command.
Hence:
$ git config branch.master.mergeOptions "--squash"
(the quotes are not technically required, and you can add --global
after git config
, before branch.master.mergeOptions
) sets up your current repository to do squash-merges into master
. (With --global
, it sets this as your personal default for all repositories. Any branch.master.mergeOptions
set in a particular repository will override these global ones, though.)
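Here is a sketch of the setting in action, run in a throwaway repository. The branch name foo comes from the commands above; the demo identity, the file names, and the commit messages are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: branch.master.mergeOptions makes merges INTO master squash by default,
# while merges from master into another branch stay real merges.
set -e
# Hypothetical identity so that git commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo a > a.txt
git add a.txt
git commit -q -m 'A'

git config branch.master.mergeOptions "--squash"

git checkout -q -b foo
echo foo > foo.txt
git add foo.txt
git commit -q -m 'work on foo'

git checkout -q master
echo m > m.txt
git add m.txt
git commit -q -m 'work on master'

git merge foo                              # picks up --squash from the configuration
git commit -q -m 'Squash foo into master'  # squash stops before committing
if git rev-parse -q --verify 'HEAD^2' >/dev/null; then
    master_is_merge=yes
else
    master_is_merge=no
fi

git checkout -q foo
git merge -q -m 'Merge master into foo' master   # no mergeOptions for foo: real merge
if git rev-parse -q --verify 'HEAD^2' >/dev/null; then
    foo_is_merge=yes
else
    foo_is_merge=no
fi

echo "master tip is a merge commit: $master_is_merge"
echo "foo tip is a merge commit:    $foo_is_merge"
```

The commit on master has a single parent, while the commit on foo is a true two-parent merge, matching the work-flow described in the previous section.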