Git difftool ridiculously slow in Cygwin/MinGW
I noticed that git difftool is very slow. A delay of about 1 to 2 seconds appears between each diff invocation.
To benchmark it, I wrote a custom difftool command:
#!/bin/sh
# Print the script name and the two files Git passes in, without launching a real diff tool.
echo "$0" "$1" "$2"
I then configured Git to use this tool in my ~/.gitconfig:
[diff]
tool = mydiff
[difftool "mydiff"]
prompt = false
cmd = "~/mydiff \"$LOCAL\" \"$REMOTE\""
I tested it on the Git sources:
$ git clone https://github.com/git/git.git
$ cd git
$ git rev-parse HEAD
1bc8feaa7cc752fe3b902ccf83ae9332e40921db
$ git diff HEAD~10 --stat --name-only | wc -l
23
When I time a git difftool with 259b5e6d33, the result is ridiculously slow:
$ time git difftool 259b5
mydiff /dev/null Documentation/RelNotes/2.6.3.txt
...
mydiff /tmp/mY2T6l_upload-pack.c upload-pack.c
real 0m10.381s
user 0m1.997s
sys 0m6.667s
A simpler pipeline that does the same work runs much faster:
$ time git diff --name-only --stat 259b5 | xargs -n1 -I{} sh -c 'git show 259b5:{} > {}.tmp && ~/mydiff {} {}.tmp'
mydiff Documentation/RelNotes/2.6.3.txt Documentation/RelNotes/2.6.3.txt.tmp
mydiff upload-pack.c upload-pack.c.tmp
real 0m1.149s
user 0m0.472s
sys 0m0.821s
What did I miss?
Here are the results I got (real time, in seconds):
| Cygwin | Debian | Ubuntu | Method |
| ------ | ------ | ------ | -------- |
| 10.381 | 2.620 | 0.580 | difftool |
| 1.149 | 0.567 | 0.210 | custom |
For the Cygwin results, I measured 2.8s spent in git-difftool and 7.5s spent in git-difftool--helper. The latter is only 98 lines long, so I don't understand why it is that slow.
Using some of the techniques found on the msysgit GitHub, I have narrowed this down a bit.
For each file in the diff, git-difftool--helper re-runs the following internal commands:
12:44:46.941239 git.c:351 trace: built-in: git 'config' 'diff.tool'
12:44:47.359239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'
12:44:47.933239 git.c:351 trace: built-in: git 'config' '--bool' 'mergetool.prompt'
12:44:48.797239 git.c:351 trace: built-in: git 'config' '--bool' 'difftool.prompt'
12:44:49.696239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'
12:44:50.135239 git.c:351 trace: built-in: git 'config' 'difftool.bc.path'
12:44:50.422239 git.c:351 trace: built-in: git 'config' 'mergetool.bc.path'
12:44:51.060239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'
12:44:51.452239 git.c:351 trace: built-in: git 'config' 'difftool.bc.cmd'
Notice that, in this particular case, it took roughly 4.5 seconds to execute these. This is a pretty consistent pattern throughout my log.
Note too that some of these are duplicates: git config difftool.bc.cmd is called 4 times!
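To quantify this on another machine, the trace can be summarized with two small helpers. This is only a sketch: the function names `count_config_calls` and `trace_elapsed` are made up for illustration, and both assume trace lines in exactly the format shown above (timestamp first, then `trace: built-in: git 'config' ...`), which may differ between Git versions.

```shell
#!/bin/sh
# Hypothetical helpers; they read a Git trace log on stdin.

# Count how often each `git config` lookup occurs, most frequent first.
count_config_calls() {
    awk -F"trace: built-in: git 'config' " '
        NF > 1 { count[$2]++ }
        END { for (k in count) printf "%d %s\n", count[k], k }
    ' | sort -k1,1nr -k2
}

# Print the seconds elapsed between the first and last trace line.
# Assumes HH:MM:SS.ffffff timestamps and no wrap past midnight.
trace_elapsed() {
    awk '{
        split($1, t, ":")
        s = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR == 1) first = s
        last = s
    }
    END { if (NR > 0) printf "%.3f\n", last - first }'
}
```

Assuming the trace was captured to a file first, for example with `GIT_TRACE=1 git difftool 259b5 2> trace.log`, it could be summarized with `count_config_calls < trace.log` and `trace_elapsed < trace.log`.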
Now, possible remedies:
[...] .gitconfig file. Seriously. It's still noticeable, but now on the order of 2 seconds instead of 4.5.
[...] .gitconfig lives) are both excluded from realtime virus scanning.
git difftool should be slightly faster with Git 2.13 (Q2 2017).
See commit d12a8cf (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8868ba1, 24 Apr 2017)
unpack-trees: avoid duplicate ODB lookups during checkout
(ODB: Object DataBase)
Teach traverse_trees_recursive() to not do redundant ODB lookups when both directories refer to the same OID.
In operations such as read-tree and checkout, there will likely be many peer directories that have the same OID when the differences between the commits are relatively small.
In these cases we can avoid hitting the ODB multiple times for the same OID.
This patch handles n=2 and n=3 cases and simply copies the data rather than repeating the fill_tree_descriptor().
================
On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced the overall time by 0.75 seconds when cycling between 2 commits with a single file difference.
(avg) before: 22.699
(avg) after: 21.955
===============
After some investigation I have evidence that the bad performance had to do with files owned by a user from a different domain. Specifically, I arrived at the following conclusions:
I must assume that obtaining file permissions for users in other domains is slow, and for some reason not cached (it was always the same user).
The rest of this answer is what I originally posted. I am leaving it as it was.
For me (working in a large company with multiple, geographically distributed Windows domains), the culprit is that Cygwin uses Windows ACLs by default. Consider this request for all known users in the domain:
$ time (mkpasswd -D | wc -l)
45183
real 27m55,340s
user 0m1,637s
sys 0m0,123s
The fix (1)(2) was a simple matter of mounting the NTFS file systems with noacl, i.e. my /etc/fstab contains the line
none / cygdrive binary,posix=0,user,noacl 0 0
(at the same time eliminating the annoying cygdrive prefix).
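To check whether a given fstab already mounts the cygdrive prefix with noacl, something like the following could be used. It is a minimal sketch: `cygdrive_has_noacl` is a hypothetical helper name, and it assumes the usual six-column fstab layout with `cygdrive` in the third column, as in the line above.

```shell
#!/bin/sh
# Hypothetical helper: reads fstab content on stdin and prints
#   "noacl"             if the cygdrive entry lists the noacl option,
#   "acl"               if a cygdrive entry exists without it,
#   "no cygdrive entry" if no such entry is found.
cygdrive_has_noacl() {
    awk '$1 !~ /^#/ && $3 == "cygdrive" {
        found = 1
        n = split($4, opts, ",")        # mount options are comma-separated
        for (i = 1; i <= n; i++)
            if (opts[i] == "noacl") { print "noacl"; exit }
        print "acl"; exit
    }
    END { if (!found) print "no cygdrive entry" }'
}
```

It would typically be run as `cygdrive_has_noacl < /etc/fstab`. Changes to /etc/fstab should only take effect for Cygwin processes started afterwards, so running `mount` in a fresh shell is a reasonable way to confirm the options actually in use.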
I cannot help but imagine that Cygwin performs a domain server query for every file it touches and does not cache the results. MSYS shows the same behavior, except that the Windows Git installation mounts with noacl by default, probably for this reason.
The change was introduced some time around 2015 with cygwin 2.4 or 2.5. From the release notes for 2.4:
To accommodate standard Windows ACLs, the POSIX permissions of the owner and all other users in the ACL are computed using the Windows AuthZ API. This may slow down the computation of POSIX permissions noticably in some circumstances [...] (emphasis by me).
The noacl option reduced the time to launch BeyondCompare (or to echo a string, for that matter) from 25 seconds to 1. I do not understand at all why a simple git diff on the same file is very fast even with acl, since I would naively assume that the required information, and thus the required file system operations, are identical.
I'll check out cygserver next, which may improve things by caching.
Update: cygserver does not improve the situation, unfortunately.
(1) This is the fix for git; mkpasswd is not affected.
(2) I have neither understood nor tested the impact on file permissions and ownership with respect to git (and ClearCase views, which we also access through Cygwin). My gut feeling is that one wants to stay as close to Windows semantics as possible (meaning that noacl may run into problems).
(3) The Cygwin documentation discusses scenarios in which the query results are not cached. One consists of a sequence of Cygwin processes which are not spawned from a common Cygwin ancestor (like a bash) but from a Windows program like cmd. I must assume that Windows provides a caching mechanism for native programs, or a Windows system would be unusable in this corporate environment. For some reason Cygwin does not use it.