GCC optimizer emits strange conditional jumps, why?

2018-06-04 17:08:05

I'm looking at the disassembly for a critical function in our code. It accounts for about 90% of our run time. That's not entirely surprising: it's a very big function after inlining, so it ends up doing a lot.

However, I noticed that there is some assembly I don't understand the justification for:

test rsi, rsi
je setecx0
cmp rsi, 0x1
je setecx1
cmp rsi, 0x2
je setecx2
cmp rsi, 0x3
je setecx3
cmp rsi, 0x4
je setecx4
mov ecx, 0x5
; Code that doesn't use ECX yet
ecxnotzero:
cmp r9, rsi
je epilog
ecxzero:
; Logical code below

epilog:
; Standard cleanup stuff
ret;
; 5kB more code
; 36 code fragments at the end of the function:
NOP WORD PTR[rax+rax*1+0x0] ; 16 byte alignment of the next label
setecx0:
xor ECX,ECX
jmp ecxzero
; similar fragments, but with other labels
NOP WORD PTR[rax+rax*1+0x0] ; 16 byte alignment of the next label
setecx4:
mov ecx, 0x4
JMP ecxnotzero
; Similar code with other "return" addresses 
NOP WORD PTR[rax+rax*1+0x0]
setecx1:
mov ecx, 0x1
JMP ecxnotzero
; Similar code with other "return" addresses 
setecx3:
mov ecx, 0x3
JMP ecxnotzero
setecx2:
mov ecx, 0x2
JMP ecxnotzero

I'm surprised because this seems to be a really, really complex way to set ECX to min(RSI, 5). Also, I don't understand why GCC is putting about three dozen of these fragments at the end of the function: those jumps can't be free, and they're certainly not local. Wouldn't a CMOV make more sense?
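
For reference, the mapping that the compare-and-jump ladder computes can be written in C++ roughly as below. This is only a sketch under assumptions: the names clamp_to_five, ladder and check_equivalence are made up for illustration, RSI is assumed to hold an unsigned 64-bit value, and the actual source expression that GCC expanded is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical name. Returns min(n, 5), the value the listing leaves in ECX.
inline std::uint32_t clamp_to_five(std::uint64_t n) {
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(n, 5));
}

// The same mapping written the way the listing executes it:
// one equality test per value 0..4, with 5 as the fall-through.
inline std::uint32_t ladder(std::uint64_t n) {
    switch (n) {
        case 0: return 0;
        case 1: return 1;
        case 2: return 2;
        case 3: return 3;
        case 4: return 4;
        default: return 5;
    }
}

// Checks that both forms agree on a range that covers every branch.
inline void check_equivalence() {
    for (std::uint64_t n = 0; n < 16; ++n) {
        assert(clamp_to_five(n) == ladder(n));
    }
}
```

Both functions return the same result for every input; the question is only why the generated code resembles the second form rather than a single compare plus conditional move.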

And this is just one of the saner cases. I also have this bit of code further on:

cmp rsi, QWORD PTR[rbp-0x70]
je fragment
vzeroupper
; Code to be skipped
skipped:
; cleanup
jmp epilog
; amongst the other fragments
nop WORD PTR[]
fragment:
vzeroupper
jmp skipped

Now I can't fathom what the point is here. Depending on the condition, we execute vzeroupper or vzeroupper (!), but in the second case we use two jumps to skip some code. Why was such a fragment used instead of the far more logical way:

cmp rsi, QWORD PTR[rbp-0x70]
vzeroupper ; Unconditionally
je skipped ; vzeroupper doesn't touch ZF
; Code to be skipped
skipped:

(I'm not even surprised by the fact that the epilog is near the beginning of this function. The originating C++ code has a 7-way switch, each case calling a different method, and it appears the epilog is placed between the first and second inlined methods. The other fragments at the end of the function appear to be from the 6 other methods.)
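
To make the shape of the source concrete, here is a minimal sketch of such a 7-way switch. Everything in it is hypothetical: the names Kind, Processor, run0 to run6 and process are invented, and the one-line method bodies are placeholders for the real methods, which are large enough that inlining them produces the multi-kilobyte function discussed above.

```cpp
// Hypothetical selector with seven values, one per switch case.
enum class Kind { K0, K1, K2, K3, K4, K5, K6 };

// Hypothetical class; the method bodies are trivial placeholders.
struct Processor {
    long acc = 0;

    void run0(long x) { acc += x; }
    void run1(long x) { acc -= x; }
    void run2(long x) { acc += 2 * x; }
    void run3(long x) { acc -= 2 * x; }
    void run4(long x) { acc ^= x; }
    void run5(long x) { acc |= x; }
    void run6(long x) { acc &= x; }

    // The 7-way switch: each case calls a different method. When all
    // seven calls are inlined, every case body ends up in this one function.
    void process(Kind k, long x) {
        switch (k) {
            case Kind::K0: run0(x); break;
            case Kind::K1: run1(x); break;
            case Kind::K2: run2(x); break;
            case Kind::K3: run3(x); break;
            case Kind::K4: run4(x); break;
            case Kind::K5: run5(x); break;
            case Kind::K6: run6(x); break;
        }
    }
};
```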

We compile with -O3 -march=haswell -funroll-loops -fPIC -mfma , using GCC 4.9.2-10 (Debian) for x64.

[edit] From the -fverbose-asm option, the GCC comment on these fragments is literally # i, .
