GCC optimizer emits strange conditional jumps, why?
I'm looking at the disassembly for a critical function in our code. It accounts for about 90% of the run time. That's not entirely surprising, as it's a very big function after inlining and so it ends up doing a lot.
However, I noticed that there is some assembly I don't understand the justification for:
    test rsi, rsi
    je   setecx0
    cmp  rsi, 0x1
    je   setecx1
    cmp  rsi, 0x2
    je   setecx2
    cmp  rsi, 0x3
    je   setecx3
    cmp  rsi, 0x4
    je   setecx4
    mov  ecx, 0x5
    ; Code that doesn't use ECX yet
ecxnotzero:
    cmp  r9, rsi
    je   epilog
ecxzero:
    ; Logical code below
epilog:
    ; Standard cleanup stuff
    ret
    ; 5kB more code

    ; 36 code fragments at the end of the function:
    NOP WORD PTR[rax+rax*1+0x0] ; 16 byte alignment of the next label
setecx0:
    xor  ECX,ECX
    jmp  ecxzero
    ; similar functions, but with other labels
    NOP WORD PTR[rax+rax*1+0x0] ; 16 byte alignment of the next label
setecx4:
    mov  ecx, 0x4
    JMP  ecxnotzero
    ; Similar code with other "return" addresses
    NOP WORD PTR[rax+rax*1+0x0]
setecx1:
    mov  ecx, 0x1
    JMP  ecxnotzero
    ; Similar code with other "return" addresses
setecx3:
    mov  ecx, 0x3
    JMP  ecxnotzero
setecx2:
    mov  ecx, 0x2
    JMP  ecxnotzero
I'm surprised because this seems to be a really, really complex way to set ECX to min(RSI, 5). Also, I don't understand why GCC is putting about three dozen of these fragments at the end of the function: those jumps can't be free, and they're certainly not local. Wouldn't a CMOV make more sense?
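To make the mapping concrete: read literally, the compare chain gives ECX the value of RSI when RSI is 0 through 4, and 5 otherwise, which is an unsigned minimum of RSI and 5. The sketch below restates that in C++. The names clamp_chain and clamp_min are hypothetical, since the original source is not shown here, and whether a given compiler emits a CMOV for the second form is an assumption worth checking rather than a guarantee.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers; the real C++ source behind the disassembly is not shown.

// Mirrors the emitted compare chain: one equality test per value 0..4,
// with 5 as the fall-through result.
constexpr std::uint32_t clamp_chain(std::uint64_t n) {
    if (n == 0) return 0;
    if (n == 1) return 1;
    if (n == 2) return 2;
    if (n == 3) return 3;
    if (n == 4) return 4;
    return 5;
}

// The same mapping written as an unsigned minimum. A compiler may lower
// this to a compare plus conditional move, but that is not guaranteed.
constexpr std::uint32_t clamp_min(std::uint64_t n) {
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(n, 5));
}
```

Compiling both helpers with the same flags and comparing the output is one way to see whether the jump chain comes from how the value is expressed or from the surrounding code.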
And this is just one of the saner cases. I also have this bit of code further on:
    cmp  rsi, QWORD PTR[rbp-0x70]
    je   fragment
    vzeroupper
    ; Code to be skipped
skipped:
    ; cleanup
    jmp  epilog

    ; amongst the other fragments
    nop WORD PTR[]
fragment:
    vzeroupper
    jmp  skipped
Now I can't fathom what the point is here. Depending on the condition, we execute either vzeroupper or vzeroupper (!), but in the second case we take two jumps to skip some code. Why was such a fragment used instead of the far more logical form:
    cmp  rsi, QWORD PTR[rbp-0x70]
    vzeroupper          ; Unconditionally
    je   skipped        ; vzeroupper doesn't touch ZF
    ; Code to be skipped
skipped:
(I'm not even surprised by the fact that the epilog is near the beginning of this function. The originating C++ code has a 7-way switch, each case calling a different method, and it appears the epilog is placed between the first and second inlined methods. The other fragments at the end of the function appear to come from the 6 other methods.)
We compile with -O3 -march=haswell -funroll-loops -fPIC -mfma, using GCC 4.9.2-10 (Debian) for x64.
[edit] From the -fverbose-asm option, the GCC comment on these fragments is literally "# i,".