Any advantage of XOR AL,AL + MOVZX EAX, AL over XOR EAX,EAX?
I have some unknown C++ code that was compiled in Release build, so it's optimized. The point I'm struggling with is:
xor al, al
add esp, 8
cmp byte ptr [ebp+userinput], 31h
movzx eax, al
This is my understanding:
xor al, al ; set eax to 0x??????00 (clear the low byte)
add esp, 8 ; for some unclear reason, set the stack pointer higher
cmp byte ptr [ebp+userinput], 31h ; set zero flag if user input was "1"
movzx eax, al ; set eax to AL and extend with zeros, so eax = 0x000000??
I don't care about lines 2 and 3 (the add and the cmp). They might be there in this order for pipelining reasons and IMHO have nothing to do with EAX.
However, I don't understand why the code would clear AL first, just to clear the rest of EAX later. The result will IMHO always be EAX = 0, so this could also be xor eax, eax instead. What is the advantage or "optimization" of that piece of code?
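To check the claim that the result is always EAX = 0, here is a minimal sketch that models the register as a plain 32-bit integer. The function names are made up for illustration and only mirror the instructions; the model assumes that add esp, 8 and cmp do not write EAX, and it ignores flags and timing entirely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: EAX is a 32-bit value, AL is its low 8 bits.

// xor al, al: clears only the low byte, the upper 24 bits are kept.
constexpr std::uint32_t xor_al_al(std::uint32_t eax) {
    return eax & 0xFFFFFF00u;
}

// movzx eax, al: zero-extends the low byte into the full register.
constexpr std::uint32_t movzx_eax_al(std::uint32_t eax) {
    return eax & 0x000000FFu;
}

// xor eax, eax: clears all 32 bits, whatever the old value was.
constexpr std::uint32_t xor_eax_eax(std::uint32_t /*old value, ignored*/) {
    return 0u;
}

// The sequence from the disassembly, reduced to the instructions
// that write EAX (add esp, 8 and cmp are assumed not to touch it).
constexpr std::uint32_t compiled_sequence(std::uint32_t eax) {
    return movzx_eax_al(xor_al_al(eax));
}

// Compile-time check for one arbitrary starting value.
static_assert(compiled_sequence(0xDEADBEEFu) == xor_eax_eax(0xDEADBEEFu),
              "both sequences leave EAX = 0");
```

In this model the first step keeps the upper 24 bits and the second step keeps only the low 8 bits, so no bit of the old value survives both. Architecturally the two versions are equivalent; any difference can only be in performance.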
Some background info:
I will get the source code later. It's a short C++ console demo program, maybe 20 lines of code only, so nothing that I would call "complex" code. IDA shows a single loop in that program, but not around this piece. The Stud_PE signature scan didn't find anything, but it's likely the Visual Studio 2013 or 2015 compiler.
xor al,al is already slower than xor eax,eax on most CPUs. E.g. on Haswell/Skylake it needs an ALU uop and doesn't break the dependency on the old value of eax/rax. It's equally bad on AMD CPUs, or Atom/Silvermont. (Well, maybe not equally, because AMD doesn't eliminate xor eax,eax at issue/rename, but xor al,al still has a false dependency which could serialize the new dependency chain with whatever used eax last.)
On CPUs that do rename al separately from the rest of the register (Intel pre-IvyBridge), xor al,al may still be recognized as a zeroing idiom, but unless you actively want to preserve the upper bytes of the register, the best way to zero al is xor eax,eax.
Doing movzx on top of that just makes it even worse.
I'm guessing your compiler somehow got confused and decided it needed a 1-byte zero, but then realized it needed to promote it to 32 bits. xor sets flags, so it couldn't xor-zero after the cmp, and it failed to notice that it could have just xor-zeroed eax before the cmp.
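The ordering constraint can be sketched with a toy model that tracks only EAX and the zero flag. Everything here (the Cpu struct and the helper names) is hypothetical and only mirrors the instructions; the real xor and cmp also write other flags, which the model leaves out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified CPU state: just EAX and the zero flag (ZF).
struct Cpu {
    std::uint32_t eax;
    bool zf;
};

// xor al, al: clears the low byte; the result is 0, so ZF is set.
inline void xor_al_al(Cpu& c) { c.eax &= 0xFFFFFF00u; c.zf = true; }

// xor eax, eax: clears all of EAX; the result is 0, so ZF is set.
inline void xor_eax_eax(Cpu& c) { c.eax = 0u; c.zf = true; }

// cmp byte ptr [mem], imm8: ZF is set iff the bytes are equal. EAX is untouched.
inline void cmp_byte(Cpu& c, std::uint8_t mem, std::uint8_t imm) { c.zf = (mem == imm); }

// movzx eax, al: zero-extends AL into EAX and leaves the flags alone.
inline void movzx_eax_al(Cpu& c) { c.eax &= 0x000000FFu; }

// The sequence as compiled: 8-bit zero, cmp, then zero-extend.
inline Cpu as_compiled(std::uint32_t old_eax, std::uint8_t input) {
    Cpu c{old_eax, false};
    xor_al_al(c);
    cmp_byte(c, input, 0x31);
    movzx_eax_al(c);
    return c;
}

// The suggested alternative: zero the full register before the cmp.
inline Cpu zero_before_cmp(std::uint32_t old_eax, std::uint8_t input) {
    Cpu c{old_eax, false};
    xor_eax_eax(c);
    cmp_byte(c, input, 0x31);
    return c;
}

// The ordering that does not work: xor after cmp overwrites the cmp result.
inline Cpu zero_after_cmp(std::uint32_t old_eax, std::uint8_t input) {
    Cpu c{old_eax, false};
    cmp_byte(c, input, 0x31);
    xor_eax_eax(c);
    return c;
}
```

In this model, as_compiled and zero_before_cmp end with the same EAX and the same ZF for any input, while zero_after_cmp always ends with ZF set, so whatever consumes the comparison result afterwards would see the wrong flag.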
Either that, or it's something like Jester's suggestion, where the movzx is a branch target. Even if that's the case, xor eax,eax would still have been better, because zero-extending into eax follows unconditionally on this code path.
I'm curious what compiler produced this from what source.