Any advantage of XOR AL,AL + MOVZX EAX, AL over XOR EAX,EAX?

2018-06-30 19:03:56

I have some unknown C++ code that was compiled in Release build, so it's optimized. The point I'm struggling with is:

xor     al, al
add     esp, 8
cmp     byte ptr [ebp+userinput], 31h
movzx   eax, al

This is my understanding:

xor     al, al    ; set eax to 0x??????00 (clear the low byte)
add     esp, 8    ; for some unclear reason, set the stack pointer higher
cmp     byte ptr [ebp+userinput], 31h ; set zero flag if user input was "1"
movzx   eax, al   ; set eax to AL and extend with zeros, so eax = 0x000000??

I don't care about lines 2 and 3. They might be there in this order for pipelining reasons and IMHO have nothing to do with EAX.

However, I don't understand why I would clear AL first, just to clear the rest of EAX later. The result will IMHO always be EAX = 0, so this could also be

xor eax, eax

instead. What is the advantage or "optimization" of that piece of code?
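To check the claim that both sequences leave the same value in EAX, here is a minimal sketch that models the register as a plain 32-bit integer. The function names are made up for illustration, and the model covers only the resulting value, not flags, dependencies or timing.

```cpp
#include <cassert>
#include <cstdint>

// Model of `xor al, al`: clears only the low 8 bits, upper 24 bits survive.
constexpr std::uint32_t xor_al_al(std::uint32_t eax) {
    return eax & 0xFFFFFF00u;
}

// Model of `movzx eax, al`: keeps the low 8 bits, zeroes the upper 24 bits.
constexpr std::uint32_t movzx_eax_al(std::uint32_t eax) {
    return eax & 0x000000FFu;
}

// Model of `xor eax, eax`: the result is zero whatever the old value was.
constexpr std::uint32_t xor_eax_eax(std::uint32_t /*old_eax*/) {
    return 0u;
}

// The two-instruction sequence from the disassembly (the add/cmp in between
// do not write EAX, so they are left out of this value-only model).
constexpr std::uint32_t two_step(std::uint32_t eax) {
    return movzx_eax_al(xor_al_al(eax));
}

// Compile-time spot checks: both variants end with EAX == 0.
static_assert(two_step(0xDEADBEEFu) == xor_eax_eax(0xDEADBEEFu), "same result");
static_assert(two_step(0xFFFFFFFFu) == 0u, "always zero");

// Runtime check over a few sample values of the old EAX contents.
inline bool sequences_agree() {
    const std::uint32_t samples[] = {0u, 1u, 0xFFu, 0x12345678u, 0xFFFFFFFFu};
    for (std::uint32_t old_eax : samples) {
        if (two_step(old_eax) != xor_eax_eax(old_eax)) return false;
    }
    return true;
}
```

Calling sequences_agree() from any test or main should return true, which matches the reasoning above: as far as the value in EAX is concerned, the pair of instructions is equivalent to a single xor eax, eax.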

Some background info:

I will get the source code later. It's a short C++ console demo program, maybe 20 lines of code only, so nothing that I would call "complex" code. IDA shows a single loop in that program, but not around this piece. The Stud_PE signature scan didn't find anything, but it's likely the Visual Studio 2013 or 2015 compiler.

xor al,al is already slower than xor eax,eax on most CPUs. E.g. on Haswell/Skylake it needs an ALU uop and doesn't break the dependency on the old value of eax/rax. It's equally bad on AMD CPUs, or Atom/Silvermont. (Well, maybe not equally, because AMD doesn't eliminate xor eax,eax at issue/rename, but xor al,al still has a false dependency, which could serialize the new dependency chain with whatever used eax last.)

On CPUs that do rename al separately from the rest of the register (Intel pre-IvyBridge), the xor al,al may still be recognized as a zeroing idiom, but unless you actively want to preserve the upper bytes of the register, the best way to zero al is xor eax,eax.

Doing movzx on top of that just makes it even worse.

I'm guessing your compiler somehow got confused and decided it needed a 1-byte zero, but then realized it needed to promote it to 32 bits. xor sets flags, so it couldn't xor-zero after the cmp, and it failed to notice that it could have just xor-zeroed eax before the cmp.
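The ordering constraint can be illustrated with a toy model in C++. This is only a sketch under simplifying assumptions: the Cpu struct and helper names are hypothetical, only EAX and the zero flag are modelled, and 0x31 is the constant from the cmp in the question.

```cpp
#include <cassert>
#include <cstdint>

// Toy machine state: just EAX and the zero flag (ZF).
struct Cpu {
    std::uint32_t eax;
    bool zf;
};

// cmp a, b: sets ZF when the operands are equal, leaves EAX alone.
inline void cmp(Cpu& c, std::uint8_t a, std::uint8_t b) { c.zf = (a == b); }

// xor al, al: clears the low byte; the result is zero, so ZF becomes 1.
inline void xor_al_al(Cpu& c) { c.eax &= 0xFFFFFF00u; c.zf = true; }

// xor eax, eax: clears the whole register; ZF becomes 1 as well.
inline void xor_eax_eax(Cpu& c) { c.eax = 0u; c.zf = true; }

// movzx eax, al: zero-extends AL into EAX and does not touch the flags.
inline void movzx_eax_al(Cpu& c) { c.eax &= 0x000000FFu; }

// Order emitted by the compiler: xor al,al / cmp / movzx.
inline Cpu compiler_order(std::uint32_t old_eax, std::uint8_t input) {
    Cpu c{old_eax, false};
    xor_al_al(c);
    cmp(c, input, 0x31);
    movzx_eax_al(c);
    return c;
}

// Better order: zero the full register before the cmp.
inline Cpu zero_before_cmp(std::uint32_t old_eax, std::uint8_t input) {
    Cpu c{old_eax, false};
    xor_eax_eax(c);
    cmp(c, input, 0x31);
    return c;
}

// Broken order: xor after the cmp overwrites the comparison result in ZF.
inline Cpu zero_after_cmp(std::uint32_t old_eax, std::uint8_t input) {
    Cpu c{old_eax, false};
    cmp(c, input, 0x31);
    xor_eax_eax(c);
    return c;
}
```

With an input of '2', compiler_order and zero_before_cmp both end with EAX = 0 and ZF clear, while zero_after_cmp ends with ZF set even though the input was not '1'. In this model, movzx is the only way to finish zeroing after the cmp without losing the flags, and zeroing the whole register up front makes it unnecessary.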

Either that or it's something like Jester's suggestion, where the movzx is a branch target. Even if that's the case, xor eax,eax would still have been better because zero-extending into eax follows unconditionally on this code path.

I'm curious what compiler produced this from what source.
