Relative performance of x86 inc vs. add instruction
Quick question: assuming the following has already executed,
mov eax, 0
which is more efficient?
inc eax
inc eax
or
add eax, 2
Also, in case the two incs are faster, do compilers (say, GCC) commonly (i.e. without aggressive optimization flags) optimize var += 2 to it?
Thanks for your time!
PS: Don't bother to answer with a variation of "don't prematurely optimize"; this is merely of academic interest.
Two inc instructions on the same register (or, more generally, any two read-modify-write instructions on the same register) always form a dependency chain of at least two cycles. This assumes a one-cycle latency for inc, which has been the case since the 486. That means that if the surrounding instructions can't be interleaved with the two inc instructions to hide those latencies, the code will execute more slowly.
But no compiler will emit the instruction sequence you propose anyway (mov eax,0 would be replaced by xor eax,eax; see "What is the purpose of XORing a register with itself?"). Instead of
mov eax,0
inc eax
inc eax
an optimizing compiler will fold the constants and emit
mov eax,2
If you ever want to know the raw performance stats of x86 instructions, see Dr. Agner Fog's listings (volume 4, to be exact). As for the part about compilers, that depends on the compiler's code generator and is not something you should rely on too much.
On a side note: I find it funny/ironic that in a question about performance, you used MOV EAX,0 to zero a register instead of XOR EAX,EAX :P (and if MOV EAX,0 was done beforehand, the fastest variant would be to remove the incs and adds and just use MOV EAX,2).
For all practical purposes, it probably doesn't matter. But take into account that inc uses fewer bytes.
Consider the following code:
int x = 0;
x += 2;
Without using any optimization flags, GCC compiles this code into:
80483ed: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
80483f4: 00
80483f5: 83 44 24 1c 02 addl $0x2,0x1c(%esp)
Using -O1 and -O2, it becomes:
c7 44 24 08 02 00 00 movl $0x2,0x8(%esp)
Funny, isn't it?