LEA or ADD instruction?
When I'm handwriting assembly, I generally choose the form
lea eax, [eax+4]
over the form
add eax, 4
I have heard that lea is a "0-clock" instruction (like NOP), while add isn't. However, when I look at compiler-produced assembly, I often see the latter form used instead of the former. I'm smart enough to trust the compiler, so can anyone shed some light on which one is better? Which one is faster? Why does the compiler choose the latter form over the former?
One significant difference between LEA and ADD on x86 CPUs is the execution unit that actually performs the instruction. Modern x86 CPUs are superscalar: they have multiple execution units that operate in parallel, with the pipeline feeding them somewhat like round-robin (barring stalls). On CPUs that implement it this way, LEA is processed by one of the units dealing with address generation (which happens at an early stage in the pipeline), while ADD goes to one of the ALUs (arithmetic/logic units), late in the pipeline. That means such a superscalar x86 CPU can execute a LEA and an arithmetic/logical instruction concurrently.
The fact that LEA goes through the address generation logic instead of the arithmetic units is also the reason it used to be called a "zero-clock" instruction: it appears to take no time to execute because address generation has already happened by the time it would be executed.
It's not free, since address generation is a step in the execution pipeline, but it has no execution overhead, and it doesn't occupy a slot in the ALU pipeline(s).
Edit: To clarify, LEA is not free. Even on CPUs that do not implement it via the arithmetic unit, it takes time to execute because of instruction decode, dispatch, retire and the other pipeline stages that all instructions go through. On CPUs that implement it via address generation, the time taken by LEA simply falls in a different stage of the pipeline.
I'm smart enough to trust the compiler, so can anyone shed some light on which one is better?
Yes, a little. Firstly, I'm taking this from the following message: https://groups.google.com/group/bsdnt-devel/msg/23a48bb18571b9a6
In this message a developer optimises some assembly I wrote very badly to run crazily fast on Intel Core 2 processors. As background, the project is a BSD bignum library that I and a few other developers have been involved in.
In this case, all that's being optimised is the addition of two arrays that look like this: uint64_t* x, uint64_t* y. Each "limb" or member of the array represents part of the bignum; the basic process is to iterate over the limbs starting from the least significant one, add each pair and continue upwards, passing the carry (any overflow) up each time. adc does this for you on the processor (it's not possible to access the carry flag from C, I don't think).
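For anyone who wants the limb-by-limb process spelled out, here is a minimal sketch in portable C. The function name add_limbs and the sum output array are my own placeholders, not taken from bsdnt. Since C has no direct view of the carry flag, the carry is recomputed from unsigned wrap-around, which is the work adc does in a single instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds the n-limb numbers x and y (least significant limb first) into sum.
   Returns the carry out of the most significant limb (0 or 1).
   add_limbs is a hypothetical name used only for this illustration. */
uint64_t add_limbs(uint64_t *sum, const uint64_t *x, const uint64_t *y, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = x[i] + carry;   /* unsigned addition wraps modulo 2^64 */
        uint64_t c1 = s < carry;     /* 1 if adding the incoming carry wrapped */
        uint64_t t = s + y[i];
        uint64_t c2 = t < s;         /* 1 if adding y[i] wrapped */
        sum[i] = t;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```

A hand-written assembly loop replaces the two comparisons with adc, which is why the loop counter update and branch in such a loop must not disturb the carry flag.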
In that piece of code, a combination of lea something, [something+1] and jrcxz is used, which is apparently more efficient than the jnz / add something, size pair we might previously have used. I'm not sure whether this was discovered simply by testing different instructions, however. You'd have to ask.
However, in a later message, it is measured on an AMD chip and does not perform so well.
I'm also given to understand that different operations perform differently on different processors. I know, for example, that the GMP project detects the processor using cpuid and passes in different assembly routines for different architectures, e.g. core2, nehalem.
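To give a rough idea of what that kind of dispatch can look like, here is a sketch in C. It is not GMP's actual mechanism. It assumes GCC or Clang on x86, whose __builtin_cpu_init and __builtin_cpu_is builtins wrap cpuid; on any other target it falls back to the generic routine. The names add_generic, add_core2, add_nehalem and select_add are made up for illustration, and the "tuned" routines are only stand-ins that forward to the generic one.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*add_fn)(uint64_t *, const uint64_t *, const uint64_t *, size_t);

/* Portable multi-limb addition; returns the final carry (0 or 1). */
uint64_t add_generic(uint64_t *sum, const uint64_t *x, const uint64_t *y, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = x[i] + carry;
        uint64_t c1 = s < carry;
        uint64_t t = s + y[i];
        sum[i] = t;
        carry = c1 | (uint64_t)(t < s);
    }
    return carry;
}

/* Stand-ins for hand-tuned assembly routines (hypothetical). */
uint64_t add_core2(uint64_t *sum, const uint64_t *x, const uint64_t *y, size_t n)
{
    return add_generic(sum, x, y, n);
}

uint64_t add_nehalem(uint64_t *sum, const uint64_t *x, const uint64_t *y, size_t n)
{
    return add_generic(sum, x, y, n);
}

/* Picks a routine once at startup based on the detected CPU. */
add_fn select_add(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_is("nehalem"))
        return add_nehalem;
    if (__builtin_cpu_is("core2"))
        return add_core2;
#endif
    return add_generic;
}
```

The point of the function pointer is that the detection cost is paid once, and every later call goes straight to whichever routine was measured to be fastest on that architecture.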
The question you have to ask yourself is: does your compiler produce optimised output for your CPU architecture? The Intel compiler, for example, is known to do this, so it might be worth measuring performance and seeing what output it produces.
LEA isn't faster than the ADD instruction; the execution speed is the same.
But LEA sometimes offers more than ADD. If we need simple and fast addition/multiplication in combination with a second register, then LEA can speed up program execution. On the other hand, LEA doesn't affect the CPU flags, so there is no way to detect overflow.
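A small C sketch of both points, with function names invented for illustration. The first two expressions match the x86 addressing form base + index*scale + displacement, so an optimising compiler will often (not always) turn each into a single lea; compile with -O2 -S on your own target to see what you actually get. The third function needs the overflow flag, which lea never sets, so it has to go through a real add. It relies on __builtin_add_overflow, a GCC/Clang extension, which is an assumption about your toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* base + index*4 + 8 fits the form [base + index*4 + 8]. */
uint32_t scaled_sum(uint32_t base, uint32_t index)
{
    return base + index * 4u + 8u;
}

/* x*5 can be expressed as [x + x*4]. */
uint32_t times5(uint32_t x)
{
    return x * 5u;
}

/* Returns true if a + b overflowed; *out receives the wrapped result.
   Uses a GCC/Clang builtin, not standard C. */
bool add_checked(int32_t a, int32_t b, int32_t *out)
{
    return __builtin_add_overflow(a, b, out);
}
```

So the choice is less about raw speed and more about whether you want the extra operands lea gives you or the flags add gives you.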