LEA or ADD instruction?

2018-06-03 17:52:49

When I'm handwriting assembly, I generally choose the form

lea eax, [eax+4]

over the form

add eax, 4

I have heard that lea is a "0-clock" instruction (like NOP), while add isn't. However, when I look at compiler-produced assembly I often see the latter form used instead of the former. I'm smart enough to trust the compiler, so can anyone shed some light on which one is better? Which one is faster? Why is the compiler choosing the latter form over the former?

One significant difference between LEA and ADD on x86 CPUs is the execution unit that actually performs the instruction. Modern x86 CPUs are superscalar and have multiple execution units that operate in parallel, with the pipeline feeding them somewhat like round-robin (bar stalls). The thing is, on designs that handle LEA in the address-generation logic, LEA is processed by (one of) the unit(s) dealing with addressing (which happens at an early stage in the pipeline), while ADD goes to the ALU(s) (arithmetic/logic units), late in the pipeline. That means such a superscalar x86 CPU can execute a LEA and an arithmetic/logical instruction concurrently.

The fact that LEA goes through the address generation logic instead of the arithmetic units is also the reason why it used to be called "zero-clocks"; it takes no time to execute because address generation has already happened by the time it would be / is executed.

It's not free, since address generation is a step in the execution pipeline, but it's got no execution overhead. And it doesn't occupy a slot in the ALU pipeline(s).

Edit: To clarify, LEA is not free. Even on CPUs that do not implement it via the arithmetic unit, it takes time to execute due to instruction decode / dispatch / retire and/or other pipeline stages that all instructions go through. The time taken to do LEA just occurs in a different stage of the pipeline on CPUs that implement it via address generation.

I'm smart enough to trust the compiler, so can anyone shed some light on which one is better?

Yes, a little. Firstly, I'm taking this from the following message: https://groups.google.com/group/bsdnt-devel/msg/23a48bb18571b9a6

In this message a developer optimises some assembly I wrote very badly so that it runs crazily fast on Intel Core 2 processors. As background to this project, it's a BSD bignum library which I and a few other developers have been involved in.

In this case, all that's being optimised is the addition of two arrays that look like this: uint64_t* x, uint64_t* y. Each "limb" or member of the array represents part of the bignum; the basic process is to iterate over it starting from the least significant limb, add the pair up and continue upwards, passing the carry (any overflow) up each time. adc does this for you on a processor (it's not possible to access the carry flag from C, I don't think).
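To make the limb loop concrete, here is a minimal C sketch of the carry-propagating add described above. It is not the bsdnt code: the name add_limbs and its signature are my own assumptions, and since C has no direct view of the carry flag, the carry is recovered by comparing the wrapped sum against an operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from bsdnt): r = x + y over n limbs,
   least significant limb first. Returns the final carry (0 or 1). */
uint64_t add_limbs(uint64_t *r, const uint64_t *x, const uint64_t *y, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = x[i] + carry;  /* add the incoming carry first      */
        uint64_t c1 = s < carry;     /* wrapped only if x[i] was all ones */
        uint64_t t  = s + y[i];      /* then add the other limb           */
        uint64_t c2 = t < s;         /* unsigned wrap means a carry out   */
        r[i]  = t;
        carry = c1 | c2;             /* at most one of c1, c2 can be set  */
    }
    return carry;
}
```

In hand-written assembly the two compare steps disappear, because adc consumes and produces the carry flag directly. That is the main reason this loop is a target for assembly tuning.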

In that piece of code, a combination of lea something, [something+1] and jrcxz is used, which is apparently more efficient than the jnz / add something, size pair we might previously have used. I'm not sure whether this was discovered simply by testing different instructions, however. You'd have to ask.

However, in a later message, it is measured on an AMD chip and does not perform so well.

I'm also given to understand that different operations perform differently on different processors. I know, for example, that the GMP project detects processors using cpuid and passes in different assembly routines based on different architectures, e.g. core2, nehalem.
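As a rough illustration of that kind of per-architecture dispatch, here is a C sketch that picks an add routine through a function pointer. Everything in it is hypothetical: the cpu_kind enum, the add_n_* names and select_add_n are made up for this example and are not GMP's API. The "tuned" variants are just stand-ins that call the portable version, where a real library would plug in hand-written assembly and derive the CPU class from cpuid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU classes; a real library would derive these from cpuid. */
typedef enum { CPU_GENERIC, CPU_CORE2, CPU_NEHALEM } cpu_kind;

typedef uint64_t (*add_n_fn)(uint64_t *r, const uint64_t *x,
                             const uint64_t *y, size_t n);

/* Portable fallback: limb-wise add, returns the final carry. */
static uint64_t add_n_generic(uint64_t *r, const uint64_t *x,
                              const uint64_t *y, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = x[i] + y[i];
        uint64_t c = s < x[i];       /* carry out of x[i] + y[i]      */
        r[i] = s + carry;
        c |= r[i] < s;               /* carry out of adding the carry */
        carry = c;
    }
    return carry;
}

/* Stand-ins only: these would be separate tuned assembly routines. */
static uint64_t add_n_core2(uint64_t *r, const uint64_t *x,
                            const uint64_t *y, size_t n)
{
    return add_n_generic(r, x, y, n);
}

static uint64_t add_n_nehalem(uint64_t *r, const uint64_t *x,
                              const uint64_t *y, size_t n)
{
    return add_n_generic(r, x, y, n);
}

/* Choose a routine once, then call through the pointer afterwards. */
add_n_fn select_add_n(cpu_kind cpu)
{
    switch (cpu) {
    case CPU_CORE2:   return add_n_core2;
    case CPU_NEHALEM: return add_n_nehalem;
    default:          return add_n_generic;
    }
}
```

The point of the indirection is that the choice is made once at startup, so the hot loop itself carries no per-call CPU checks.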

The question you have to ask yourself is: does your compiler produce optimised output for your CPU architecture? The Intel compiler, for example, is known to do this, so it might be worth measuring performance and seeing what output it produces.

LEA isn't faster than the ADD instruction; the execution speed is the same.

But LEA sometimes offers more than ADD. If we need a simple, fast addition combined with a multiplication by a small constant (a scale of 2, 4 or 8) involving a second register, LEA can speed up program execution. On the other hand, LEA doesn't affect the CPU flags, so there is no way to detect overflow from its result.
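A small C sketch of both points, with made-up function names. The first expression has the base + index*scale + displacement shape that fits an x86 addressing mode, so a compiler may emit a single lea for it (that is an assumption about code generation, so check your own compiler's output). The second shows that without flags the carry of a plain add has to be recovered by hand.

```c
#include <stdint.h>

/* base + index*4 + 8 matches the [base + index*scale + disp] form,
   so a compiler may turn this into one lea (assumption, not a guarantee). */
uint32_t lea_style(uint32_t base, uint32_t index)
{
    return base + index * 4u + 8u;
}

/* ADD sets the carry flag in hardware; C has no access to it, so the
   wrap-around is detected by comparing the sum with an operand.
   Returns 1 if a + b wrapped, 0 otherwise; the wrapped sum goes to *sum. */
int add_wraps(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = a + b;
    return *sum < a;
}
```

For example, lea_style(10, 3) evaluates to 30, and add_wraps(UINT32_MAX, 1, &s) returns 1 with s set to 0.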
