Which operator is faster (> or >=), (< or <=)?

2018-06-10 23:23:35

Is < cheaper (faster) than <= , and similarly, is > cheaper (faster) than >= ?

Disclaimer: I know I could measure but that will be on my machine only and I am not sure if the answer could be "implementation specific" or something like that.

It varies. Start by examining different instruction sets and how compilers use them. Take the OpenRISC 32 (or32), for example, which is clearly MIPS-inspired but does conditionals differently. The or32 has compare-and-set-flag instructions: compare these two registers and set the flag if less than or equal (unsigned), compare these two registers and set the flag if equal, and so on. Then there are two conditional branch instructions: branch on flag set and branch on flag clear. The compiler has to follow one of these paths, but less than, less than or equal, greater than, etc. all use the same number of instructions, with the same execution time when the conditional branch is taken and the same execution time when it is not.

On most architectures, taking the branch costs more than not taking it, because the pipeline has to be flushed and refilled. Some do branch prediction, etc. to help with that problem.

On some architectures the size of the instruction may vary: compare gpr0 and gpr1 vs. compare gpr0 and the immediate number 1234 may require a larger instruction. You will see this a lot with x86, for example. So although both cases may be a branch if less than, how the comparison is encoded, depending on which registers happen to hold which values, can make a performance difference (sure, x86 does a lot of pipelining, lots of caching, etc. to make up for these issues). Another similar example is MIPS and or32, where r0 is always zero. It is not really a general purpose register: if you write to it, it doesn't change, it is hardwired to zero. So a compare if equal to some other number MIGHT cost you more than a compare if equal to 0, if an extra instruction or two is required to fill a GPR with that immediate so that the compare can happen. The worst case is having to evict a register to the stack or memory to free it up for the immediate.
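As a rough illustration of the immediate point, here is a minimal C sketch with three equality tests that differ only in the constant. The function names are made up for this example. What a given compiler emits for each is target specific, so treat the comments as things to look for in a disassembly, not as guarantees.

```c
/* Hypothetical helpers: same operator, different constants. */

/* On MIPS-like targets the hardwired zero register can supply the 0. */
int is_zero(int x)  { return x == 0; }

/* A small constant usually fits in an immediate field, or needs
   one extra instruction to load it into a register first. */
int is_1234(int x)  { return x == 1234; }

/* A wide constant may need a longer encoding (x86) or more than one
   instruction to build (fixed-width RISC encodings). */
int is_wide(int x)  { return x == 0x12345678; }
```

Disassembling these three on the target you care about shows whether the constant costs anything there.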

Some architectures, like ARM, have conditional execution: with the full ARM (not Thumb) instruction set, you can make execution conditional on a per-instruction basis. So if you had the code

if(i==7) j=5; else j=9;

the pseudo code for arm would be

cmp i,#7
moveq j,#5
movne j,#9

There is no actual branch, so no pipeline issues; you flywheel right on through, very fast.
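The same if/else can be written in C as a single select expression. This is only a sketch, with select_j as a made-up name. Whether the compiler turns it into predicated instructions, a conditional move, or an ordinary branch is its own decision and depends on the target and optimization level.

```c
/* Same logic as: if (i == 7) j = 5; else j = 9;
   Written as one expression, which compilers can often lower to
   conditional instructions instead of a branch. */
int select_j(int i)
{
    return (i == 7) ? 5 : 9;
}
```

Calling select_j(7) gives 5 and any other argument gives 9. Checking the disassembly is the only way to know whether a branch was actually avoided.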

Comparing one architecture to another, if that is of interest: some, as mentioned (MIPS, or32), need a specific instruction to perform the comparison. On others, like x86, MSP430 and the vast majority, each ALU operation changes the flags. ARM and the like change the flags only if you tell the instruction to do so, otherwise they leave them alone, as shown above. So in a

while(--len)
{
  //do something
}

loop, the subtract of 1 also sets the flags. If the stuff in the loop is simple enough you could make the whole thing conditional, so you save on separate compare and branch instructions and you save on the pipeline penalty. MIPS solves this a little: compare and branch are one instruction, and it executes one instruction after the branch (the delay slot) to save a little in the pipe.
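To make the count-down idea concrete, here is a small C sketch. sum_bytes is a hypothetical helper, and it uses a do/while variant so the body runs exactly len times. On flag-based architectures the decrement can set the zero flag itself, so a compiler may not need a separate compare, but that is up to the compiler.

```c
#include <stddef.h>

/* Sums len bytes with a count-down loop. */
unsigned sum_bytes(const unsigned char *p, size_t len)
{
    unsigned total = 0;

    if (len == 0)        /* guard: --len would wrap around from 0 */
        return 0;

    do {
        total += *p++;
    } while (--len);     /* loop ends when the decrement reaches zero */

    return total;
}
```

For a four byte buffer holding 1, 2, 3, 4 this returns 10.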

The general answer is that you will not see a difference: the number of instructions, execution time, etc. are the same for the various conditionals. Special cases like small immediates vs. big immediates may have an effect in corner cases, or the compiler may simply choose to do it all differently depending on what comparison you do. If you try to rewrite your algorithm to give the same answer but use a less than instead of a greater than or equal, you could be changing the code enough to get a different instruction stream. Likewise, if you perform too simple a performance test, the compiler can/will optimize out the comparison completely and just generate the results, which could vary depending on your test code, causing different execution. The key to all of this is to disassemble the things you want to compare and see how the instructions differ. That will tell you whether you should expect to see any execution differences.
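One simple way to get something to disassemble is a file with one tiny function per operator. The file name cmp.c and the function names are assumptions for this sketch. With gcc or clang, the -S option writes the generated assembly to a .s file instead of producing an object file.

```c
/* cmp.c: one function per operator, meant to be read as assembly.
   Example: cc -O2 -S cmp.c   (then open cmp.s and compare the bodies)
   The operands are parameters, so the compiler cannot fold the
   comparisons into constants. */
int less(int a, int b)          { return a <  b; }
int less_equal(int a, int b)    { return a <= b; }
int greater(int a, int b)       { return a >  b; }
int greater_equal(int a, int b) { return a >= b; }
```

If the four bodies differ only in the condition code of one instruction, there is little reason to expect a timing difference on that target.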

TL;DR

There appears to be little-to-no difference between the four operators, as they all perform in about the same time for me (may be different on different systems!). So, when in doubt, just use the operator that makes the most sense for the situation (especially when messing with C++).

So, without further ado, here is the long explanation:

Assuming integer comparison:

As far as the generated assembly goes, the results are platform dependent. On my computer (Apple LLVM Compiler 4.0, x86_64), the generated assembly is as follows:

a < b (uses 'setl'):

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setl    %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret

a <= b (uses 'setle'):

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setle   %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret

a > b (uses 'setg'):

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setg    %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret

a >= b (uses 'setge'): 

movl    $10, -8(%rbp)
movl    $15, -12(%rbp)
movl    -8(%rbp), %eax
cmpl    -12(%rbp), %eax
setge   %cl
andb    $1, %cl
movzbl  %cl, %eax
popq    %rbp
ret

Which isn't really telling me much. So, we skip to a benchmark:

And ladies & gentlemen, the results are in. I created the following test program (I am aware that 'clock' isn't the best way to time something like this, but it'll have to do for now):

#include <time.h>
#include <stdio.h>

#define ITERS 100000000

int v = 0;

void testL()
{
    clock_t start = clock();

    v = 0;

    for (int i = 0; i < ITERS; i++) {
        v = i < v;
    }

    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

void testLE()
{
    clock_t start = clock();

    v = 0;

    for (int i = 0; i < ITERS; i++) {
        v = i <= v;
    }

    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

void testG()
{
    clock_t start = clock();

    v = 0;

    for (int i = 0; i < ITERS; i++) {
        v = i > v;
    }

    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

void testGE()
{
    clock_t start = clock();

    v = 0;

    for (int i = 0; i < ITERS; i++) {
        v = i >= v;
    }

    printf("%s: %lu\n", __FUNCTION__, (unsigned long)(clock() - start));
}

int main()
{
    testL();
    testLE();
    testG();
    testGE();
}

Which, on my machine (compiled with -O0), gives me this (5 separate runs):

testL: 337848
testLE: 338237
testG: 337888
testGE: 337787

testL: 337768
testLE: 338110
testG: 337406
testGE: 337926

testL: 338958
testLE: 338948
testG: 337705
testGE: 337829

testL: 339805
testLE: 339634
testG: 337413
testGE: 337900

testL: 340490
testLE: 339030
testG: 337298
testGE: 337593

I would argue that the differences between these operators are minor at best, and don't hold much weight in a modern computing world.
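If you want to repeat the measurement with optimizations turned on, the loop has to be protected from being folded away. Here is a minimal sketch of one way to do that, assuming a volatile result variable is acceptable. The names sink, cmp_fn, time_cmp and run_all are made up for this example and are not part of the test program above.

```c
#include <stdio.h>
#include <time.h>

/* volatile: every read and write of sink must really happen, so the
   compiler cannot discard the loop even at -O2. */
volatile int sink;

typedef int (*cmp_fn)(int, int);

int lt(int a, int b) { return a <  b; }
int le(int a, int b) { return a <= b; }
int gt(int a, int b) { return a >  b; }
int ge(int a, int b) { return a >= b; }

/* Feeds each result into the next comparison, iters times.
   Returns the elapsed CPU time in seconds. */
double time_cmp(cmp_fn f, int iters)
{
    clock_t start = clock();

    sink = 0;
    for (int i = 0; i < iters; i++)
        sink = f(i, sink);

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Prints one timing line per operator. */
void run_all(int iters)
{
    printf("lt: %f s\n", time_cmp(lt, iters));
    printf("le: %f s\n", time_cmp(le, iters));
    printf("gt: %f s\n", time_cmp(gt, iters));
    printf("ge: %f s\n", time_cmp(ge, iters));
}
```

Calling run_all(100000000) from main prints four timings. The loop overhead, the volatile accesses and the function call (if it is not inlined) are the same for all four operators and probably dominate the cost, so this only tells you whether the operators differ relative to each other.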

Link: http://www.djcxy.com/p/31568.html
