else statement, which is faster?

I argued with a friend the other day about these two snippets. Which is faster, and why?

value = 5;
if (condition) {
    value = 6;
}

and:

if (condition) {
    value = 6;
} else {
    value = 5;
}

What if value is a matrix?

Note: I know that value = condition ? 6 : 5; exists and I expect it to be faster, but it wasn't an option.

Edit (requested by staff since question is on hold at the moment):

  • please answer by considering either x86 assembly generated by mainstream compilers (say g++, clang++, vc, mingw) in both optimized and non-optimized versions, or MIPS assembly.
  • when the assembly differs, explain why one version is faster and when (e.g. "better because there is no branching, and branching has the following issue...")

  • TL;DR: In unoptimized code, if without else seems negligibly more efficient, but with even the most basic level of optimization enabled, both are essentially rewritten to value = condition + 5.


    I gave it a try and generated the assembly for the following code:

    int ifonly(bool condition, int value)
    {
        value = 5;
        if (condition) {
            value = 6;
        }
        return value;
    }
    
    int ifelse(bool condition, int value)
    {
        if (condition) {
            value = 6;
        } else {
            value = 5;
        }
        return value;
    }
    

    On gcc 6.3 with optimizations disabled ( -O0 ), the relevant difference is:

     mov     DWORD PTR [rbp-8], 5
     cmp     BYTE PTR [rbp-4], 0
     je      .L2
     mov     DWORD PTR [rbp-8], 6
    .L2:
     mov     eax, DWORD PTR [rbp-8]
    

    for ifonly, while ifelse has:

     cmp     BYTE PTR [rbp-4], 0
     je      .L5
     mov     DWORD PTR [rbp-8], 6
     jmp     .L6
    .L5:
     mov     DWORD PTR [rbp-8], 5
    .L6:
     mov     eax, DWORD PTR [rbp-8]
    

    The latter looks slightly less efficient because it has an extra jump, but both have at least two and at most three assignments, so unless you really need to squeeze every last drop of performance (hint: unless you are working on a space shuttle you don't, and even then you probably don't), the difference won't be noticeable.

    However, even at the lowest optimization level (-O1), both functions compile to the same assembly:

    test    dil, dil
    setne   al
    movzx   eax, al
    add     eax, 5
    

    which is basically the equivalent of

    return 5 + condition;
    

    assuming condition is zero or one. Higher optimization levels don't really change the output, except they manage to avoid the movzx by efficiently zeroing out the EAX register at the start.
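    The branch-free form the optimizer reaches can be written out directly in C++ as a sanity check (a minimal sketch; the function name branchless is chosen here for illustration, not from the original):

    ```cpp
    #include <cassert>

    // A bool converts to int as exactly 0 or 1 (guaranteed by the standard),
    // which is what lets the optimizer turn both functions above into the
    // same branch-free addition.
    int branchless(bool condition)
    {
        return 5 + condition;  // condition contributes 0 or 1
    }

    int main()
    {
        assert(branchless(false) == 5);
        assert(branchless(true)  == 6);
        return 0;
    }
    ```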


    Disclaimer: You probably shouldn't write 5 + condition yourself (even though the standard guarantees that converting true to an integer type gives 1 ) because your intent might not be immediately obvious to people reading your code (which may include your future self). The point of this code is to show that what the compiler produces in both cases is (practically) identical. Ciprian Tomoiaga states it quite well in the comments:

    a human's job is to write code for humans and let the compiler write code for the machine.


    The answer from CompuChip shows that for int, both are optimized to the same assembly, so it doesn't matter.

    What if value is a matrix?

    I will interpret this in a more general way, i.e. what if value is of a type whose construction and assignment are expensive (and moves are cheap)?

    Then

    T value = init1;
    if (condition)
       value = init2;
    

    is sub-optimal, because when condition is true you pay for the unnecessary initialization to init1 and then for the copy assignment on top of it.

    T value;
    if (condition)
       value = init2;
    else
       value = init1;
    

    This is better, but still sub-optimal if default construction is expensive or if copy assignment is more expensive than initialization.

    There is the conditional operator solution, which is good:

    T value = condition ? init2 : init1;
    

    Or, if you don't like the conditional operator, you can create a helper function like this:

    T create(bool condition)
    {
      if (condition)
         return {init2};
      else
         return {init1};
    }
    
    T value = create(condition);
    

    Depending on what init1 and init2 are you can also consider this:

    auto final_init = condition ? init2 : init1;
    T value = final_init;
    

    But again I must emphasize that this is relevant only when construction and assignment are really expensive for the given type. And even then, only profiling will tell you for sure.
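    One way to make the extra work visible is to instrument a type with operation counters (a sketch; Tracked is a hypothetical type made up for this illustration):

    ```cpp
    #include <cassert>

    // Hypothetical type that counts constructions and assignments.
    struct Tracked {
        static int ctors, assigns;
        Tracked() { ++ctors; }
        Tracked(const Tracked&) { ++ctors; }
        Tracked& operator=(const Tracked&) { ++assigns; return *this; }
    };
    int Tracked::ctors = 0;
    int Tracked::assigns = 0;

    int main()
    {
        bool condition = true;
        Tracked init1, init2;  // the two source objects

        Tracked::ctors = Tracked::assigns = 0;
        {   // construct-then-assign: one copy construction plus one assignment
            Tracked value = init1;
            if (condition)
                value = init2;
        }
        assert(Tracked::ctors == 1 && Tracked::assigns == 1);

        Tracked::ctors = Tracked::assigns = 0;
        {   // conditional operator: a single copy construction, no assignment
            Tracked value = condition ? init2 : init1;
        }
        assert(Tracked::ctors == 1 && Tracked::assigns == 0);
        return 0;
    }
    ```

    When condition is true, the first pattern performs one construction it immediately throws away; the conditional operator constructs the object exactly once.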


    In pseudo-assembly language,

        li    #0, r0
        test  r1
        beq   L1
        li    #1, r0
    L1:
    

    may or may not be faster than

        test  r1
        beq   L1
        li    #1, r0
        bra   L2
    L1:
        li    #0, r0
    L2:
    

    depending on how sophisticated the actual CPU is. Going from simplest to fanciest:

  • With any CPU manufactured after roughly 1990, good performance depends on the code fitting within the instruction cache. When in doubt, therefore, minimize code size. This weighs in favor of the first example.

  • With a basic "in-order, five-stage pipeline" CPU, which is still roughly what you get in many microcontrollers, there is a pipeline bubble every time a branch—conditional or unconditional—is taken, so it is also important to minimize the number of branch instructions. This also weighs in favor of the first example.

  • Somewhat more sophisticated CPUs—fancy enough to do "out-of-order execution", but not fancy enough to use the best known implementations of that concept—may incur pipeline bubbles whenever they encounter write-after-write hazards. This weighs in favor of the second example, where r0 is written only once no matter what. These CPUs are usually fancy enough to process unconditional branches in the instruction fetcher, so you aren't just trading the write-after-write penalty for a branch penalty.

    I don't know if anyone is making this kind of CPU anymore. However, the CPUs that do use the "best known implementations" of out-of-order execution are likely to cut corners on the less frequently used instructions, so you need to be aware that this sort of thing can happen. A real example is false data dependencies on the destination registers in popcnt and lzcnt on Sandy Bridge CPUs.

  • At the highest end, the OOO engine will wind up issuing exactly the same sequence of internal operations for both code fragments—this is the hardware version of "don't worry about it, the compiler will generate the same machine code either way." However, code size still does matter, and now you also should be worrying about the predictability of the conditional branch. Branch prediction failures potentially cause a complete pipeline flush, which is catastrophic for performance; see Why is it faster to process a sorted array than an unsorted array? to understand how much difference this can make.

    If the branch is highly unpredictable, and your CPU has conditional-set or conditional-move instructions, this is the time to use them:

        li    #0, r0
        test  r1
        setne r0
    

    or

        li    #0, r0
        li    #1, r2
        test  r1
        movne r2, r0
    

    The conditional-set version is also more compact than any other alternative; if that instruction is available it is practically guaranteed to be the Right Thing for this scenario, even if the branch was predictable. The conditional-move version requires an additional scratch register, and always wastes one li instruction's worth of dispatch and execute resources; if the branch was in fact predictable, the branchy version may well be faster.
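    Back in C++, you normally don't write conditional moves by hand; instead you give the compiler forms it can easily lower to setcc/cmov. A sketch (function names are made up for illustration; the exact instructions emitted depend on the compiler and target):

    ```cpp
    #include <cassert>

    // The conditional operator on simple values typically lowers to
    // setcc + arithmetic or cmov, with no conditional branch.
    int select_ternary(bool condition)
    {
        return condition ? 6 : 5;
    }

    // A table lookup removes conditional control flow entirely:
    // the bool (0 or 1) indexes the array.
    int select_lookup(bool condition)
    {
        const int options[2] = {5, 6};
        return options[condition];
    }

    int main()
    {
        assert(select_ternary(true)  == 6 && select_ternary(false) == 5);
        assert(select_lookup(true)   == 6 && select_lookup(false)  == 5);
        return 0;
    }
    ```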
