Why can't GCC optimize out `std::sqrt`?

I have the simple program:

#include <cmath>

int main()
{
    for (int i = 0; i < 50; ++i)
        std::sqrt(i);
}

Clang 3.8 optimizes it out at -O3 , but gcc 6.1 doesn't. It produces the following assembly:

## Annotations in comments added after the question was answered,
## for the benefit of future readers.
main:
    pushq   %rbx
    xorl    %ebx, %ebx
    jmp     .L2
.L4:
    pxor    %xmm0, %xmm0         # break cvtsi2sd's false dep on the old value of xmm0
    pxor    %xmm1, %xmm1         # xmm1 = 0.0
    cvtsi2sd        %ebx, %xmm0  # xmm0 = (double)i
    ucomisd %xmm0, %xmm1         # scalar double comparison, setting flags
    ja      .L7                  # if (0.0 > (double)i) sqrt(i);   // The `a` = above.  AT&T syntax reverses the order, but it's jump if  xmm1 above xmm0
.L2:
    addl    $1, %ebx             # i++
    cmpl    $50, %ebx
    jne     .L4                  # i != 50
    xorl    %eax, %eax
    popq    %rbx
    ret                          # return 0
.L7:
    call    sqrt                 # only executed on i < 0.  Otherwise gcc knows std::sqrt has no side effects.
    jmp     .L2

If I understand the as-if rule properly, the compiler is allowed to optimize out code that doesn't change the observable behavior of the program, which includes I/O writes, etc. I discard the result of std::sqrt and don't do any I/O. Furthermore, I don't have #pragma STDC FENV_ACCESS in my program. Does std::sqrt have observable side effects or is there another reason why GCC doesn't optimize out the call?


(The initial version of this question had an upper bound of 10e50 , making it an infinite loop. The same thing happens 50 , so nvm the comments about that.)


This is somewhat related to loop unrolling.

int main()
{
  for (int i = 0; i <= 16; ++i)  // CHANGED NUMBER OF ITERATIONS
    std::sqrt(i);
}

is replaced with a return 0; ( g++ -O3 -fdump-tree-all ).

If you take a look at .115t.cunroll you can see that the code is initially transformed into something like:

// ...

<bb 6>:
i_30 = i_22 + 1;
_32 = (double) i_30;
if (_32 < 0.0)
  goto <bb 7>;
else
  goto <bb 8>;

<bb 7>:
__builtin_sqrt (_32);

<bb 8>:
i_38 = i_30 + 1;
_40 = (double) i_38;
if (_40 < 0.0)
  goto <bb 9>;
else
  goto <bb 10>;

<bb 9>:
__builtin_sqrt (_40);

// ...

and the compiler, with actual numbers, can "prove" that each call to sqrt doesn't have side effects ( .125t.vrp2 ):

// ...

<bb 6>:
i_30 = 3;
_32 = 3.0e+0;
if (_32 < 0.0)
  goto <bb 7>;
else
  goto <bb 8>;

<bb 7>:
__builtin_sqrt (_32);

<bb 8>:
i_38 = 4;
_40 = 4.0e+0;
if (_40 < 0.0)
  goto <bb 9>;
else
  goto <bb 10>;

<bb 9>:
__builtin_sqrt (_40);

// ...

If the number of iterations is large, gcc:

  • doesn't perform loop unrolling (unless forced with something like --param max-completely-peeled-insns=x --param max-completely-peel-times=y )
  • isn't "smart enough" to determine that a call to sqrt(i) doesn't have side effects (but a small help is enough, eg std::sqrt(std::abs(i)) ).
  • Also gcc (v6.x) doesn't support #pragma STDC FENV_ACCESS so it must assume that this pragma is ON (otherwise the generated code can be incorrect) (the situation is more complex, see bug 34678 and Tavian Barnes' comment).


    The reason is that the standard requires to set errno in case sqrt is passed a negative number. Apparently for the value 50 (too much for unrolling) g++ "forgets" that negative values are not possible and for small values instead unrolls and discovers in each single case by constant propagation that no need of setting errno may arise.

    Also taking the absolute value before calling sqrt apparently ensures g++ that it's impossible to have a negative argument to pass.

    链接地址: http://www.djcxy.com/p/2464.html

    上一篇: 为什么英特尔编译器更喜欢NEG + ADD而不是SUB?

    下一篇: 为什么GCC不能优化`std :: sqrt`?