Why can't GCC optimize out `std::sqrt`?
I have the simple program:
#include <cmath>
int main()
{
for (int i = 0; i < 50; ++i)
std::sqrt(i);
}
Clang 3.8 optimizes it out at -O3
, but gcc 6.1 doesn't. It produces the following assembly:
## Annotations in comments added after the question was answered,
## for the benefit of future readers.
main:
pushq %rbx
xorl %ebx, %ebx
jmp .L2
.L4:
pxor %xmm0, %xmm0 # break cvtsi2sd's false dep on the old value of xmm0
pxor %xmm1, %xmm1 # xmm1 = 0.0
cvtsi2sd %ebx, %xmm0 # xmm0 = (double)i
ucomisd %xmm0, %xmm1 # scalar double comparison, setting flags
ja .L7 # if (0.0 > (double)i) sqrt(i); // The `a` = above. AT&T syntax reverses the order, but it's jump if xmm1 above xmm0
.L2:
addl $1, %ebx # i++
cmpl $50, %ebx
jne .L4 # i != 50
xorl %eax, %eax
popq %rbx
ret # return 0
.L7:
call sqrt # only executed on i < 0. Otherwise gcc knows std::sqrt has no side effects.
jmp .L2
If I understand the as-if rule properly, the compiler is allowed to optimize out code that doesn't change the observable behavior of the program, which includes I/O writes, etc. I discard the result of std::sqrt
and don't do any I/O. Furthermore, I don't have #pragma STDC FENV_ACCESS
in my program. Does std::sqrt
have observable side effects or is there another reason why GCC doesn't optimize out the call?
(The initial version of this question had an upper bound of 10e50
, making it an infinite loop. The same thing happens 50
, so nvm the comments about that.)
This is somewhat related to loop unrolling.
int main()
{
for (int i = 0; i <= 16; ++i) // CHANGED NUMBER OF ITERATIONS
std::sqrt(i);
}
is replaced with a return 0;
( g++ -O3 -fdump-tree-all
).
If you take a look at .115t.cunroll
you can see that the code is initially transformed into something like:
// ...
<bb 6>:
i_30 = i_22 + 1;
_32 = (double) i_30;
if (_32 < 0.0)
goto <bb 7>;
else
goto <bb 8>;
<bb 7>:
__builtin_sqrt (_32);
<bb 8>:
i_38 = i_30 + 1;
_40 = (double) i_38;
if (_40 < 0.0)
goto <bb 9>;
else
goto <bb 10>;
<bb 9>:
__builtin_sqrt (_40);
// ...
and the compiler, with actual numbers, can "prove" that each call to sqrt
doesn't have side effects ( .125t.vrp2
):
// ...
<bb 6>:
i_30 = 3;
_32 = 3.0e+0;
if (_32 < 0.0)
goto <bb 7>;
else
goto <bb 8>;
<bb 7>:
__builtin_sqrt (_32);
<bb 8>:
i_38 = 4;
_40 = 4.0e+0;
if (_40 < 0.0)
goto <bb 9>;
else
goto <bb 10>;
<bb 9>:
__builtin_sqrt (_40);
// ...
If the number of iterations is large, gcc:
--param max-completely-peeled-insns=x
--param max-completely-peel-times=y
) sqrt(i)
doesn't have side effects (but a small help is enough, eg std::sqrt(std::abs(i))
). Also gcc (v6.x) doesn't support #pragma STDC FENV_ACCESS
so it must assume that this pragma is ON (otherwise the generated code can be incorrect) (the situation is more complex, see bug 34678 and Tavian Barnes' comment).
The reason is that the standard requires to set errno
in case sqrt
is passed a negative number. Apparently for the value 50 (too much for unrolling) g++ "forgets" that negative values are not possible and for small values instead unrolls and discovers in each single case by constant propagation that no need of setting errno
may arise.
Also taking the absolute value before calling sqrt
apparently ensures g++ that it's impossible to have a negative argument to pass.