Fast square root optimization?

2018-06-04 15:23:20

If you check this very nice page:

http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

You'll see this program:

#define SQRT_MAGIC_F 0x5f3759df
float sqrt2(const float x)
{
  const float xhalf = 0.5f*x;

  union // get bits for floating value
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_F - (u.i >> 1);  // gives initial guess y0
  return x*u.x*(1.5f - xhalf*u.x*u.x); // Newton step, repeating increases accuracy
}

My question is: Is there any particular reason why this isn't implemented as:

#define SQRT_MAGIC_F 0x5f3759df
float sqrt2(const float x)
{
  union // get bits for floating value
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_F - (u.i >> 1);  // gives initial guess y0

  const float xux = x*u.x;

  return xux*(1.5f - .5f*xux*u.x); // Newton step, repeating increases accuracy
}

In the disassembly, I see one fewer MUL. Is there any purpose to having xhalf appear at all?

It could be that legacy floating-point math, which used 80-bit registers, was more accurate when the multiplications were chained together in the last line, because the intermediate results were kept in 80-bit registers.
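Whether intermediates are actually kept in wider registers depends on the compiler and target. As a rough sketch, the C99 macro FLT_EVAL_METHOD from float.h reports how float expressions are evaluated; the helper names eval_method_name and print_eval_method below are made up for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: turn a FLT_EVAL_METHOD value into a description. */
static const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double (x87 style)";
    default: return "indeterminable or implementation-defined";
    }
}

/* Hypothetical helper: print the evaluation method of this build. */
static void print_eval_method(void)
{
    const char *name = eval_method_name((int)FLT_EVAL_METHOD);
    assert(name != NULL);
    printf("FLT_EVAL_METHOD = %d: %s\n", (int)FLT_EVAL_METHOD, name);
}
```

Calling print_eval_method() from main shows which case applies. A value of 2 corresponds to the 80-bit situation described above; a value of 0 (typical for SSE code generation) means the chained multiplications gain no extra precision.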

The first multiplication in the upper implementation takes place in parallel with the integer math that follows, because the two use different execution resources. The second function, on the other hand, looks faster, but for that reason it's hard to tell whether it really is. Also, the const float xux = x*u.x; statement rounds the result back to a 32-bit float, which may reduce overall accuracy.
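To get a feel for how much the rounding of x*u.x to float matters, one could compare the proposed variant against a copy that keeps its intermediates in double. This is only a sketch: the names rsqrt_guess, sqrt_float_tmp, sqrt_wide_tmp and count_mismatches are invented here, and the bit cast uses memcpy with uint32_t instead of the union from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQRT_MAGIC_F 0x5f3759df

/* Initial guess for 1/sqrt(x); same bit trick as in the question. */
static float rsqrt_guess(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);
    i = SQRT_MAGIC_F - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y;
}

/* Proposed variant: the product x*y is rounded to a 32-bit float. */
static float sqrt_float_tmp(float x)
{
    const float y = rsqrt_guess(x);
    const float xy = x * y;
    return xy * (1.5f - 0.5f * xy * y);
}

/* Same formula, but intermediates are kept in double (assumption:
   this stands in for wider intermediate registers). */
static float sqrt_wide_tmp(float x)
{
    const double y = rsqrt_guess(x);
    const double xy = (double)x * y;
    return (float)(xy * (1.5 - 0.5 * xy * y));
}

/* Count how many of n evenly spaced inputs in [lo, hi] give
   different float results for the two variants. */
static unsigned count_mismatches(float lo, float hi, unsigned n)
{
    unsigned differing = 0;
    assert(lo > 0.0f && hi > lo && n > 1);
    for (unsigned k = 0; k < n; ++k) {
        const float x = lo + (hi - lo) * (float)k / (float)(n - 1);
        if (sqrt_float_tmp(x) != sqrt_wide_tmp(x))
            ++differing;
    }
    return differing;
}
```

Any mismatches found this way come from rounding alone, since both functions apply the same formula. They say nothing about which variant is closer to the true square root.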

You could test these functions head to head and compare them to the sqrt function in math.h (use double, not float, for the reference). This way you can see which is faster and which is more accurate.
