精确地近似`rootn（x，n）`

2018-06-04 15:09:50

rootn (float_t x, int_t n)是计算第n个根x1 / n的函数，并且受一些编程语言（如OpenCL）支持。当使用IEEE-754浮点数时，假设只需要处理归一化的操作数x ，则可以基于对基础位模式的简单操作来生成针对任意n有效低精度起始逼近。

root (x, n)的二进制指数将是x的二进制指数的1 / n。 IEEE-754浮点数的指数字段有偏差。我们可以简单地将偏置指数除以n ，然后应用偏移量来补偿先前忽略的偏差，而不是对指数进行偏置，然后对其进行重新偏置。此外，我们可以简单地分割整个操作数x ，而不是提取，然后分割指数字段，重新解释为一个整数。所需的偏移量是微不足道的，因为1的参数将返回任意n的结果1。

如果我们有两个辅助函数可供我们使用， __int_as_float()将IEEE-754 binary32重新解释为int32 ，而__float_as_int()将int32操作数重新解释为binary32 ，我们得到以下低精度近似值rootn (x, n)以直截了当的方式：

rootn (x, n) ~= __int_as_float((int)(__float_as_int(1.0f)*(1.0-1.0/n)) + __float_as_int(x)/n)

整数除法__float_as_int (x) / n可以通过常数除法的整数除法的众所周知的优化来减少到移位或乘法加移位。一些工作的例子是：

rootn (x,  2) ~= __int_as_float (0x1fc00000 + __float_as_int (x) / 2)  // sqrt (x)
rootn (x,  3) ~= __int_as_float (0x2a555556 + __float_as_int (x) / 3)  // cbrt (x)
rootn (x, -1) ~= __int_as_float (0x7f000000 - __float_as_int (x) / 1)  // rcp (x)
rootn (x, -2) ~= __int_as_float (0x5f400000 - __float_as_int (x) / 2)  // rsqrt (x)
rootn (x, -3) ~= __int_as_float (0x54aaaaaa - __float_as_int (x) / 3)  // rcbrt (x)

使用所有这些近似值，仅当整数m x = 2n * m时，结果才是准确的。否则，与真实的数学结果相比，近似值会提供高估。我们可以通过略微减小偏移量来近似减半最大相对误差，从而导致低估和高估的均衡混合。这很容易通过使用区间[1,2n）中的所有浮点数作为测试用例的最佳偏移的二进制搜索来实现。这样做，我们发现：

rootn (x, 2) ~= __int_as_float (0x1fbb4f2e + __float_as_int(x)/2) // max rel err = 3.47474e-2
rootn (x, 3) ~= __int_as_float (0x2a51067f + __float_as_int(x)/3) // max rel err = 3.15547e-2
rootn (x,-1) ~= __int_as_float (0x7ef311c2 - __float_as_int(x)/1) // max rel err = 5.05103e-2
rootn (x,-2) ~= __int_as_float (0x5f37642f - __float_as_int(x)/2) // max rel err = 3.42128e-2
rootn (x,-3) ~= __int_as_float (0x54a232a3 - __float_as_int(x)/3) // max rel err = 3.42405e-2

有些人可能会注意到rootn (x,-2)的计算基本上是Quake快速平方根的初始部分。

基于观察原始原始偏移量与为最小化最大相对误差而优化的最终偏移量之间的差异，我可以制定用于二次校正的启发式，并由此形成最终的优化的偏移值。

然而，我想知道是否有可能通过一些闭式公式来确定最佳偏移量，使得相对误差的最大绝对值max（|（approx（x，n） - x1 / n）/ x1 /对于[1,2n）中的所有x ，其最小化。为了便于说明，我们可以限制为binary32 （IEEE-754单精度）数字。

我知道，一般来说，对于极小极大近似不存在封闭形式的解决方案，但我的印象是，对于像第n根那样的代数函数的多项式逼近的情况，确实存在闭式解。在这种情况下，我们有（分段）线性近似。

下面是一些Octave（MATLAB）代码，假设下面的猜测计算正数n的偏移量。似乎为2和3工作，但我怀疑当n太大时，其中一个假设会失效。现在没时间调查了。

% finds the best offset for positive n
% assuming that the conjectures below are true
function oint = offset(n)
% find the worst upward error with no offset
efrac = (1 / log(2) - 1) * n;
evec = [floor(efrac) ceil(efrac)];
rootnvec = 2 .^ (evec / n);
[abserr i] = max((1 + evec / n) ./ rootnvec);
relerr = abserr - 1;
rootnx = rootnvec(i);
% conjecture: z such that approx(z, n) = 1
% should have the worst downward error
fun = @(o) relerr - o / rootnx + (1 / (1 + o * n) ^ (1 / n) - 1);
oreal = fzero(fun, [0 1]);
oint = round((127 * (1 - 1 / n) - oreal) * 2 ^ 23);

只有正数n的部分答案 - 假设我们没有调整偏移量，我只是轻描淡写地猜测最糟糕的高估。

我们定义x in [1, 2^n)的近似值的理想版本。

rootn-A(x, n) = 1 + floor(lg(x))/n + ((x/2^floor(lg(x))) - 1) / n
                    ^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                    contribution of       contribution of the
                     the exponent             significand

我们想要最大化rootn-A(x, n) / x^(1/n) 。

从实验上看，当x是2的幂时，发生最大值。在这种情况下，有效数项是零和floor(lg(x)) = lg(x) ，所以我们可以最大化

(1 + lg(x)/n) / x^(1/n).

代替y = lg(x)/n ，我们可以y in [0, 1) (1 + y) / 2^y最大(1 + y) / 2^y ，使得n*y是一个整数。抛开完整性条件，证明这个凹函数的最大值在y = 1/log(2) - 1 ，大约是0.4426950408889634是微积分练习。由此得出， x的幂的最大值在x = 2^floor((1/log(2) - 1) * n)或x = 2^ceil((1/log(2) - 1) * n) 。我猜想其中之一实际上是全球最大值。

在低估结束时，我们希望x使得rootn(x, n)的输出为1 ，至少对于小的n 。希望稍后再来。

链接地址: http://www.djcxy.com/p/15005.html

上一篇: accuracy approximation to `rootn(x, n)`

下一篇: optimized polynomial minimax approximation to arctangent on [