Low-accuracy approximation to `rootn(x, n)`
`rootn (float_t x, int_t n)` is a function that computes the n-th root x^(1/n) and is supported by some programming languages such as OpenCL. When IEEE-754 floating-point numbers are used, efficient low-accuracy starting approximations for any n can be generated based on simple manipulation of the underlying bit pattern, assuming only normalized operands x need to be processed.
The binary exponent of `rootn (x, n)` will be 1/n of the binary exponent of x. The exponent field of an IEEE-754 floating-point number is biased. Instead of un-biasing the exponent, dividing it, and re-biasing the result, we can simply divide the biased exponent by n, then apply an offset to compensate for the previously neglected bias. Furthermore, instead of extracting, then dividing, the exponent field, we can simply divide the entire operand x, re-interpreted as an integer. The required offset is trivial to find, since an argument of 1 must return a result of 1 for any n.
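Working through that last condition explicitly (a small derivation sketch; 0x3f800000 is the bit pattern of 1.0f):

// require that an argument of 1.0f produce a result of 1.0f:
//   offset + __float_as_int(1.0f) / n == __float_as_int(1.0f)
// and therefore:
//   offset == __float_as_int(1.0f) - __float_as_int(1.0f) / n
//          == 0x3f800000 * (1 - 1/n)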
If we have two helper functions at our disposal, `__int_as_float()`, which reinterprets an int32 operand as an IEEE-754 binary32, and `__float_as_int()`, which reinterprets a binary32 operand as an int32, we arrive at the following low-accuracy approximation to `rootn (x, n)` in straightforward fashion:
rootn (x, n) ~= __int_as_float((int)(__float_as_int(1.0f)*(1.0-1.0/n)) + __float_as_int(x)/n)
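These helper functions exist as device intrinsics in CUDA; elsewhere, a minimal portable sketch based on memcpy-style type punning might look like this (the wrapper name rootn_approx is mine, purely for illustration):

#include <stdint.h>
#include <string.h>

static float __int_as_float (int32_t a)   // reinterpret int32 as binary32
{
    float r; memcpy (&r, &a, sizeof r); return r;
}

static int32_t __float_as_int (float a)   // reinterpret binary32 as int32
{
    int32_t r; memcpy (&r, &a, sizeof r); return r;
}

// low-accuracy approximation to the n-th root; n must be nonzero,
// and x must be positive and normalized
static float rootn_approx (float x, int n)
{
    int32_t offset = (int32_t)(__float_as_int (1.0f) * (1.0 - 1.0 / n));
    return __int_as_float (offset + __float_as_int (x) / n);
}

Note that this handles negative n as well: the integer division then yields a negative quotient, turning the addition into the subtraction seen in the worked examples below.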
The integer division `__float_as_int (x) / n` can be reduced to a shift, or to a multiplication plus shift, by well-known optimizations of integer division by a constant divisor. Some worked examples are:
rootn (x, 2) ~= __int_as_float (0x1fc00000 + __float_as_int (x) / 2) // sqrt (x)
rootn (x, 3) ~= __int_as_float (0x2a555556 + __float_as_int (x) / 3) // cbrt (x)
rootn (x, -1) ~= __int_as_float (0x7f000000 - __float_as_int (x) / 1) // rcp (x)
rootn (x, -2) ~= __int_as_float (0x5f400000 - __float_as_int (x) / 2) // rsqrt (x)
rootn (x, -3) ~= __int_as_float (0x54aaaaaa - __float_as_int (x) / 3) // rcbrt (x)
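As a concrete instance of the division-by-constant optimization mentioned above (a sketch; this relies on the bit pattern of a positive, normalized x being a non-negative int32):

int32_t i = __float_as_int (x);
int32_t q2 = i >> 1;  // == i / 2: a plain right shift suffices for non-negative i
int32_t q3 = (int32_t)(((int64_t)i * 0x55555556LL) >> 32);  // == i / 3 via multiply-high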
With all these approximations, the result will be exact only when x = 2^(n*m) for integer m. Otherwise, the approximation will overestimate the true mathematical result. We can approximately halve the maximum relative error by reducing the offset slightly, leading to a balanced mix of underestimation and overestimation. This is easily accomplished by a binary search for the optimal offset that uses all floating-point numbers in the interval [1, 2^n) as test cases. Doing so, we find:
rootn (x, 2) ~= __int_as_float (0x1fbb4f2e + __float_as_int(x)/2) // max rel err = 3.47474e-2
rootn (x, 3) ~= __int_as_float (0x2a51067f + __float_as_int(x)/3) // max rel err = 3.15547e-2
rootn (x,-1) ~= __int_as_float (0x7ef311c2 - __float_as_int(x)/1) // max rel err = 5.05103e-2
rootn (x,-2) ~= __int_as_float (0x5f37642f - __float_as_int(x)/2) // max rel err = 3.42128e-2
rootn (x,-3) ~= __int_as_float (0x54a232a3 - __float_as_int(x)/3) // max rel err = 3.42405e-2
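Such a search might be sketched as follows (shown for n = 2 only; it assumes the maximum relative error is a unimodal function of the offset, so we can bisect on whether the worst overestimate or the worst underestimate dominates; slow but simple):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

static float   int_as_float (int32_t a) { float r; memcpy (&r, &a, sizeof r); return r; }
static int32_t float_as_int (float a)   { int32_t r; memcpy (&r, &a, sizeof r); return r; }

/* worst signed relative errors over all binary32 x in [1, 4), for n = 2 */
static void worst_errors (int32_t offset, double *over, double *under)
{
    *over = -INFINITY; *under = INFINITY;
    for (int32_t i = float_as_int (1.0f); i < float_as_int (4.0f); i++) {
        float x = int_as_float (i);
        float approx = int_as_float (offset + float_as_int (x) / 2);
        double ref = sqrt ((double)x);
        double err = ((double)approx - ref) / ref;
        if (err > *over)  *over  = err;
        if (err < *under) *under = err;
    }
}

int main (void)
{
    int32_t lo = 0x1f800000, hi = 0x1fc00000;  /* brackets the raw offset */
    while (hi - lo > 1) {
        int32_t mid = lo + (hi - lo) / 2;
        double over, under;
        worst_errors (mid, &over, &under);
        if (over > -under) hi = mid;  /* overestimates dominate: lower offset */
        else               lo = mid;  /* underestimates dominate: raise offset */
    }
    double over, under;
    worst_errors (lo, &over, &under);
    double err_lo = fmax (over, -under);
    worst_errors (hi, &over, &under);
    double err_hi = fmax (over, -under);
    printf ("offset = 0x%08x  max rel err = %.5e\n",
            err_lo < err_hi ? lo : hi, fmin (err_lo, err_hi));
    return 0;
}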
Some may notice that the computation for `rootn (x, -2)` is basically the initial portion of Quake's fast inverse square root.
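For reference, the first step of that well-known routine looks like this (reusing the headers and type-punning style from the sketch above); its magic constant 0x5f3759df is close to, but not identical with, the error-optimized 0x5f37642f above. Note that i >> 1 equals i / 2 for the non-negative bit patterns involved:

float rsqrt_seed (float x)   /* Quake-style starting approximation to 1/sqrt(x) */
{
    int32_t i; float y;
    memcpy (&i, &x, sizeof i);
    i = 0x5f3759df - (i >> 1);   /* seed; Quake follows this with a Newton step */
    memcpy (&y, &i, sizeof y);
    return y;
}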
Based on observing the differences between the original raw offset and the final offset optimized to minimize the maximum relative error, I was able to formulate a heuristic for the secondary correction, and thus for the final, optimized offset value.
However, I am wondering whether it is possible to determine the optimal offset by some closed-form formula, such that the maximum absolute value of the relative error, max(|(approx(x, n) - x^(1/n)) / x^(1/n)|), is minimized for all x in [1, 2^n). For ease of exposition, we can restrict ourselves to binary32 (IEEE-754 single-precision) numbers.
I am aware that, in general, there is no closed-form solution for minimax approximations. However, I am under the impression that closed-form solutions do exist for the case of polynomial approximations to algebraic functions like the n-th root. In this case, we have a (piecewise) linear approximation.
Here's some Octave (MATLAB) code that computes offsets for positive n, assuming the conjectures below hold. It seems to work for 2 and 3, but I suspect that one of the assumptions breaks down when n is too large. No time to investigate right now.
% finds the best integer offset for positive n,
% assuming that the conjectures below are true
function oint = offset (n)
  % find the worst upward error with no offset; per the conjecture, it
  % occurs at x = 2^e with e = floor or ceil of (1/log(2) - 1) * n
  efrac = (1 / log (2) - 1) * n;
  evec = [floor(efrac) ceil(efrac)];
  rootnvec = 2 .^ (evec / n);        % true roots at the two candidates
  [abserr, i] = max ((1 + evec / n) ./ rootnvec);
  relerr = abserr - 1;               % worst upward relative error
  rootnx = rootnvec(i);
  % conjecture: z such that approx(z, n) = 1
  % should have the worst downward error;
  % choose the real-valued offset o that balances the two errors
  fun = @(o) relerr - o / rootnx + (1 / (1 + o * n) ^ (1 / n) - 1);
  oreal = fzero (fun, [0 1]);
  % convert from the value domain to an offset on the integer bit pattern
  oint = round ((127 * (1 - 1 / n) - oreal) * 2 ^ 23);
end
Partial answer for positive n only -- I'm just going to nibble a bit by conjecturing the worst overestimate, assuming that we don't adjust the offset downward.
Let's define an idealized version of the approximation for x in [1, 2^n).

rootn-A(x, n) = 1 + floor(lg(x))/n + ((x/2^floor(lg(x))) - 1) / n
                    ^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                    contribution of  contribution of the
                    the exponent     significand
We want to maximize rootn-A(x, n) / x^(1/n). (For instance, with n = 2 and x = 2, rootn-A = 1 + 1/2 = 1.5 while 2^(1/2) ≈ 1.4142, giving a ratio of about 1.0607.)
It seems experimentally that the maximum occurs when x is a power of two. In this case, the significand term is zero and floor(lg(x)) = lg(x), so we can maximize

(1 + lg(x)/n) / x^(1/n).
Substitute y = lg(x)/n, and we can maximize (1 + y) / 2^y for y in [0, 1) such that n*y is an integer. Dropping the integrality condition, it is a calculus exercise to show that the maximum of this concave function is at y = 1/log(2) - 1, about 0.4426950408889634: setting the derivative 2^(-y) * (1 - (1 + y)*log(2)) to zero gives 1 + y = 1/log(2), and the value attained there is 2/(e*log(2)), about 1.0615. It follows that the maximum for x a power of two is at either x = 2^floor((1/log(2) - 1) * n) or x = 2^ceil((1/log(2) - 1) * n). I conjecture that one of these is in fact the global maximum.
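A brute-force check of this conjecture is straightforward, if slow (a sketch, hard-coded to n = 3; it scans every binary32 in [1, 2^n) for the peak of the ratio and prints the conjectured candidate locations alongside):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

static float   int_as_float (int32_t a) { float r; memcpy (&r, &a, sizeof r); return r; }
static int32_t float_as_int (float a)   { int32_t r; memcpy (&r, &a, sizeof r); return r; }

int main (void)
{
    const int n = 3;
    double best_ratio = 0.0, best_x = 0.0;
    for (int32_t i = float_as_int (1.0f); i < float_as_int ((float)(1 << n)); i++) {
        double x = (double)int_as_float (i);
        int e = (int)floor (log2 (x));
        /* idealized approximation rootn-A(x, n) as defined above */
        double approx = 1.0 + (double)e / n + (x / exp2 ((double)e) - 1.0) / n;
        double ratio = approx / pow (x, 1.0 / n);
        if (ratio > best_ratio) { best_ratio = ratio; best_x = x; }
    }
    double yc = (1.0 / log (2.0) - 1.0) * n;
    printf ("max ratio %.9f at x = %.9g\n", best_ratio, best_x);
    printf ("conjectured candidates: x = 2^%d or x = 2^%d\n",
            (int)floor (yc), (int)ceil (yc));
    return 0;
}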
On the underestimation end, it appears that we want x such that the output of rootn(x, n) is 1, at least for small n. More to come later, hopefully.