Round y=x*x to nearest

Rounding the result of division to the nearest integer is pretty simple. But I'm trying to round a result of division so that the subsequent operation will give the best approximation. That's best explained by a simple function:

const int halfbits = std::numeric_limits<unsigned int>::digits / 2;
unsigned x = foo(); // Likely big.
unsigned x_div = x >> halfbits; // Rounded down
unsigned y = x_div * x_div; // Will fit due to division.

I can round x_div to nearest by adding 1<<(halfbits-1) . But since x² is not a linear function, y in general is not rounded correctly. Is there a simple and more accurate way to calculate (x*x) >> (halfbits*2) without using bigger types?

I think that adding 3<<(halfbits-3) to x_div improves the rounding, but can't prove that's the best solution. Also, can this be generalized for xⁿ ?

Edit : by popular demand I take the liberty of "translating" the question in pure arithmetic terms (none of these C bit-shifting thing...).
Note: All divisions thereafter are integer divisons, for example 13/3 would be 4.
Problem: We can't compute x^2 because x is big, so instead we'd like to compute (x^2)/(2^N).
To do so we compute
x_div = x / sqrt(2^N)
which we then square:
y = x_div * x_div

However, this result is typically short of the exact value of (x^2)/(2^N) and the OP suggests adding 0.5 * sqrt(2^N) or maybe 0.375 * sqrt(2^N) to better approximate the result...

As Oli Charlesworth 's answer suggests there's a much better way to get to the actual value, by thinking of x^2 as (x_hi + x_lo)^2.


Whereas with x_div truncation will cause an error magnitude of at most 1, with x_div*x_div the error could be up to 1<<(half_digits+2) .

To see why, consider that we can express this squaring as follows (using long multplication):

x * x = (x_lo + x_hi) * (x_lo + x_hi)
      = x_lo^2 + x_hi^2 + 2*x_lo*x_hi

where x_lo and x_hi are the lower and upper halves of x , respectively. With some nice ASCII art, we can consider how these all line up:

 MSB     :      :      :     LSB
  +------+------+      :      :
  |    x_hi^2   |      :      :
  +-----++-----++-----+:      :
  :     | 2*x_lo*x_hi |:      :
  :     +------++-----++------+
  :      :      |    x_lo^2   |
  :      :      +------+------+
  :<----------->:      :      :
    Our result

As we can see, the lower-order terms affect multiple bits in the final result.

However, each of these terms should fit into the original type without overflow. So with appropriate shifting and masking, we can get the desired result.

Of course, the compiler/hardware does this all for you if you use a bigger type, so you should just do that if you have the option.


Use TYPE int for " y " . I guess that would solve the purpose.

链接地址: http://www.djcxy.com/p/70474.html

上一篇: 我如何设置嵌套ajax asp.net选项卡中的子选项卡高度?

下一篇: 绕y = x * x到最近