Rounding error in floating-point arithmetic

IEEE 754 floating-point arithmetic defines several rounding modes:

  • Round to nearest: RN(x) is the floating-point number that is the closest to x.
  • Round down: RD(x) is the largest floating-point number less than or equal to x.
  • Round up: RU(x) is the smallest floating-point number greater than or equal to x.
  • Round toward zero: RZ(x) is the closest floating-point number to x that is no greater in magnitude than x.

    If a large absolute rounding error (close to the theoretical bound) is obtained when performing some computations with rounding up, does this mean that the error will be small if the same computations are performed with rounding down?
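
    The four rounding modes listed above can be illustrated with a minimal Python sketch. It does not change the hardware rounding mode; instead it emulates each mode for an exact rational input, assuming Python 3.9+ (for math.nextafter) and a value within the finite double range. The helper names round_down, round_up, round_toward_zero and round_nearest are hypothetical, chosen only for this example.

```python
import math
from fractions import Fraction


def round_down(x: Fraction) -> float:
    """RD(x): the largest float less than or equal to x."""
    f = float(x)  # Fraction -> float conversion rounds to nearest
    return math.nextafter(f, -math.inf) if Fraction(f) > x else f


def round_up(x: Fraction) -> float:
    """RU(x): the smallest float greater than or equal to x."""
    f = float(x)
    return math.nextafter(f, math.inf) if Fraction(f) < x else f


def round_toward_zero(x: Fraction) -> float:
    """RZ(x): RD for non-negative x, RU for negative x."""
    return round_down(x) if x >= 0 else round_up(x)


def round_nearest(x: Fraction) -> float:
    """RN(x): the float closest to x."""
    return float(x)


x = Fraction(1, 10)  # exactly 1/10, which has no finite binary expansion
rd, ru = round_down(x), round_up(x)
rz, rn = round_toward_zero(x), round_nearest(x)
print(rd, ru, rz, rn)
```

    Since 1/10 is not representable, rd and ru are adjacent doubles that bracket it, and rn coincides with one of them.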

    I would like to clarify my question:

    Suppose we need to approximate the value of x using interval arithmetic with floating-point bounds, i.e., compute floating-point numbers a and b such that a <= x <= b.

    Let, for instance, x = x1+x2+...+xn, where x1,x2,…,xn are finite positive floating-point numbers.

  • First, a is computed with rounding down: a=RD(x1+x2+...+xn).
  • Then, b is computed with rounding up: b=RU(x1+x2+...+xn).
  • Next, suppose we know that

    x - a <= EPS,

    and also that

    b - x <= EPS,

    where x is the exact sum.

    Which upper bound is valid for the length of the [a, b] interval: b - a <= EPS or b - a <= 2 EPS?


    Yes.

    Suppose the exact mathematical result x falls strictly between two adjacent finite representable values, a and b, with a < b. The least upper bound on the error is b−a. Let e be the error when rounding up (so e = b−x), and let it be nearly b−a. Then the error when rounding down is x−a = b−a−e, so it is small relative to b−a.
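
    The argument above can be checked numerically. The following is a minimal sketch, assuming Python 3.9+ and exact rational arithmetic via fractions.Fraction; the function name directed_errors and the sample value 1 + 2^-60 are hypothetical choices for illustration.

```python
import math
from fractions import Fraction


def directed_errors(x: Fraction) -> tuple[Fraction, Fraction, Fraction]:
    """Return (x - a, b - x, b - a) for the adjacent floats a <= x <= b."""
    f = float(x)  # nearest float to x
    if Fraction(f) == x:
        # x is representable, so both directed roundings are exact
        return Fraction(0), Fraction(0), Fraction(0)
    if Fraction(f) > x:
        a, b = math.nextafter(f, -math.inf), f
    else:
        a, b = f, math.nextafter(f, math.inf)
    return x - Fraction(a), Fraction(b) - x, Fraction(b) - Fraction(a)


# A value just above 1.0: rounding up moves far, rounding down barely moves.
x = 1 + Fraction(1, 2**60)
err_down, err_up, gap = directed_errors(x)
print(float(err_up / gap), float(err_down / gap))
```

    The two errors always add up to the gap b−a, so when one is close to the gap the other is necessarily close to zero.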

    If a and b are not both finite, then either:

  • b is +∞, and the error when rounding up is infinite while the error when rounding down is finite and hence comparatively small, or
  • a is −∞, and the error when rounding up is finite while the error when rounding down is infinite.

    In the last case, the error when rounding up cannot be large in the sense you define: it must be finite, and hence cannot be close to the theoretical bound, which in this case is ∞. So no results that meet your prerequisite lie in this interval.
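
    For the interval form of the question, note that b − a = (b − x) + (x − a), so the two assumed inequalities on their own only give b − a <= 2 EPS; whether a tighter bound holds depends on how EPS was derived, which the question does not specify. The sketch below checks the identity on a sum accumulated term by term. It is an emulation under stated assumptions (Python 3.9+, exact intermediate sums via fractions.Fraction, finite positive inputs), and the names directed and interval_sum as well as the sample data are hypothetical.

```python
import math
from fractions import Fraction


def directed(x: Fraction, up: bool) -> float:
    """Round the exact value x up (RU) or down (RD) to a float."""
    f = float(x)  # nearest float; adjust one step if it is on the wrong side
    if up and Fraction(f) < x:
        return math.nextafter(f, math.inf)
    if not up and Fraction(f) > x:
        return math.nextafter(f, -math.inf)
    return f


def interval_sum(xs: list[float]) -> tuple[float, float]:
    """Accumulate xs twice: once rounding down, once rounding up."""
    lo = hi = 0.0
    for v in xs:
        lo = directed(Fraction(lo) + Fraction(v), up=False)
        hi = directed(Fraction(hi) + Fraction(v), up=True)
    return lo, hi


xs = [0.1, 0.2, 0.3, 0.4]
exact = sum(Fraction(v) for v in xs)  # exact sum of the stored floats
a, b = interval_sum(xs)
low_err = exact - Fraction(a)    # x - a
high_err = Fraction(b) - exact   # b - x
width = Fraction(b) - Fraction(a)
print(a, b, float(width))
```

    Here width equals low_err + high_err exactly, so any common bound EPS on the two one-sided errors bounds the width by 2 EPS.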
