What is it called when a floating-point number is larger than its precision?

In single precision there is a significand of 23 bits, giving an integer range (if we were only storing a discrete integer value) of up to 2^24. The exponent is 8 bits, giving a range of up to 2^127. At large magnitudes there is a point where numbers start to lose significant digits from the significand/mantissa.

This means a number like (2^32 + 2^8):
4,294,967,552
0x100000100
0b100000000000000000000000100000000
would be stored simply as:
exponent 0b00100000 (unbiased value 32; the stored, biased field is 32 + 127 = 159)
significand / mantissa 0b00000000000000000000000 (1 implied bit)
and lose the 256 from its value.
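
To make this concrete, here is a minimal C sketch (assuming the platform's float is IEEE 754 binary32, as the question already presumes) that converts 2^32 + 2^8 to float and prints the resulting value together with its sign, exponent, and significand fields:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        uint64_t n = (1ULL << 32) + (1ULL << 8);   /* 4,294,967,552 */
        float f = (float)n;                        /* rounds to the nearest float */

        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);            /* reinterpret the encoding */

        printf("original : %llu\n", (unsigned long long)n);
        printf("as float : %.1f\n", f);            /* prints 4294967296.0, i.e. 2^32 */
        printf("sign=%u biased exponent=0x%02X significand=0x%06X\n",
               bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF);
        /* biased exponent is 0x9F (159 = 32 + 127), significand is 0 */
        return 0;
    }

The 256 disappears because 2^32 + 2^8 = 2^32 * (1 + 2^-24), and 2^-24 is exactly halfway between two representable significands, so round-to-nearest-even drops it.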

This seems to be the opposite of so-called 'subnormal' numbers. Essentially, the range of numbers that can be stored as an integer in the significand is much smaller than the range of numbers capable of being stored when the exponent is taken into account. So once you get to 2^24 you start to lose information (possibly I have a misunderstanding of the standard)! This seems to be the opposite of what happens in the subnormal range, where information is lost when there is a significand but the exponent is smaller than the minimum normal exponent of 2^-126.

Have I missed something in my understanding of the IEEE 754 standard?
If not, what is this scenario called, where large-magnitude numbers lose precision (which seems to be the opposite of subnormal, perhaps 'supernormal')?
And to maintain precision, should I limit all floating-point numbers to -(10^7) < x < 10^7?

EDIT: Updated the numbers from 100,000,010; I also added more language to explain my understanding.

EDIT 2: @Weather Vane is correct. The point of floating point is that it loses accuracy on a fractional scale as soon as we start increasing magnitude; this starts affecting the integer scale when the magnitude moves the radix point past the end of the significand:
0.0000000000000000000001 -> 10000000000000000000000.0
I can see why the exponent range is so much larger than the significand, for representing ultra-small numbers with the greatest precision possible, but for large-magnitude numbers there seems to be a whole class of numbers that lose information at a greater-than-fractional scale once we go beyond 24 significant bits (23 stored) in binary. I want to know what these are called, if they even have a name, e.g. 'supernormal'?
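
On the 'radix point moving past the end of the significand' point: the gap between consecutive floats (one ULP) doubles every time the magnitude crosses a power of two, reaching 1.0 at 2^23 and 2.0 at 2^24. A short C sketch using nextafterf (again assuming IEEE 754 binary32) makes the growth visible:

    #include <stdio.h>
    #include <math.h>

    /* Print the distance from x to the next representable float above it. */
    static void show_gap(float x)
    {
        printf("gap after %.1f is %g\n", x, nextafterf(x, INFINITY) - x);
    }

    int main(void)
    {
        show_gap(1.0f);          /* 1.19209e-07 (2^-23)   */
        show_gap(8388608.0f);    /* 1           (at 2^23) */
        show_gap(16777216.0f);   /* 2           (at 2^24) */
        show_gap(4294967296.0f); /* 512         (at 2^32) */
        return 0;
    }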


The name for what happens when not all the digits of a real number can be represented in a floating-point format is simply “rounding”.

The case of representing integers is somewhat special, because in a typical floating-point format all small integers can be represented exactly, and in particular no integer is ever too close to zero to be represented exactly.

However, since the question alludes to subnormal numbers, it is more generally correct to think of the dual of reaching into subnormal territory as being overflow. One way to look at this is that the effective precision is 24 bits over the entire normal range of single-precision IEEE 754 numbers; that precision tapers off when numbers get too close to zero (the subnormal range), and it drops to 0 bits all at once when overflowing (+inf and -inf).
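
A brief C sketch (assuming IEEE 754 binary32 for float) shows both ends of that picture: gradual underflow into the subnormal range below FLT_MIN, and an abrupt jump to infinity above FLT_MAX:

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        /* Below the smallest normal number, values are still representable
           but with fewer and fewer significant bits (gradual underflow). */
        float sub = FLT_MIN / 256.0f;          /* a subnormal value */
        printf("FLT_MIN     = %g\n", FLT_MIN);
        printf("subnormal   = %g (still nonzero, reduced precision)\n", sub);

        /* Above the largest finite number, all precision is lost at once. */
        float big = FLT_MAX * 2.0f;            /* overflows to +inf */
        printf("FLT_MAX     = %g\n", FLT_MAX);
        printf("FLT_MAX * 2 = %g (isinf = %d)\n", big, isinf(big) != 0);
        return 0;
    }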

Since the question is about representing integers (see comments), in single-precision IEEE 754 any integer x such that -2^24 <= x <= 2^24 is safe to represent (that is, [-16777216 … 16777216]). 16777217 is the smallest positive integer that cannot be represented exactly in single-precision (the nearest-even rule implies that it gets rounded down to 16777216.0).
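
That boundary is easy to check directly; a minimal C example (assuming binary32 floats):

    #include <stdio.h>

    int main(void)
    {
        float a = 16777216.0f;   /* 2^24: exactly representable */
        float b = 16777217.0f;   /* 2^24 + 1: rounds down to 2^24 under ties-to-even */

        printf("%.1f\n", a);            /* 16777216.0 */
        printf("%.1f\n", b);            /* 16777216.0 */
        printf("a == b: %d\n", a == b); /* prints 1 */
        return 0;
    }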

Floating-point solves the more general problem of representing some real numbers that aren't integers, and some real numbers that are larger than the maximum integer up to which all integers are representable (here 16777216), all with a nearly uniform relative accuracy (at least 1 / 2^precision).
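
As a rough illustration of that relative-accuracy bound (a sketch assuming binary32, where FLT_EPSILON is 2^-23 and the worst-case relative rounding error is half of that, i.e. 2^-24):

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        double x = 123456789.0;        /* not exactly representable in binary32 */
        float  f = (float)x;           /* rounds to 123456792.0 */

        double rel_err = (f - x) / x;  /* relative error of the conversion */
        printf("x              = %.1f\n", x);
        printf("(float)x       = %.1f\n", (double)f);
        printf("relative error = %g (bound FLT_EPSILON/2 = %g)\n",
               rel_err, FLT_EPSILON / 2.0);
        return 0;
    }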
