How many decimal places do the primitive float and double types support?

2018-06-09 03:15:24

Those counts are the total number of significant figures, counted from left to right regardless of where the decimal point sits. Digits beyond that count are not preserved accurately.

The counts you listed are for the base 10 representation.

There are macros for the number of decimal digits each type supports. The gcc docs list them and explain what they mean:

FLT_DIG

This is the number of decimal digits of precision for the float data type. Technically, if p and b are the precision and base (respectively) for the representation, then the decimal precision q is the maximum number of decimal digits such that any floating point number with q base 10 digits can be rounded to a floating point number with p base b digits and back again, without change to the q decimal digits.

The value of this macro is supposed to be at least 6, to satisfy ISO C.

DBL_DIG
LDBL_DIG

These are similar to FLT_DIG, but for the data types double and long double, respectively. The values of these macros are supposed to be at least 10.

On both gcc 4.9.2 and clang 3.5.0, FLT_DIG and DBL_DIG yield 6 and 15, respectively.

Are these numbers the number of decimal places supported, or the total number of digits in a number?

They are the significant digits contained in every number (you may not need all of them, but they are still there). The mantissa of a given type always contains the same number of bits, so every number consequently contains the same number of valid "digits" if you think in terms of decimal digits. You cannot store more digits than will fit into the mantissa.

The number of "supported" digits is, however, much larger. For example, float will usually handle values with up to roughly 38 decimal digits and double values with up to roughly 308 decimal digits, but most of these digits are not significant (that is, "unknown").

Technically, though, this is not quite right, because float and double do not have universally defined sizes as I presumed above (they are implementation-defined). Also, the storage size is not necessarily the same as the size used for intermediate results.

The C++ standard is very reluctant to define any fundamental type precisely, leaving almost everything to the implementation. Floating point types are no exception:

3.9.1 / 8
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

Now of course all of this is not particularly helpful in practice.

In practice, floating point is (usually) IEEE 754 compliant, with float having a width of 32 bits and double having a width of 64 bits (as stored in memory; registers have higher precision on some notable mainstream architectures).

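This corresponds to 24 bits and 53 bits of mantissa, respectively (counting the implicit leading bit), or about 7 and 15 to 16 decimal digits. Only 6 and 15 of those are guaranteed to survive a decimal round trip, which is why FLT_DIG and DBL_DIG report 6 and 15.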
