Range of representable values of 32

In the C++ standard it says of floating literals:

If the scaled value is not in the range of representable values for its type, the program is ill-formed.

The scaled value is the significant part multiplied by 10 ^ exponent part.

Under x86-64:

  • float is a single-precision IEEE-754
  • double is a double-precision IEEE-754
  • long double is an 80-bit extended precision IEEE-754

In this context, what is the range of representable values for each of these three types? Where is this documented, or how is it calculated?


    The answer (if you're on a machine with IEEE floating point) is in float.h: FLT_MAX, DBL_MAX and LDBL_MAX. On a system with full IEEE support, these are roughly 3.4e+38, 1.8e+308 and 1.2e+4932. (The exact values may vary, and may be expressed differently, depending on how the compiler does its input and rounding. g++, for example, defines them to be compiler built-ins.)

    EDIT:

    WRT your question (since neither I nor the other responders actually answered it): the range of representable values is [-type_MAX, type_MAX], where type is one of FLT, DBL, or LDBL.


    If you know the number of exponent bits and mantissa bits, then based on the IEEE-754 format, one can establish that the maximum absolute representable value is:

    2^(2^(E-1)-1) * (1 + (2^M-1)/2^M)
    

    The minimum absolute value (not including zero or denormals) is:

    2^(2-2^(E-1))
    
  • For single-precision, E is 8, M is 23.
  • For double-precision, E is 11, M is 52.
  • For extended-precision, I'm not sure. If you're referring to the 80-bit precision of the x87 FPU, then so far as I can tell, it's not IEEE-754 compliant...

  • I was looking for the largest number representable in 64 bits and ended up making my own 500-digit floating-point calculator. This is what I came up with if all 64 bits are turned on:

    18,446,744,073,709,551,615

    18 quintillion 446 quadrillion 744 trillion 73 billion 709 million 551 thousand 615
