Floating Number Precision

2018-06-09 03:04:04

Because floating point arithmetic != real number arithmetic. One illustration of the difference caused by imprecision: for some floats a and b, (a+b)-b != a. This applies to any language that uses floats.
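To see (a+b)-b != a in action, here is a minimal sketch. The values of $a and $b are hypothetical, picked only because $a is much smaller than the gap between neighbouring doubles near $b.

```php
<?php
// Near 1e16, adjacent 64-bit floats are 2.0 apart, so adding 0.1 changes nothing.
$a = 0.1;
$b = 1e16;

$roundTrip = ($a + $b) - $b;   // $a is absorbed when it is added to $b

var_dump($roundTrip);          // float(0)
var_dump($roundTrip == $a);    // bool(false)
```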

Since floating point numbers are binary numbers with finite precision, there is only a finite set of representable numbers, which leads to accuracy problems and surprises like this. Here's another interesting read: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Back to your problem: there is no way to represent 34.99 or 0.01 exactly in binary (just as 1/3 = 0.3333... cannot be written exactly in decimal), so approximations are used instead. To get around the problem, you can:

Use round($result, 2) on the result to round it to 2 decimal places.

Use integers. If that's currency, say US dollars, then store $35.00 as 3500 and $34.99 as 3499, then divide the result by 100.
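Both workarounds might look roughly like this. It is only a sketch: the subtraction 35.00 - 34.99 is assumed from the amounts mentioned above, and the variable names are made up for illustration.

```php
<?php
$difference = 35.00 - 34.99;    // slightly less than 0.01 as a float

// Option 1: round the float result to 2 decimal places.
$rounded = round($difference, 2);

// Option 2: do the arithmetic in integer cents and convert only for display.
$cents = 3500 - 3499;                       // int(1)
$display = number_format($cents / 100, 2);  // "0.01"

var_dump($difference == 0.01);  // bool(false)
var_dump($rounded == 0.01);     // bool(true)
var_dump($display);             // string(4) "0.01"
```

The integer version never stores a fractional amount, so no rounding step is needed until the value is formatted.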

It's a pity that PHP doesn't have a built-in decimal datatype like some other languages do.

Floating point numbers, like all numbers, must be stored in memory as a string of 0's and 1's. It's all bits to the computer. How floating point differs from integer is in how we interpret the 0's and 1's when we want to look at them.

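In a 32-bit float, one bit is the "sign" (0 = positive, 1 = negative), 8 bits are the exponent, and 23 bits are the number known as the "mantissa" (fraction). The exponent is stored with a bias of 127: the power P is the stored 8-bit value minus 127, which gives a range of -126 to +127 for normal numbers (the all-zeros and all-ones patterns are reserved for zero, subnormals, infinity and NaN). So the binary representation (S1)(P8)(M23) has the value (-1)^S * M * 2^P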

The "mantissa" takes on a special form. In normal scientific notation we display the "one's place" along with the fraction. For instance:

4.39 x 10^2 = 439

In binary the "one's place" is a single bit. Since we ignore all the left-most 0's in scientific notation (we ignore any insignificant figures), the first bit is guaranteed to be a 1.

1.101 x 2^3 = 1101 = 13

Since we are guaranteed that the first bit will be a 1, we remove this bit when storing the number to save space. So the above number is stored as just 101 (for the mantissa). The leading 1 is assumed.

As an example, let's take the binary string

01000001110110000000000000000000

Breaking it into its components:

Sign    Power           Mantissa
 0     10000011   10110000000000000000000
 +     131 - 127         1.1011
 +        +4       1 + .5 + .125 + .0625
 +        +4             1.6875

The exponent bits are stored with a bias of 127, so the stored value 10000011 (131) stands for a power of 131 - 127 = +4.

Applying our simple formula:

(-1)^S * M * 2^P
(-1)^0 * (1.6875) * 2^(+4)
(1)(1.6875)(16)
27

In other words, 01000001110110000000000000000000 is 27 in floating point (according to the IEEE-754 single-precision format).
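If you want to check the decoding yourself, something like the following should work. It is a sketch that assumes PHP 7.1 or later (for the 'G' pack format), and float32Bits is a hypothetical helper name, not a built-in. It uses the fact that IEEE-754 single precision stores the exponent with a bias of 127.

```php
<?php
// Return the 32 bits of $value when stored as an IEEE-754 single-precision float.
function float32Bits(float $value): string
{
    // 'G' packs a big-endian 32-bit float; 'N' reads it back as an unsigned 32-bit int.
    $asInt = unpack('N', pack('G', $value))[1];
    return sprintf('%032b', $asInt);
}

$bits = float32Bits(27.0);
$sign = (int) $bits[0];
$storedExponent = bindec(substr($bits, 1, 8));
$fraction = bindec(substr($bits, 9, 23)) / (1 << 23);

$power = $storedExponent - 127;   // remove the bias
$mantissa = 1 + $fraction;        // restore the implied leading 1
$value = (-1) ** $sign * $mantissa * 2 ** $power;

echo $bits, "\n";      // 01000001110110000000000000000000
echo $power, "\n";     // 4
echo $mantissa, "\n";  // 1.6875
echo $value, "\n";     // 27
```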

For many numbers there is no exact binary representation, however. Much like how 1/3 = 0.333... repeats forever in decimal, 1/100 is 0.00000010100011110101110000... in binary, with "10100011110101110000" repeating. A 32-bit float can't store the entire number, so it stores the closest value it can.

0.0000001010001111010111000010100011110101110000

Sign    Power           Mantissa
 +        -7     1.01000111101011100001010
 0    -7 + 127    01000111101011100001010
 0     01111000   01000111101011100001010
00111100001000111101011100001010

(note that the power of -7 is stored with a bias of 127: -7 + 127 = 120 = 01111000)

It should be immediately clear that 00111100001000111101011100001010 looks nothing like 0.01.

More importantly, however, this contains a truncated version of a repeating binary fraction. The original number contained a repeating "10100011110101110000". We've cut the mantissa off after 23 bits: 01000111101011100001010

Translating this floating point number back into decimal via our formula we get 0.0099999998 (note that this is for a 32-bit float. A 64-bit float, which is what PHP uses, is much more accurate but still not exact)
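The round trip can be reproduced in PHP by forcing 0.01 through a 32-bit float. This is a rough sketch, again assuming PHP 7.1 or later for the 'G' pack format; the variable names are only illustrative.

```php
<?php
// Store 0.01 as a 32-bit float, then read the stored value back as a normal PHP float.
$packed = pack('G', 0.01);
$bits = sprintf('%032b', unpack('N', $packed)[1]);
$stored = unpack('G', $packed)[1];

echo $bits, "\n";                      // 00111100001000111101011100001010
echo sprintf('%.10f', $stored), "\n";  // 0.0099999998
var_dump($stored == 0.01);             // bool(false)
```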

There are plenty of answers here about why floating point numbers work the way they do...

But there's little talk of arbitrary precision (Pickle mentioned it). If you want (or need) exact precision, one way to get it in PHP (for numbers with a finite decimal representation, at least) is to use the BC Math extension (which is really just a BigNum, arbitrary precision implementation).

To add two numbers:

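$number = '12345678901234.1234567890';
$number2 = '1';
// The third argument is the scale: keep 10 digits after the decimal point
// (the default scale is 0, which would drop the fraction).
echo bcadd($number, $number2, 10);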

will result in 12345678901235.1234567890 ...

This is called arbitrary precision math. Basically all numbers are strings which are parsed for every operation and operations are performed on a digit by digit basis (think long division, but done by the library). So that means it's quite slow (in comparison to regular math constructs). But it's very powerful. You can multiply, add, subtract, divide, find modulo and exponentiate any number that has an exact string representation.

So you can't do 1/3 with 100% accuracy, since its decimal expansion repeats forever (it is a rational number, but it has no finite decimal representation).
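A short sketch of that limitation, assuming the bcmath extension is loaded. The scale of 20 is an arbitrary choice for illustration.

```php
<?php
// The third argument is the scale: the number of digits kept after the decimal point.
// bcmath cuts the result off at that many digits.
$third = bcdiv('1', '3', 20);
$backAgain = bcmul($third, '3', 20);

echo $third, "\n";      // 0.33333333333333333333
echo $backAgain, "\n";  // 0.99999999999999999999
```

Raising the scale adds more 3's, but multiplying back by 3 never quite reaches 1.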

But, if you want to know what 1500.0015 squared is:

Using 64-bit floats (double precision) gives the estimated result of:

2250004.5000023

But bcmath gives the exact answer of:

2250004.50000225
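The comparison can be run along these lines (a sketch, assuming the bcmath extension is available). How the float is displayed depends on PHP's "precision" ini setting, so no float output is shown here.

```php
<?php
// Float version: the operands are already approximations of 1500.0015.
$float = 1500.0015 * 1500.0015;

// bcmath version: string operands, with a scale of 8 because each operand
// has 4 decimal places, so the exact product has 8.
$exact = bcmul('1500.0015', '1500.0015', 8);

echo $float, "\n";  // displayed according to the "precision" ini setting
echo $exact, "\n";  // 2250004.50000225
```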

It all depends on the precision you need.

Also, something else to note here. PHP integers are either 32-bit or 64-bit (depending on your install). If an integer exceeds the range of the native int type (about 2.1 billion for 32-bit, about 9.2 x 10^18, or 9.2 billion billion, for 64-bit signed ints), PHP converts it to a float. That's not immediately a problem (all integers up to 2^53 are exactly representable as 64-bit floats), but if you multiply two such numbers together, the result will lose significant precision.
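A quick way to watch both effects, as a sketch that uses only built-in constants and functions:

```php
<?php
$max = PHP_INT_MAX;
$overflowed = $max + 1;           // silently becomes a float

var_dump(is_int($max));           // bool(true)
var_dump(is_float($overflowed));  // bool(true)

// Above 2**53, adjacent integers can no longer be told apart as 64-bit floats.
$limit = (float) (2 ** 53);
var_dump($limit + 1 == $limit);   // bool(true)
```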

For example, given $n = '40000000002':

As a number, $n will be float(40000000002) on a 32-bit install, which is fine since it's exactly represented. But if we square it, we get: float(1.60000000016E+21), which has lost the trailing digits.

As a string (using BC math), $n will be exactly '40000000002'. And if we square it, we get: string(22) "1600000000160000000004" ...
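Side by side, that comparison might look like this (a sketch, assuming the bcmath extension is loaded; the explicit float casts are there so the result is the same on 32-bit and 64-bit installs):

```php
<?php
$n = '40000000002';

$asFloat = (float) $n * (float) $n;  // float multiplication, trailing digits are lost
$asString = bcmul($n, $n);           // digit-by-digit multiplication on strings

var_dump($asFloat);   // float(1.60000000016E+21)
var_dump($asString);  // string(22) "1600000000160000000004"
```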

So if you need exact results with large numbers or with decimal fractions, you might want to look into bcmath...

Link: http://www.djcxy.com/p/27440.html
