Floating point division vs floating point multiplication
Is there any (non-microoptimization) performance gain by coding
float f1 = 200.0f / 2;
in comparison to
float f2 = 200.0f * 0.5f;
A professor of mine told me a few years ago that floating point divisions are slower than floating point multiplications, without explaining why.
Does this statement still hold for modern PC architectures?
Update 1
In response to a comment, please also consider this case:
float f1;
float f2 = 2;
float f3 = 3;
for (int i = 0; i < 100000000; i++)
{
    f1 = (i * f2 + i / f3) * 0.5f; // or divide by 2.0f, respectively
}
Update 2
Quoting from the comments:
[I want] to know what are the algorithmic / architectural requirements that cause division to be vastly more complicated in hardware than multiplication
Yes, many CPUs can perform multiplication in 1 or 2 clock cycles but division always takes longer (although FP division is sometimes faster than integer division).
If you look at this answer you will see that division can exceed 24 cycles.
Why does division take so much longer than multiplication? If you remember back to grade school, you may recall that multiplication can essentially be performed with many simultaneous additions. Division requires iterative subtraction that cannot be performed simultaneously so it takes longer. In fact, some FP units speed up division by performing a reciprocal approximation and multiplying by that. It isn't quite as accurate but is somewhat faster.
Division is inherently a much slower operation than multiplication.
And this may in fact be something that the compiler cannot (and you may not want to) optimize in many cases due to floating point inaccuracies. These two statements:
double d1 = 7 / 10.;
double d2 = 7 * 0.1;
are not semantically identical: 0.1 cannot be exactly represented as a double, so a slightly different value ends up being used. Substituting the multiplication for the division in this case would yield a different result!
Yes. Every FPU I am aware of performs multiplications much faster than divisions.
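However, modern PCs are very fast. They also have pipelined architectures that can make the difference negligible under many circumstances. To top it off, any decent compiler will perform the division operation you showed at compile time with optimizations turned on. For your updated example, a compiler can safely turn the division by 2.0f into a multiplication by 0.5f on its own, because 0.5 is exactly representable. The division by f3 is usually left as a division unless you enable relaxed floating-point options, since 1/3 is not exactly representable.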
So generally you should worry about making your code readable, and let the compiler worry about making it fast. Only if you have a measured speed issue with that line should you worry about perverting your code for the sake of speed. Compilers are well aware of what is faster than what on their CPUs, and are generally much better optimizers than you can ever hope to be.