Optimize sum of powers with constant double exponent
I have implemented an algorithm which at some point needs to calculate the sum of the powers of the elements of a vector. The exponent is a positive double which is constant during the loop. I figured out that this calculation is currently the bottleneck of my program, and I wonder if there is a way to speed up the following code snippet:
#include <cmath>
#include <vector>

double SumOfPowers(std::vector<double>& aVector, double exponent)
{
    double help = 0;
    size_t sizeOfaVector = aVector.size();
    for (size_t k = 0; k < sizeOfaVector; k++)
    {
        help += std::pow(aVector[k], exponent);
    }
    return help;
}
I have a feeling that one could exploit the fact that the exponent is constant during the loop and reduce the number of expensive std::pow calls. Does anyone know a better way of implementing this, or whether there is a ready-to-use library function that does the job?
Firstly, check whether the loop is vectorized. For this, build your program with -O3 (here and in what follows I assume the GCC compiler; I do not know much about other compilers, but I expect they have similar options). Also add the -ftree-vectorizer-verbose=2 option to get detailed output on which loops were vectorized. You might want to play around with the options to get the output you want.
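As a concrete starting point, a build command along these lines should produce the vectorization report (the file name is a placeholder; note that recent GCC releases replaced -ftree-vectorizer-verbose with the -fopt-info-vec family of options):

```shell
# Build with full optimization and ask the vectorizer to report what it did.
g++ -O3 -ftree-vectorizer-verbose=2 sum_of_powers.cpp -o sum_of_powers

# On newer GCC versions, use -fopt-info-vec instead:
g++ -O3 -fopt-info-vec sum_of_powers.cpp -o sum_of_powers
```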
If the loop was not vectorized, you can often make it vectorizable. You might need to change the loop structure (for example, first compute all the powers into a separate array, and only then calculate the sum), or give the compiler some more information, for instance with a restrict declaration; see "Auto-vectorization with gcc 4.7" for more discussion. In the worst case, I think, you can implement the vectorization by hand; I remember that there were intrinsic functions for this, see Intel's reference or "A practical guide to SSE SIMD with C++".
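The restructuring idea above can be sketched as follows (function names are my own; this is a sketch, not a guaranteed win — whether the compiler actually vectorizes either loop must be checked with the reporting options):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: split the pow() calls from the reduction. The second loop is a
// plain sum with no cross-iteration dependency apart from the accumulator,
// which compilers vectorize much more readily.
double SumOfPowersSplit(const std::vector<double>& v, double exponent)
{
    std::vector<double> powers(v.size());
    for (size_t k = 0; k < v.size(); k++)
        powers[k] = std::pow(v[k], exponent);  // independent iterations

    double sum = 0.0;
    for (size_t k = 0; k < v.size(); k++)      // trivially vectorizable reduction
        sum += powers[k];
    return sum;
}
```

The extra array costs memory bandwidth, so this only pays off if the compiler actually vectorizes more of the work than before.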
For Visual Studio, start by adding the /Qvec-report:2 option to get verbose reporting. All the other suggestions above also apply; you will just need to find the corresponding MSVC options.
Another way to speed things up is to sacrifice precision by using the -ffast-math option. AFAIK, the standard pow function uses some advanced logic to check for cases when the base or exponent is really close to 1, to avoid precision problems there. If this is not your case, you might not need that logic. I suppose that -ffast-math drops it, although you might want to check.
Anyway, you can just replace pow by exp(log(...) * ...) to avoid these checks manually. This will not give you much speedup, but you might notice some gain. Also, in case you frequently raise the same vector to different exponents, you can pre-calculate the logs.
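A sketch of the pre-calculated-logs idea (function names are my own, and it assumes all elements are strictly positive, since log is undefined otherwise):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Pre-compute log(x) once per element; valid only for strictly positive inputs.
std::vector<double> PrecomputeLogs(const std::vector<double>& v)
{
    std::vector<double> logs(v.size());
    for (size_t k = 0; k < v.size(); k++)
        logs[k] = std::log(v[k]);
    return logs;
}

// Each subsequent exponent then costs one exp() per element instead of a
// full pow(), exploiting pow(x, e) == exp(e * log(x)) for x > 0.
double SumOfPowersFromLogs(const std::vector<double>& logs, double exponent)
{
    double sum = 0.0;
    for (size_t k = 0; k < logs.size(); k++)
        sum += std::exp(logs[k] * exponent);
    return sum;
}
```

This is mainly worthwhile when the same vector is raised to several different exponents, so the log pass is amortized.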
No, the constant exponent does not allow for a feasible optimization unless your values are frequently repeated (if so: memoize). Parallelizing is your best bet here (or avoiding pow altogether).
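The memoization suggestion can be sketched like this (function name is my own; it only pays off when the vector contains many repeated values, since a hash lookup is otherwise slower than pow itself):

```cpp
#include <cassert>
#include <cmath>
#include <unordered_map>
#include <vector>

// Sketch: cache pow() per distinct base, so repeated values are computed once.
double SumOfPowersMemo(const std::vector<double>& v, double exponent)
{
    std::unordered_map<double, double> cache;
    double sum = 0.0;
    for (double x : v) {
        auto it = cache.find(x);
        if (it == cache.end())
            it = cache.emplace(x, std::pow(x, exponent)).first;
        sum += it->second;
    }
    return sum;
}
```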