Should I align a character array before accessing it as a 32-bit integer array?

I need to generate incompressible data into arbitrarily sized character arrays really fast. Good random number generator algorithms such as the Mersenne Twister are therefore ruled out because they are too slow. I have also ruled out the C standard library random number generator functions: they are not inline functions, so the call overhead is too high, and besides, they are not thread-safe. I have selected the Numerical Recipes linear congruential generator (a = 1664525, c = 1013904223, see http://en.wikipedia.org/wiki/Linear_congruential_generator) as the random number generator.
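
For reference, a minimal sketch of what such an inline generator might look like. The question only shows the call randNr(&ctxlocal), so the rand_ctx struct and its state field are assumptions made for illustration:

```c
#include <stdint.h>

/* Hypothetical context type: a single 32-bit state word. */
typedef struct { uint32_t state; } rand_ctx;

/* One LCG step with the Numerical Recipes constants.
   The reduction modulo 2^32 comes for free from uint32_t wraparound. */
static inline uint32_t randNr(rand_ctx *ctx)
{
    ctx->state = ctx->state * 1664525u + 1013904223u;
    return ctx->state;
}
```

Because each thread can own its own rand_ctx, a generator of this shape has no shared state to protect.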

Now, the RNG generates 32-bit random numbers, yet the array is an 8-bit character array. I could use bit shifts and masks to convert one 32-bit random number into four 8-bit random numbers, but that's way too slow. Thus, I really need to access the 8-bit character array as a 32-bit integer array.

I have the following loop (or actually, I have an unrolled version of it followed by a non-unrolled version of it followed by a final loop to generate 8-bit random numbers in case sz wasn't divisible by 4):

while (off+4 <= sz)
{
    uint32_t x = randNr(&ctxlocal); // An inline function
    *(uint32_t*)(ar+off) = x;
    off += 4;
}

which accesses the 8-bit character array as a 32-bit integer array. Now, I'm concerned that the access may be unaligned. This may have two effects: (1) on non-x86/AMD64 processors, the unaligned access may fail; (2) on x86/AMD64 processors, the unaligned access may be too slow. However, I tested the program with unaligned arrays on an x86 processor, and it was no slower than with aligned arrays, so potential effect (2) doesn't seem to apply. Effect (1), however, still applies on RISC architectures. I don't currently have access to any RISC machine to test how it would fail there.
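
One possible way to sidestep concern (1) without a separate alignment loop is to store each word with memcpy instead of through a cast pointer. This is only a sketch: rand_ctx and fill_random are hypothetical names, and the tail handling (copying the leftover bytes of one extra word) is a simplification of the final 8-bit loop described above. Optimizing compilers for x86 typically turn a fixed-size 4-byte memcpy into a single store, but that is worth checking in the generated code rather than assuming.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t state; } rand_ctx;   /* hypothetical context type */

static inline uint32_t randNr(rand_ctx *ctx)
{
    ctx->state = ctx->state * 1664525u + 1013904223u;
    return ctx->state;
}

/* Fill ar[0..sz) with generator output; ar may have any alignment. */
static void fill_random(unsigned char *ar, size_t sz, rand_ctx *ctx)
{
    size_t off = 0;
    while (off + 4 <= sz) {
        uint32_t x = randNr(ctx);
        memcpy(ar + off, &x, sizeof x);   /* no alignment requirement on ar+off */
        off += 4;
    }
    if (off < sz) {                       /* 1..3 leftover bytes */
        uint32_t x = randNr(ctx);
        memcpy(ar + off, &x, sz - off);
    }
}
```

The memcpy form also avoids writing to a char array through a uint32_t pointer, which the C aliasing rules do not permit for objects declared as char arrays.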

Should I add a loop that first generates a few 8-bit integers so that the 32-bit access is always aligned? I'm concerned that the loop would reduce performance while offering no benefit on x86/AMD64 processors. We're not planning to ever run the software on non-x86/AMD64 processors.
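
The alignment prologue being asked about could look roughly like the sketch below. The names rand_ctx and fill_random_aligned are hypothetical, and taking the top byte of each random word for the single-byte stores is an arbitrary choice made for illustration. The prologue runs at most three iterations, so its cost does not grow with the buffer size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t state; } rand_ctx;   /* hypothetical context type */

static inline uint32_t randNr(rand_ctx *ctx)
{
    ctx->state = ctx->state * 1664525u + 1013904223u;
    return ctx->state;
}

static void fill_random_aligned(unsigned char *ar, size_t sz, rand_ctx *ctx)
{
    size_t off = 0;

    /* Prologue: single bytes until ar+off sits on a 4-byte boundary. */
    while (off < sz && ((uintptr_t)(ar + off) & 3u) != 0)
        ar[off++] = (unsigned char)(randNr(ctx) >> 24);

    /* Main loop: every destination here is 4-byte aligned. */
    while (off + 4 <= sz) {
        uint32_t x = randNr(ctx);
        memcpy(ar + off, &x, sizeof x);
        off += 4;
    }

    /* Tail: remaining 0..3 bytes. */
    while (off < sz)
        ar[off++] = (unsigned char)(randNr(ctx) >> 24);
}
```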

Furthermore, the actual current use case of the function is calling it for buffers returned by malloc(), which should be aligned anyway. But somebody, someday, could in theory abuse the function on RISC processors by calling it for arrays that are unaligned. Depending on the processor architecture, the results of such abuse may be disastrous.

It's also ok to answer if there are ways to quickly generate incompressible data that are better than the current approach of using the Numerical Recipes random number generator and accessing the 8-bit char array as a 32-bit int array. Note that the program should run on 32-bit architectures quickly too, so proposing a 64-bit random number generator doesn't count as better.


Have you heard of the PCG family of generators? The algorithm is quite simple and also very fast, with good entropy. There is also a video of the talk describing the generator.

It is also considerably better than a simple LCG.
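
To give an idea of how small it is, here is a sketch modeled on the minimal 32-bit-output PCG variant (an LCG step on 64-bit state followed by an xorshift and a rotation). The type and function names are made up for this example and are not the library's API. Note that it keeps 64-bit state, which may weigh against it given the requirement to run quickly on 32-bit architectures.

```c
#include <stdint.h>

/* Hypothetical names; 64-bit state plus an odd stream increment. */
typedef struct { uint64_t state; uint64_t inc; } pcg32_sketch;

static inline uint32_t pcg32_sketch_next(pcg32_sketch *rng)
{
    uint64_t old = rng->state;
    /* Advance the underlying 64-bit LCG. */
    rng->state = old * 6364136223846793005ULL + (rng->inc | 1u);
    /* Output permutation: xorshift high bits, then a data-dependent rotate. */
    uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
    uint32_t rot = (uint32_t)(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((0u - rot) & 31u));
}

static inline void pcg32_sketch_seed(pcg32_sketch *rng,
                                     uint64_t initstate, uint64_t initseq)
{
    rng->state = 0u;
    rng->inc = (initseq << 1u) | 1u;   /* the increment must be odd */
    pcg32_sketch_next(rng);
    rng->state += initstate;
    pcg32_sketch_next(rng);
}
```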

Answer

I recently stumbled upon this article, as I had the same concerns about how alignment interacts with the cache. It suggests that in specific situations, access to unaligned data can have a big impact on performance.
