Fast inverse square root on the iPhone
The fast inverse square root function used by SGI/3dfx, and most notably in Quake, is often cited as being faster than the equivalent assembly instruction, but the posts claiming that seem quite dated. I was curious about its performance on more modern hardware, and particularly on mobile devices like the iPhone. I wouldn't be surprised if the Quake inverse square root is no longer a worthwhile optimization on desktop systems, but how about for an iPhone project involving a lot of 3D math? Would it be worthwhile to include?
No.
The NEON instruction set (like every other vector ISA*) has a hardware approximate reciprocal square root instruction that is much faster than that oft-cited "trick". Use it instead if reciprocal square root is actually a performance bottleneck in your code (as always, benchmark first; don't spend time optimizing something if you have no hard evidence that its performance matters).
You can get at it by writing your own assembly (inline or otherwise) with the vrsqrte.f32 instruction, or from C, Objective-C, or C++ by including the <arm_neon.h> header and using the vrsqrte_f32() intrinsic.
[*] On SSE it's rsqrtss / rsqrtps; on AltiVec it's frsqrte / vrsqrtefp.
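For comparison, the SSE instruction named in the footnote can be reached from C in much the same way. This is only a sketch: the helper name rsqrt_sse and the HAVE_SSE macro are invented for the example, the single Newton-Raphson step is an optional addition, and the scalar fallback exists just so the code builds on non-x86 targets.

```c
#include <math.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: approximate 1/sqrt(x) for a single float. */
float rsqrt_sse(float x)
{
#if HAVE_SSE
    /* _mm_rsqrt_ss maps to rsqrtss and returns a low-precision estimate. */
    float e = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));

    /* One Newton-Raphson step: e = e * (1.5 - 0.5 * x * e * e). */
    return e * (1.5f - 0.5f * x * e * e);
#else
    /* Portable fallback when SSE is not available at compile time. */
    return 1.0f / sqrtf(x);
#endif
}
```

For example, rsqrt_sse(4.0f) should return a value close to 0.5. The packed form, _mm_rsqrt_ps, handles four floats at once and is usually the better fit for vector-heavy 3D code.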