When is it more efficient to use CORDIC or a polynomial approximation?

2018-06-30 16:23:10

I am working on an architecture which does not feature floating point hardware, but only a 16 bit ALU and a 40 bit MAC.

I have already implemented 32-bit single precision floating point addition/subtraction, multiplication, cosine, sine, division, square root, and range reduction all in software on this architecture.

To implement cosine and sine I first used range reduction using the method described in the paper "ARGUMENT REDUCTION FOR HUGE ARGUMENTS" by KC NG I then implemented a cosine and sine function which are polynomial approximations to the cosine and sine functions on the range -pi/4 to +pi/4. I referred to the book "Computer Approximations", Hart, et al. for the polynomials.

I have also heard that I should consider the CORDIC algorithm. However, I was wondering if anyone knows if it would be more or less efficient (in terms of throughput, memory overhead, and number of instructions required) than the method I already used? I have implemented my software functions on a multicore architecture where each core features only 128 words of instruction memory and a 128 word 16-bit data memory. Also I have tried searching for how to implement the CORDIC algorithm for cosine and sine, but I couldn't find any good resources for 32-bit floating point implementations. Does anybody have suggestions?

Thank you!

CORDIC gives you one bit per loop iteration, so implementing it in software will likely be slower than your polynomial version. That may also be why it is hard to find articles on software implementations of CORDIC: its performance is inferior, so nobody bothers.

Re your comment: Horner's method is the practice of evaluating polynomials from highest-order coefficient to lowest, by repeatedly adding the coefficient, then multiplying by the variable x . In contrast, the naive method (ie, evaluating the powers of x first, then multiplying them by their coefficients and adding them together) takes more work and can be less numerically stable than Horner's method.

You haven't mentioned exactly how you're trying to evaluate your polynomials, so I will suggest a formula:

x2 = x * x
cos = ((COS_D * x2 + COS_C) * x2 + COS_B) * x2 + COS_A
sin = (((SIN_D * x2 + SIN_C) * x2 + SIN_B) * x2 + SIN_A) * x

Note that you can get better precision if you adapt your constants to the range over which you are evaluating the function, rather than using the Taylor coefficients. (Again, apologies if you have done some or all of these things, but you didn't mention what you had already tried...)

This is probably less relevant for your case (which presumably has just a 16x16-bit MAC), but if your processor can launch multiple arithmetic evaluations at once, you may be able to get better performance if you write your evaluation in a tree-like form, avoiding some of the sequential dependency of operations:

x2 = x * x
x4 = x2 * x2
cos = (COS_D * x2 + COS_C) * x4 + (COS_B * x2 + COS_A)
sin = ((SIN_D * x2 + SIN_C) * x4 + (SIN_B * x2 + SIN_A)) * x

If your processor has a vector ALU, this formula also suggests a productive use for it...

If the MAC is significantly faster than an equivalent sequence of shifts and ands and adds, use a polynomial; don't even consider CORDIC (except possibly for a single step or two of range reduction). It's hard to find FP CORDIC algorithms precisely because that criteria always holds for any system where FP is used (in the past ~35 years), so CORDIC isn't considered.

链接地址: http://www.djcxy.com/p/85594.html

上一篇: 从log1pf（）计算asinhf（）最准确的方法是什么？

下一篇: 什么时候使用CORDIC或多项式近似更有效？