Multiplication through 16-bit shifts

2018-06-25 22:56:01

I am writing a soft-multiplication function using shifting and addition. The existing function looks like this:

unsigned long __mulsi3 (unsigned long a, unsigned long b) {

    unsigned long answer = 0;

    while(b)
    {
        if(b & 1) {
            answer += a;
        }

        a <<= 1;
        b >>= 1;
    }
    return answer;
}

My hardware does not have a multiplier, but it does have a hardware shifter that can shift up to 16 bits at a time.

I want to make full use of this 16-bit shifter. Any suggestions on how I can adapt the code above to reflect my hardware's capabilities? The given code shifts only one bit per iteration.

The shifter can shift 32-bit unsigned long values by up to 16 places at a time; unsigned long is 32 bits wide on this target.
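A minimal sketch of one way to use the wider shifts, assuming uint32_t stands in for the 32-bit unsigned long and that the shifter accepts any count from 1 to 16: when the low bits of b are all zero, a and b can be shifted by 16, 8 or 4 places in a single step instead of one place per iteration. The name mul_skip_zeros is hypothetical.

```c
#include <stdint.h>

/* Hypothetical variant of __mulsi3: shift-and-add that uses multi-bit
   shifts (4, 8 or 16 places) to skip runs of zero bits in b.
   Assumes a 32-bit unsigned type and a shifter accepting counts up to 16. */
uint32_t mul_skip_zeros(uint32_t a, uint32_t b)
{
    uint32_t answer = 0;

    while (b) {
        if ((b & 0xFFFFu) == 0) {        /* low 16 bits of b are zero */
            a <<= 16;
            b >>= 16;
        } else if ((b & 0xFFu) == 0) {   /* low 8 bits of b are zero */
            a <<= 8;
            b >>= 8;
        } else if ((b & 0xFu) == 0) {    /* low 4 bits of b are zero */
            a <<= 4;
            b >>= 4;
        } else {                         /* ordinary 1-bit step */
            if (b & 1)
                answer += a;
            a <<= 1;
            b >>= 1;
        }
    }
    return answer;   /* product modulo 2^32, like the original */
}
```

This only saves iterations when b contains runs of zero bits; a multiplier with every bit set still takes 32 one-bit steps.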

The basic approach (assuming a shift by 1) is:

Shift the top 16 bits

Set the bottom bit of the top 16 bits to the top bit of the bottom 16 bits

Shift the bottom 16 bits
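The three steps above can be sketched as follows, assuming the 32-bit value is held as two 16-bit halves in a hypothetical u32_halves struct:

```c
#include <stdint.h>

/* Hypothetical representation: a 32-bit value held as two 16-bit halves. */
typedef struct {
    uint16_t hi;  /* top 16 bits */
    uint16_t lo;  /* bottom 16 bits */
} u32_halves;

/* Shift the 32-bit value left by one place using only 16-bit shifts. */
u32_halves shl1_halves(u32_halves v)
{
    u32_halves r;
    /* 1. Shift the top 16 bits. */
    r.hi = (uint16_t)(v.hi << 1);
    /* 2. Set the bottom bit of the top half to the top bit of the bottom half. */
    r.hi |= (uint16_t)(v.lo >> 15);
    /* 3. Shift the bottom 16 bits. */
    r.lo = (uint16_t)(v.lo << 1);
    return r;
}
```

For example, the halves {0x1234, 0x8001} (0x12348001) become {0x2469, 0x0002} (0x24690002).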

It depends a bit on your hardware, but you could try the following, assuming unsigned long is 32 bits and the machine is big-endian:

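union Data32
{
    unsigned long l;       /* assumed to be 32 bits */
    unsigned short s[2];   /* big-endian: s[0] = top 16 bits, s[1] = bottom 16 bits */
};

/* Shift a 32-bit value left by bitsToShift places (1..15) using 16-bit shifts only. */
unsigned long shiftleft32(unsigned long valueToShift, unsigned short bitsToShift)
{
    union Data32 u;
    u.l = valueToShift;
    u.s[0] <<= bitsToShift;                       /* shift the top half */
    u.s[0] |= (u.s[1] >> (16 - bitsToShift));     /* carry in the bits leaving the bottom half */
    u.s[1] <<= bitsToShift;                       /* shift the bottom half */

    return u.l;
}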

Then do the same in reverse for shifting right.
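The reverse (logical right shift) can be sketched the same way. This version holds the two 16-bit halves explicitly in a hypothetical u32_halves struct instead of a union, which sidesteps the byte-order assumption; it handles shift counts 1..15 only.

```c
#include <stdint.h>

/* Hypothetical representation: a 32-bit value held as two 16-bit halves. */
typedef struct {
    uint16_t hi;  /* top 16 bits */
    uint16_t lo;  /* bottom 16 bits */
} u32_halves;

/* Logical right shift by 1..15 places using only 16-bit shifts. */
u32_halves shr_halves(u32_halves v, unsigned bits)
{
    u32_halves r;
    /* Shift the bottom half, then carry in the low bits of the top half. */
    r.lo = (uint16_t)((v.lo >> bits) | (v.hi << (16 - bits)));
    /* Shift the top half; zeros enter from the left. */
    r.hi = (uint16_t)(v.hi >> bits);
    return r;
}
```

For example, shifting {0x1234, 0x5678} (0x12345678) right by 4 gives {0x0123, 0x4567}.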

Having 16-bit shifts can give you a modest speedup with the following approach:

(U1 * P + U0) * (V1 * P + V0) =
= U1 * V1 * P * P + U1 * V0 * P + U0 * V1 * P + U0 * V0 =
= U1 * V1 * (P*P+P) + (U1-U0) * (V0-V1) * P + U0 * V0 * (P+1)

provided P is a convenient power of 2 (for example, 2**16 or 2**32), so that multiplying by it is a fast shift. This replaces 4 multiplications of half-size numbers with 3 and, applied recursively, gives O(N**log2(3)) ≈ O(N**1.585) instead of O(N**2) for very long numbers.

This method is described in Knuth's TAOCP (Volume 2, Seminumerical Algorithms), along with more advanced variants.
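One level of this scheme for a 32 x 32 -> 64 product, as a sketch: P = 2^16, uint32_t/uint64_t stand in for the word types (64-bit support on the target is an assumption), and the three 16 x 16 products come from a hypothetical mul16() shift-and-add helper, since the hardware has no multiplier. Because (U1-U0)*(V0-V1) can be negative, the sketch multiplies magnitudes and tracks the sign separately.

```c
#include <stdint.h>

/* Hypothetical 16 x 16 -> 32 soft multiply (plain shift-and-add). */
static uint32_t mul16(uint16_t a, uint16_t b)
{
    uint32_t x = a, answer = 0;
    while (b) {
        if (b & 1)
            answer += x;
        x <<= 1;
        b >>= 1;
    }
    return answer;
}

/* One level of the 3-multiplication scheme, with P = 2^16:
   (U1*P + U0)(V1*P + V0)
     = U1*V1*(P*P + P) + (U1 - U0)*(V0 - V1)*P + U0*V0*(P + 1) */
uint64_t mul32_karatsuba(uint32_t u, uint32_t v)
{
    uint16_t u1 = (uint16_t)(u >> 16), u0 = (uint16_t)u;
    uint16_t v1 = (uint16_t)(v >> 16), v0 = (uint16_t)v;

    uint64_t hh = mul16(u1, v1);   /* U1*V1 */
    uint64_t ll = mul16(u0, v0);   /* U0*V0 */

    /* Middle term (U1 - U0)*(V0 - V1): multiply magnitudes, track the sign. */
    int neg = 0;
    uint16_t du, dv;
    if (u1 >= u0) {
        du = (uint16_t)(u1 - u0);
    } else {
        du = (uint16_t)(u0 - u1);
        neg = !neg;
    }
    if (v0 >= v1) {
        dv = (uint16_t)(v0 - v1);
    } else {
        dv = (uint16_t)(v1 - v0);
        neg = !neg;
    }
    uint64_t mid = mul16(du, dv);

    /* U1*V1*(P*P + P) + U0*V0*(P + 1); every "* P" is a shift by 16. */
    uint64_t result = (hh << 32) + (hh << 16) + (ll << 16) + ll;

    /* The intermediate sum may wrap around 2^64, but unsigned arithmetic is
       modulo 2^64 and the final product fits, so the result is exact. */
    if (neg)
        result -= mid << 16;
    else
        result += mid << 16;
    return result;
}
```

Each mul16() call still loops over 16 bits, so the saving at this size is one 16 x 16 product per 32 x 32 multiply; the asymptotic gain only shows up when the scheme is applied recursively to much longer numbers.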

For small numbers (e.g. 8 by 8 bits), the following method is fast if you have enough fast ROM:

a * b = square(a+b)/4 - square(a-b)/4

If you tabulate int(square(x)/4), you'll need 1022 bytes of table for unsigned multiplication and 510 bytes for signed multiplication.
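A minimal sketch of this quarter-squares method for 8 x 8 -> 16 unsigned multiplication, assuming uint8_t/uint16_t operands. On the target the table would be precomputed in ROM; here a hypothetical qsq_init() fills it at start-up without a multiplier, using (x+1)^2 = x^2 + 2x + 1. The names qsq, qsq_init and mul8 are illustrative.

```c
#include <stdint.h>

/* qsq[x] = floor(x*x / 4) for x = 0..510: 511 entries * 2 bytes = 1022 bytes. */
static uint16_t qsq[511];

/* Fill the table using only additions and shifts. */
void qsq_init(void)
{
    uint32_t sq = 0;                      /* x*x, built incrementally */
    for (uint32_t x = 0; x < 511; x++) {
        qsq[x] = (uint16_t)(sq >> 2);     /* floor(x*x / 4) */
        sq += (x << 1) + 1;               /* (x+1)^2 = x^2 + 2x + 1 */
    }
}

/* a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4) */
uint16_t mul8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;                           /* 0..510 */
    unsigned diff = a >= b ? (unsigned)a - b : (unsigned)b - a;  /* 0..255 */
    return (uint16_t)(qsq[sum] - qsq[diff]);
}
```

Call qsq_init() once before using mul8(). Truncating division is safe here: (a+b)^2 and (a-b)^2 differ by exactly 4ab, so they leave the same remainder mod 4 and the discarded fractions cancel.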

The code in the question multiplies the traditional way, the way we learned in primary school:

Example (5 * 7 = 35):

    0101
  * 0111
  -------
    0101
   0101.
  0101..
 --------
  100011

Of course, you cannot approach it like that if you have neither a multiply operator nor a 1-bit shifter. You can, however, do it in other ways, for example with a loop:

unsigned long _mult(unsigned long a, unsigned long b)
{
    unsigned long res = 0;

    while (a > 0)
    {
        res += b;
        a--;
    }

    return res;
}

It is costly (a additions), but it does the job. You can look at other approaches if you have further constraints, such as computation time.
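If computation time matters, one small improvement to the repeated-addition loop is to iterate over the smaller operand, so the number of additions is min(a, b) rather than a. A sketch; mult_repeated is a hypothetical name:

```c
/* Hypothetical variant of the repeated-addition loop: count down the
   smaller operand so at most min(a, b) additions are performed. */
unsigned long mult_repeated(unsigned long a, unsigned long b)
{
    unsigned long res = 0;

    if (a > b) {                 /* make a the smaller operand */
        unsigned long t = a;
        a = b;
        b = t;
    }
    while (a > 0) {
        res += b;
        a--;
    }
    return res;
}
```

This is still far slower than shift-and-add for large operands; it only helps when one operand is known to be small.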

Link: http://www.djcxy.com/p/72614.html
