bit multiplication through 16
I am writing a soft-multiplication function call using shifting and addition. The existing function call goes like this:
unsigned long __mulsi3 (unsigned long a, unsigned long b) {
unsigned long answer = 0;
while(b)
{
if(b & 1) {
answer += a;
};
a <<= 1;
b >>= 1;
}
return answer;
}
Although my hardware does not have a multiplier, I have a hard shifter. The shifter is able to shift up to 16 bits at one time.
If I want to make full use of my 16-bit shifter. Any suggestions on how can I adapt the code above to reflect my hardware's capabilities? The given code shifts only 1-bit per iteration.
The 16-bit shifter can shift 32-bit unsigned long values up to 16 places at a time. The sizeof(unsigned long) == 32 bits
The basic approach is (assuming shifting by 1) :-
Depends a bit on your hardware...
but you could try :-
then :-
union Data32
{
unsigned long l;
unsigned short s[2];
};
unsigned long shiftleft32(unsigned long valueToShift, unsigned short bitsToShift)
{
union Data32 u;
u.l = valueToShift
u.s[0] <<= bitsToShift;
u.s[0] |= (u.s[1] >> (16 - bitsToShift);
u.s[1] <<= bitsToShift
return u.l;
}
then do the same in reverse for shifting right
Having 16-bit shifts can help you in making minor speed enhancement using the following approach:
(U1 * P + U0) * (V1 * P + V0) = = U1 * V1 * P * P + U1 * V0 * P + U0 * V1 * P + U0 * V0 = = U1 * V1 * (P*P+P) + (U1-U0) * (V0-V1) * P + U0 * V0 * (1-P)
provided P is a convenient power of 2 (for example, 2**16, 2**32), so multiplying to it is a fast shift. This reduces from 4 to 3 multiplications of smaller numbers, and, recursively, O(N**(3/2)) instead of O(N**2) for very long numbers.
This method is described at least in Knuth's TAoCP. There are more advanced versions described there.
For small numbers (eg 8 by 8 bits), the following method is fast, if you have enough fast ROM:
a * b = square(a+b)/4 - square(a-b)/4
if to tabulate int(square(x)/4)
, you'll need 1022 bytes for unsigned multiplication and 510 bytes for signed one.
the code above is multiplying on the traditional way, the way we learnt in primary school :
EX:
0101
* 0111
-------
0101
0101.
0101..
--------
100011
of course you can not approach it like that if you don't have either a multiplier operator or 1-bit shifter! though, you can do it in other ways, for example a loop :
unsigned long _mult(unsigned long a, unsigned long b)
{
unsigned long res =0;
while (a > 0)
{
res += b;
a--;
}
return res;
}
It is costy but it serves your needings, anyways you can think about other approaches if you have more constraints (like computation time ...)
链接地址: http://www.djcxy.com/p/72614.html上一篇: SIMD用无符号乘法签名为64
下一篇: 位乘以16