Square root in assembly, how to shift and change bits
I want to write a fast integer square root routine in assembly that takes an unsigned 32-bit value. I've been reading through this, and got an idea. Here's my pseudocode:
res <- 0
for i from 15 downto 0 do:
    change the ith bit of res to 1
    if res^2 > x then:
        change the ith bit of res back to 0
return res
I've done up to here so far:
sqrt:
    movl $0, %eax       # res = 0
    movl $15, %edx      # i = 15
    jmp .L8
.L9:
    # loop body goes here
.L8:
    cmpl $0, %edx       # keep looping while i >= 0
    jge .L9
I'm stuck at the for loop operations: changing the ith bit and shifting. I also don't want to use division or sqrt instructions. I know I should probably use shr, but I have no idea where to start or how to do it. How can I do the operations in the for loop? Where do I start?
(Intel syntax, convert to AT&T on your own)
    mov ebx,<number>    ; *number* to find sqrt of
    mov ecx,0x8000      ; bitmask (starting with bit b15 set)
                        ; 0x8000 = decimal 32768 = binary 1000 0000 0000 0000
    xor eax,eax         ; result <- 0
sqrt_loop:
    xor eax,ecx         ; set bit in eax (it is still 0 here, so xor sets it)
    push eax            ; store result (will be destroyed by mul)
    mul eax             ; edx:eax <- eax*eax (edx ignored: eax <= 0xFFFF, so the square fits in eax)
    cmp eax,ebx         ; compare with *number*
    pop eax             ; restore result (pop does not change the flags)
    jbe keep_bit        ; res^2 <= *number* -> bit stays set
    xor eax,ecx         ; unset bit in eax
keep_bit:
    shr ecx,1           ; next bit
    jnz sqrt_loop       ; loop till all bits are tried
(I haven't tried or debugged it, so there may be a bug. But together with your pseudocode, and once you rewrite it to AT&T and debug it, this should be enough to get you started.)
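To check an AT&T port against known-good values, a small C model of the same mask-driven loop may help. This is only a sketch, not the answer's code: the names `isqrt32_model` and `isqrt32_self_check` are made up for illustration, and it relies on the candidate never exceeding 0xFFFF, so its square always fits in 32 bits (the same reason the assembly can ignore edx after mul).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of the mask-driven loop above. */
uint32_t isqrt32_model(uint32_t x)
{
    uint32_t res = 0;

    /* mask plays the role of ecx: 0x8000, 0x4000, ..., 1, then 0 ends the loop */
    for (uint32_t mask = 0x8000u; mask != 0; mask >>= 1) {
        res ^= mask;        /* tentatively set the bit (it is known to be clear) */
        /* res <= 0xFFFF, so res * res <= 0xFFFE0001 and cannot overflow 32 bits */
        if (res * res > x)
            res ^= mask;    /* too big: clear the bit again */
    }
    return res;
}

/* A few spot checks; call this once from your own main(). */
void isqrt32_self_check(void)
{
    assert(isqrt32_model(0) == 0);
    assert(isqrt32_model(15) == 3);
    assert(isqrt32_model(16) == 4);
    assert(isqrt32_model(0xFFFFFFFFu) == 0xFFFFu);
}
```

Feeding the same inputs to the assembly routine and comparing results is a quick way to catch a mistranslated operand order or a wrong jump condition.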
As Margaret pointed out, a number is a number: it's the value. So 0x8000 is already encoded in the CPU's wires as b15 set to 1 and the other bits set to 0. All the conversion stuff happens only when you want to convert the value from or into a string; as long as you are calculating with values, the value sits in the register in all forms at the same time. It just depends on how you look at the register. Using hexadecimal, decimal, or binary in the source is just that: writing a STRING representation of the number, which the assembler turns into the value itself.
The binary representation is special, because the CPU can address particular bits (with and/xor/or, rotations, bit test/set, etc.): it holds those values in a sort of "wires", and binary is its native representation. It's like a human "cheating" when calculating "10*3456" by just writing an additional 0 at the end, because in decimal that's how 10* is special. For the CPU the same happens with bit manipulation and all kinds of power-of-2 math. The decimal tricks are not available to it, though; the CPU has to calculate those the proper way, multiplying by 10 for real.
Anyway, when you have only the bit number and you want the bitmask itself, e.g. how to get 0x8000 from 15:
mov ecx,15 ; i-th bit
mov eax,1 ; set b0 (lowest bit)
shl eax,cl ; shift all bits (all zeroed + b0 set) cl-many times left
; eax now contains 0x8000 = b15 set, other bits zeroed
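For comparison, the same "bit number to bitmask" step can be written in C. This is a rough sketch; the helper names `bit_mask`, `set_bit`, `clear_bit`, and `test_bit` are invented here and are not from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with only bit i set; mirrors "mov eax,1" followed by "shl eax,cl". */
uint32_t bit_mask(unsigned i)
{
    assert(i < 32);     /* in C, shifting a 32-bit value by 32 or more is undefined */
    return (uint32_t)1 << i;
}

/* Set, clear, and test bit i of v using the mask. */
uint32_t set_bit(uint32_t v, unsigned i)   { return v | bit_mask(i); }
uint32_t clear_bit(uint32_t v, unsigned i) { return v & ~bit_mask(i); }
int      test_bit(uint32_t v, unsigned i)  { return (v & bit_mask(i)) != 0; }
```

Here `bit_mask(15)` yields 0x8000, the same value the shl leaves in eax.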
So if you stick with your version of the algorithm, you will have to recalculate the bitmask from the for counter every time (or use some bit set/reset instructions, which I don't know off the top of my head, as I almost never need them).
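To make that cost visible, here is a sketch of the index-driven variant in C, following the question's pseudocode line by line. The function name `isqrt32_by_index` is hypothetical, and the asserts at the end are only a sanity check on the result, not part of the algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical index-driven variant: the mask is rebuilt from i on every pass. */
uint32_t isqrt32_by_index(uint32_t x)
{
    uint32_t res = 0;

    for (int i = 15; i >= 0; i--) {
        uint32_t mask = (uint32_t)1 << i;   /* extra shift per iteration */
        res |= mask;                        /* change the ith bit of res to 1 */
        if (res * res > x)                  /* res <= 0xFFFF, so no 32-bit overflow */
            res &= ~mask;                   /* change the ith bit of res back to 0 */
    }

    /* Sanity check: res is the largest 16-bit value whose square is <= x. */
    assert(res * res <= x);
    assert(res == 0xFFFFu || (res + 1) * (res + 1) > x);
    return res;
}
```

In assembly, the extra `1 << i` per pass corresponds to an additional mov/shl pair inside the loop, plus keeping the counter in cl.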
But if you study my code, you will see there's a direct shortcut: work on the bitmask itself, without counting the "i-th bit" part, which makes the code shorter and faster (although I probably killed that with the push/pop; using one more register like esi to hold the value would probably be better. Then again, this demonstrates how the stack can be used, and also that certain instructions do not affect flags, so you can use the cmp result later, as long as you are careful not to modify the required flag in between).