Square root in assembly, how to shift and change bits

2018-07-01 01:33:37

I want to write a fast integer square root algorithm in assembly that takes an unsigned 32-bit integer. I've been reading through this, and got an idea. Here's my pseudocode:

res <- 0
for i from 15 downto 0 do:
   set bit i of res to 1
   if res^2 > x then:
      clear bit i of res (set it back to 0)
return res
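Why 16 iterations (i from 15 down to 0) are enough: a short bound, assuming x is an unsigned 32-bit value as stated above. The same bound shows that squaring any candidate res never overflows 32 bits.

```latex
x \le 2^{32} - 1 \;\Longrightarrow\; \sqrt{x} < 2^{16} \;\Longrightarrow\; \lfloor \sqrt{x} \rfloor \le 2^{16} - 1 = 65535
\qquad
\mathrm{res} \le 2^{16} - 1 \;\Longrightarrow\; \mathrm{res}^2 \le (2^{16} - 1)^2 = 2^{32} - 2^{17} + 1 = \mathtt{0xFFFE0001} < 2^{32}
```

The result therefore fits in bits 15..0, and res^2 fits in a single 32-bit register.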

Here's what I have so far:

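sqrt:
  movl $0, %eax        # res <- 0
  movl $15, %edx       # i <- 15
  jmp .L8              # test the loop condition first
.L9:
  # loop body still to write: set bit i, test res^2 > x, clear bit i
  decl %edx            # i <- i - 1
.L8:
  cmpl $0, %edx        # compare i with 0 (32-bit register, so cmpl)
  jge .L9              # loop while i >= 0 (signed compare)
  ret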

I'm stuck on the operations inside the for loop: changing the ith bit and shifting. I also don't want to use division or sqrt instructions. I know I should probably use shr, but I have no idea where to start or how to do it. How can I do the operations in the for loop? Where do I start?

(Intel syntax, convert to AT&T on your own)

    mov   ebx,<number> ; *number* to find sqrt of
    mov   ecx,0x8000   ; bitmask (starting with b15 bit set)
    ;^^^ 0x8000 = decimal 32768 = binary 1000 0000 0000 0000
    xor   eax,eax      ; result <- 0
sqrt_loop:
    xor   eax,ecx      ; set bit in eax
    push  eax          ; store result (will be destroyed by mul)
    mul   eax          ; edx:eax <- eax*eax (res <= 0xFFFF, so the square fits in eax and edx is 0)
    cmp   eax,ebx      ; compare with *number*
    pop   eax          ; restore result
    jbe   keep_bit     ; res^2 <= *number* -> bit stays set
    xor   eax,ecx      ; unset bit in eax
keep_bit:
    shr   ecx,1        ; next bit
    jnz   sqrt_loop    ; loop till all bits are tried

(I haven't tested or debugged it, so there may be a bug. But together with your pseudocode, and once you rewrite it in AT&T syntax and debug it, this should be enough to get you started.)

As Margaret pointed out, a number is just a value. So 0x8000 is already encoded in the CPU's "wires" as bit 15 set to 1 and all other bits set to 0. Conversion only happens when you convert the value from or into a string. As long as you are calculating with values, the register holds the value in all forms at the same time; it only depends on how you look at the register. Writing hex, decimal, or binary in the source just means writing a STRING representation of the number, which the assembler turns into the value itself.
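The same point in C terms. C stands in for the assembler here, and the constant names are hypothetical. Three literals in different bases compile to one identical value, with only bit 15 set. Standard C before C23 has no binary literal, so octal is used as the third form.

```c
#include <stdint.h>

/* The same 32-bit value written three ways in the source.
   The compiler, like the assembler, turns each literal into the
   identical bit pattern: only bit 15 set. */
static const uint32_t MASK_HEX = 0x8000u;   /* hexadecimal */
static const uint32_t MASK_DEC = 32768u;    /* decimal */
static const uint32_t MASK_OCT = 0100000u;  /* octal: 8^5 = 32768 */

/* Returns 1 if bit `bit` (0..31) of v is set, else 0. */
static int bit_is_set(uint32_t v, unsigned bit) {
    return (int)((v >> bit) & 1u);
}
```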

The binary representation is special because the CPU can address individual bits (with and/xor/or, rotations, bit test/set, etc.). It holds values as a sort of "wires", one per bit, and binary is its native representation. It's like a human "cheating" when calculating 10*3456 by just appending a 0 to get the result, because in decimal that is what makes multiplying by 10 special. For the CPU, the same happens with bit manipulation and all kinds of power-of-2 math. The decimal tricks are not available to it, though: the CPU has to calculate those properly, really multiplying by 10.
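A small C sketch of that analogy. The function names are hypothetical. Multiplying by a power of two is a single left shift. Multiplying by 10 is not, because 10 is not a power of two. The split 10 = 8 + 2 used below is one illustrative way to build it from shifts and an add, and it still takes more than one operation.

```c
#include <stdint.h>

/* Multiplying by 2^k is a left shift by k: the binary analogue of
   appending k zeros in decimal. k must be < 32. */
static uint32_t times_pow2(uint32_t x, unsigned k) {
    return x << k;              /* x * 2^k (mod 2^32) */
}

/* Multiplying by 10 has no single-shift shortcut in binary.
   Here it is split into two shifts and an add, since 10 = 8 + 2. */
static uint32_t times10(uint32_t x) {
    return (x << 3) + (x << 1); /* x*8 + x*2 (mod 2^32) */
}
```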

Anyway, when you have only the bit number and want the bitmask itself (for example, how to get 0x8000 from 15):

mov   ecx,15  ; i-th bit
mov   eax,1   ; set b0 (lowest bit)
shl   eax,cl  ; shift all bits (all zeroed + b0 set) cl-many times left
; eax now contains 0x8000 = b15 set, other bits zeroed
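The same mask-building in C, plus the set and clear operations the question's pseudocode needs. This is a sketch, and the helper names `bit_mask`, `set_bit` and `clear_bit` are hypothetical. Setting a bit is an OR with the mask. Clearing a bit is an AND with the inverted mask, which on x86 would be an `and` with the result of `not`.

```c
#include <stdint.h>

/* C equivalent of `mov eax,1` / `shl eax,cl`: build the mask for bit i.
   i must be in 0..31. */
static uint32_t bit_mask(unsigned i) {
    return (uint32_t)1 << i;
}

/* Set bit i of v (like `or reg, mask`). */
static uint32_t set_bit(uint32_t v, unsigned i) {
    return v | bit_mask(i);
}

/* Clear bit i of v (like `and reg, ~mask`). */
static uint32_t clear_bit(uint32_t v, unsigned i) {
    return v & ~bit_mask(i);
}
```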

So if you stick with your version of the algorithm, you would have to convert the loop counter into a bitmask on every iteration (or use the x86 bit set/reset instructions, bts/btr, whose details I don't know by heart because I have almost never needed them).
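A C sketch of the question's counter-based version. It makes the per-iteration counter-to-mask conversion visible. The name `isqrt_counter` is hypothetical, and C only models what the assembly would do. The sketch relies on res never exceeding 0xFFFF, so `res * res` fits in 32 bits.

```c
#include <stdint.h>

/* Bit-by-bit integer square root following the question's pseudocode:
   for i from 15 down to 0, tentatively set bit i of res and clear it
   again if res^2 would exceed x. The mask (1 << i) is recomputed from
   the counter on every iteration. */
static uint32_t isqrt_counter(uint32_t x) {
    uint32_t res = 0;
    for (int i = 15; i >= 0; --i) {
        res |= (uint32_t)1 << i;          /* set bit i */
        if (res * res > x)                /* res <= 0xFFFF, so no overflow */
            res &= ~((uint32_t)1 << i);   /* clear bit i again */
    }
    return res;                           /* floor(sqrt(x)) */
}
```

For example, `isqrt_counter(50)` yields 7, and `isqrt_counter(0xFFFFFFFFu)` yields 65535.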

But if you study my code, you'll see there is a direct shortcut that works with the bitmask itself, without counting the "i-th bit", which makes the code shorter and faster. I probably lost some of that speed with the push/pop; using one more register, such as esi, to hold the value would likely be better. Then again, the push/pop demonstrates how the stack can be used. It also shows that some instructions don't affect the flags, so you can use a cmp result later, as long as you are careful not to modify the required flag in between.
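For comparison, here is a C sketch of the mask-driven loop from the Intel-syntax answer. C is used only as a readable, testable model, and the name `isqrt_mask` is hypothetical. The loop is controlled by the mask, which starts at 0x8000 and is shifted right until it becomes 0. XOR toggles the bit, as in the assembly. The local `sq` plays the role of the spare register suggested instead of push/pop.

```c
#include <stdint.h>

/* Mask-driven integer square root, modelled on the assembly loop:
   no counter-to-mask conversion is needed. */
static uint32_t isqrt_mask(uint32_t number) {
    uint32_t res = 0;                   /* eax */
    for (uint32_t mask = 0x8000u;       /* ecx: bit 15 set */
         mask != 0;                     /* jnz after the shift */
         mask >>= 1) {                  /* shr ecx,1 */
        res ^= mask;                    /* set the bit (it was 0) */
        uint32_t sq = res * res;        /* kept in a variable, no push/pop */
        if (sq > number)                /* unsigned compare, like cmp/jbe */
            res ^= mask;                /* unset the bit again */
    }
    return res;                         /* floor(sqrt(number)) */
}
```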

Link: http://www.djcxy.com/p/86642.html