New square root algorithm, anybody up for a challenge ?

Buzby · Jun 7, 2014

Hi All,

For a project I'm working on now I need to take the square root of a 31 bit number. I know I could do it eventually with a PICAXE, but this project has other complex maths requirements as well, so I've jumped ship !.

But before I jumped I investigated another square root method, and wondered if this might have legs in the PICAXE world.

It's based on the fact that the sum of the set of odd numbers is always a square. ( Something I'd never realised, it seems so simple !. )
It was based on the formula that the sum of the first n odd integers is n squared.
Num --> Sum
1 --> 1 = 1
3 --> 4 = 1 + 3
5 --> 9 = 1 + 3 + 5
7 --> 16 = 1 + 3 + 5 + 7
9 --> 25 = 1 + 3 + 5 + 7 + 9
11 --> 36 ..... etc., etc,
13 --> 49
15 --> 64
17 --> 81
19 --> 100
21 --> 121
23 --> 144
25 --> 169
27 --> 196
29 --> 225
31 --> 256
33 --> 289

etc ...

The basic idea is to subtract odd numbers until the result goes negative, then the last number subtracted is close to twice the square root ( or something like that ).

The page I found this on, http://www4.wittenberg.edu/academics/mathcomp/bjsdir/ENIACSquareRoot.htm, shows how ENIAC did it, and shows how to speed the algorithm by orders of magnitude.

Unfortunately the writer of the page has made it too mathsy for me to be bothered following it in detail, but it does look like the idea would translate into a binary world quite easily, with shifts of 2^2 instead of 10^2.

Just a thought, if anyone wants to spend a wet Saturday afternoon twiddling bits.

Cheers,

Buzby

premelec · Jun 7, 2014

We used to use a marble matrix board for kids to illustrate this - 1 marble for 1 squared, 4 for 2 squared and so forth where the kids could see that an odd number of marbles would be added for the next square... I used the method on Curta and Friden mechanical calculators... Then there came the HP35... At the accuracy you require my old brain is at a loss what to do with 8 bits... Jeremy knows... twiddle on...

Pongo · Jun 7, 2014

Fast Integer Square Root from Microchip which uses binary shifting as suggested by buzby. Note that the integer square root is the highest integer which is is not greater than the actual square root, so it may, or may not, be an exact value.

SAborn · Jun 7, 2014

You do realize the X2 chips have a square root function, but for lesser chips we oftern need to do it in code, and not just call a function like we can in X2 chips.

Westy wrote a program a while back to do a square root function.

Jamster · Jun 7, 2014

You mean a bit like this?

Code:

symbol in = w0
symbol result = w1
symbol temp = w2
w0 = 156
sqrrt:
 result = 1
 do while temp < in
  temp = temp + result
  result = result + 2
 loop
 result = result /2
 sertxd( #result )

Buzby · Jun 7, 2014

Yes, X2 has SQR, but only for a word, but I need it for a 24 bit number ( or 31 if I add a few more loops. )

Jamster, your code is simple, but the ENIAC method got more interesting when it talked about speeding the algorithm up dramatically, which is what would be needed for 24 or 31 bits.

I did use the Newton-Raphson method to get SQR of a word on an 08M, but this project is much more complex. The SQR of 24bits is only one function I need, so I've gone for a different platform altogether. ( Compiled BASIC on a PIC ).

westaust55 · Jun 8, 2014

SAborn said:
You do realize the X2 chips have a square root function, but for lesser chips we oftern need to do it in code, and not just call a function like we can in X2 chips.

Westy wrote a program a while back to do a square root function.

The code I wrote gave the Squareroot to 2 decimal places but only for numbers up to 65535.

The concept could be extended to larger numbers - possibly at the loss of decimal places otherwise remainders become too big for easy maths.

BillBWann · Jul 12, 2014

Buzby said:
Hi All,

For a project I'm working on now I need to take the square root of a 31 bit number.

..............

Just a thought, if anyone wants to spend a wet Saturday afternoon twiddling bits.

It’s been cold and raining outside and I’m retired so I thought I’d take on Buzby’s suggestion and write a program for the 08M2 to get the square root of a 32 bit number. Actually, I ended up writing 2 programs - and I started before today - its just a coincidence that today's a wet Saturday.

The first is the simplest and simply does a binary search through all 2^16 possible answers. That only takes 16 multiplications and comparisons with the original number and takes about 120 msecs on an 08M2 at 4 MHz (or 15 msecs @ 32MHz). If your largest number was 24 bits then you’d only need 12 multiplications and comparisons. It uses 101 bytes of code space and 12 byte variables.

Code:

#picaxe 08M2
#no_data

pause 10

setfreq m8

symbol WordAns=w0
symbol LWordNo=W1
symbol HWordNo=W2

symbol Hguess=w3
symbol Lguess=w4
symbol guesssquaredHW=W5
symbol guesssquaredLW=W5

symbol rndm=w6


do
random rndm
HWordNo=rndm
random rndm
LWordNo=rndm

gosub sqrt
sertxd (#HWordNo,", ",#LWordNo,", ",#WordAns,cr,lf)
loop
'--------------------------------------------------------------------------------------------------------------

sqrt:
'Gets the square root of a 32 bit number stored in W1 & W2 (W1 is the low word) and leaves the answer in W0. The routine also uses registers b6 to b11 (W3 & W5).

clc:
if HWordNo>65534 then	'make exception for numbers above 65535^2
	WordAns=65535
	return
endif

if HWordNo=65534 then	'make exception for numbers above 65535^2
	if LWordNo>=1 then
		WordAns=65535
		return
	endif
endif

Hguess=65535
Lguess=0
WordAns=Hguess+Lguess/2

do
guesssquaredHW=WordAns**WordAns

if HWordNo>guesssquaredHW then
	Lguess=WordAns
	goto aver
endif

if HWordNo<guesssquaredHW then
	Hguess=WordAns
else
	guesssquaredLW=WordAns*WordAns
	if LWordNo>=guesssquaredLW then
		Lguess=WordAns
	else
		Hguess=WordAns
	endif
endif

aver:
WordAns=Hguess-Lguess/2+Lguess
Loop while WordAns>Lguess

return

The second method is based on a method we learnt in primary school which was a bit like long division but was more complicated and I invariably made a mistake doing them back then. I only remembered the method because in high school when we learnt about quadratic equations and roots, I realized that there was a connection between these and our primary school method – and if I discover something myself, I’m more likely to remember it.

(a + b)2 = a2 + 2ab + b2 which could be rearranged to (a + b)2 - a2 = 2ab + b2 = b(2a+b)

So if “a” was an approximation of the square root of (a+b)2 then the error “b” could be derived (partly by trial & error) using the above equation. If you want a proper explanation of our primary school method read this

For this second program, I’ve taken that one step further and assumed that if “a” is large relative to “b” then:

b = ( (a + b)2 - a2 )/2a (approx) or = error/2a (approx)

and it will never be more than this but maybe a bit less because we haven’t included “b” in the devisor.

I’ve found the square root of the first 16 bits of the 32 bit number using the binary search method discussed above and then used this equation to find the lower byte of the answer. In order to make sure that “a” is large relative to “b”, I’ve left shifted the original number if necessary and then right shifted the answer at the end.

This second method is considerably more complicated than the first (it uses 211 bytes of program code space & 12 byte variables) and is only marginally faster taking on average 100 msecs at a 4MHz clock rate but it kept me amused for some hours thinking about it and debugging it.

Code:

#picaxe 08M2
#no_data

pause 10

setfreq m8

symbol WordAns=w0
symbol TempW1=w0
symbol LWordNo=W1
symbol HWordNo=W2
symbol LByteAns=b0
symbol HByteAns=b1
symbol HByteLWordNo=b3

symbol Hguess=b6
symbol temp1=b6
symbol Lguess=b7
symbol temp2=b7
symbol guess=b8
symbol shft=b9
symbol guesssquared=W5
symbol guesssquaredHW=W5
symbol guesssquaredLW=W5
symbol diff=W5
symbol tempW2=w5

symbol rndm=w6
symbol HW=w7
symbol LW=w8
'--------------------------------------------------------------------------------------------------------------

do
random rndm
HWordNo=rndm
random rndm
LWordNo=rndm

HW=HWordNo
LW=LWordNo

gosub sqrt
sertxd (#HW,", ",#LW,", ",#WordAns,cr,lf)
loop
'--------------------------------------------------------------------------------------------------------------

sqrt:
'Gets the square root of a 32 bit number stored in W1 & W2 (W1 is the low word) and leaves the answer in W0. The original 32 bit word is lost and the routine also uses registers b6 to b11 (W3 to W5).

'Right Adjust orininal number -------------------------------------------------------------------

TempW1=HWordNo
for shft=8 to 1 step -1
	if TempW1=0 then exit
	TempW1=TempW1/4
next shft

if shft=0 then goto clc
if shft=8 then
	HWordNo=LWordNo
	LWordNo=0
	goto clc:
endif
	
TempW1=1
for temp2=1 to shft
TempW1=TempW1*4
next temp2

HWordNo=HWordNo*TempW1	'shift up bits of HWordNo & LWordNo
TempW2=LWordNo**TempW1
HWordNo=HWordNo+TempW2
LWordNo=LWordNo*TempW1

'Get the sq root of the high word using the binary search method --------------------------------
clc:
if HWordNo>=65025 then	'make exception for numbers above 255^2
	HByteAns=255
	goto getlb
endif

Hguess=255
Lguess=0

HByteAns=Hguess+Lguess/2
do
guesssquared=HByteAns*HByteAns

if HWordNo>=guesssquared then
	Lguess=HByteAns
else
	Hguess=HByteAns
endif
HByteAns=Hguess+Lguess/2
Loop while HByteAns>Lguess

'Determine the low byte of the answer using b=diff/2/a -----------------------------------------
getlb:
guesssquared=HByteAns*HByteAns

diff=HWordNo-guesssquared	'(diff<=511)

temp1=HByteLWordNo/2+1	'divide high byte of LWordNo by 2 & increment to allow for low byte
diff=diff*128+temp1	'divide by 2 and move up into high byte of Diff & add in half of LWordNo high byte

diff=diff/HByteAns	'b=diff/2/a

if diff>=256 then
	diff=255	'LByteAns can't be greater than 255
endif

LByteAns=diff	'LByteAns might be high because b wasn't includedb in the divisor

lprefine:	
guesssquaredHW=WordAns**WordAns
if guesssquaredHW>HWordNo then
	WordAns=WordAns-1
	goto lprefine
else
	if guesssquaredHW<HWordNo then goto exrf
	guesssquaredLW=WordAns*WordAns
	if guesssquaredLW>LWordNo then
		WordAns=WordAns-1
		goto lprefine
	endif
endif

'Right shift the answer to compensate for original left shifting --------------------------------
exrf: 
if shft=0 then goto rtn
TempW2=1
for temp2=1 to shft
TempW2=TempW2*2
next temp2

WordAns=WordAns/TempW2

rtn:return
'--------------------------------------------------------------------------------------------------------------

I checked the results of these programs by getting them to calculate all 65535 32 bit numbers generated by the random function (which is only 1/65536 of all possible 32 bit numbers) and comparing them to the values calculated by Excel and they all agreed.

AllyCat · Jul 19, 2019

Hi,

After several "false starts", I came across this thread whilst searching for a 32-bit square root algorithm for another project. Certainly the "binary search" method seems to be the best solution (at least for PICaxe's integer maths), because it uses primarily multiplication of two 16-bit words to give a 32-bit result, which PICaxe can perform rapidly (using the ** and * operators).

However, a perhaps little-known feature of the (PICaxe) "Random" function is that it never directly generates zero (but any value calculated from it can of course be modified to do so), so the above demos don't check for the Square Root of Zero (or any value less than 65537) ! Actually, even without this "Random" characteristic, there's only a 1 in 65536 chance that zero (or any other specific value) would have been tested, so I modified the programs to count up in units from zero. Also I added an elapsed "time" to the output, which produced some interesting results, but note that this was using the Simulator, so the "time" has no absolute meaning, however it does give an indication of the relative execution times.

The "good news" is that both versions seem to produce entirely correct results, but the second version takes an enormous time to calculate "root zero" and some numbers of the form N2 - 1 (e.g. 3 and 8) take longer to calculate than the adjacent numbers.

Code:

HiWord , LoWord , Root , Elapsed time
0, 0, 0 time= 76    ; Yes, it started at zero !
0, 1, 1 time= 80    ; Elapsed = 4 ,   etc...
0, 2, 1 time= 91
0, 3, 1 time= 115    ; Elapsed = 24
0, 4, 2 time= 120
0, 5, 2 time= 125
0, 6, 2 time= 133
0, 7, 2 time= 145
0, 8, 2 time= 162
0, 9, 3 time= 167
0, 10, 3 time= 172
0, 11, 3 time= 178
0, 12, 3 time= 186
0, 13, 3 time= 195
0, 14, 3 time= 206
0, 15, 3 time= 220
0, 16, 4 time= 225

For some applications the maximum execution time may be more important than the average, for example in "real-time" systems when a result might not be available when it's needed, or where a known, fixed "Pause" could be useful. Therefore, I created a Binary Search subroutine, based on the Microchip Application Note linked by Pongo in #3,. The subroutine has an almost constant (and minimum) execution time of around 100 ms (at 4 MHz) and needs only about 50 bytes of program plus 5 words (10 bytes) of variable space. The initial value remains in two of these registers and the program can be easily configured to run faster (with less iterations) if the maximum number of bits in the result is known to be less than 16.

The IF .. THEN {GOTO} .. structure is mainly used, because its execution times are predictable with the "indirect" (GOTO) path taking about 50% longer than the "fall through" (false condition). Therefore, the code is structured so that the "longest" path (i.e. which requires testing of both High and Low words and also subtracting the most recent candidate bit) uses the "fall-through" path. This does give a slightly less structured program, with one more (unconditional) GOTO, which seems a reasonable price to pay where speed may be an issue. I also tried a version with a "fast exit" (if an exact result was found earlier) but the additional comparison not only increased the maximum execution time, but also the average time, so only rarely would a result be obtained more quickly.

Code:

; Square Root of any 32-bit number using Binary Search.  AllyCat, July 2019
#picaxe 08m2                    ; And most others
#no_data                         ; No Eprom/Table data required
symbol numberL = w1            ; Input Low word (16 bits)
symbol numberH = w2            ; Input High word
symbol root = w3             ; Result
symbol mask = w4             ; And Loop counter (never more than one bit is set)  
symbol square = w5                 ; Scratchpad
symbol MAXMSB = $8000        ; Top MSb of maximum result (e.g. $800 for 12 bits)  

for numberH = 0 to 65535            ; For comprehensive test
  for numberL = 0 to 65535
; random numberL                            ; Option for testing (never zero)
; random numberH                                ; For testing large numbers (>16 bits)
    call squareroot
   sertxd ("Root(" , #numberH , ":" , #numberL ,") = " , #root , " time = " , #time , cr , lf)   
  next
next
end

squareroot:                    ; About 48 bytes as a subroutine
root = 0                                            ; Minimum result
mask = MAXMSB                                    ; Start with the Most Significant bit of result
do
    root = root + mask                        ; Set the next "candidate" bit
    square = root ** root                    ; High word
    if square <> numberH then hdiff        ; Fall through to test Low word
    square = root * root
    if square <= numberL then keep        ; Fall through if square is too large
subt:
    root = root - mask                        ; Remove the candidate bit
keep:
    mask = mask / 2                            ; Shift Right (to next Lower Significant Bit)
loop until mask = 0                           ; Repeat until Bit has fallen off the Right Hand Side  
return
hdiff:                     ; Located after the return saves a Goto
    if square > numberH then subt
    goto keep

Note that, like most integer square root algorithms, the result is not the "nearest" integer, but the "next lower" integer (unless exactly correct). However, rounding to the nearest integer result is quite easy, so I may add that in a future post.

Cheers, Alan.

AllyCat · Jul 25, 2019

Hi,

Most integer Square Root algorithms "ignore" the fractional part of the result so effectively always "round down" to the lower value. Thus the square root of 3 is returned as 1 and of 8 is 2, etc.. However, the rule for rounding up (to the nearest integer) is quite easy: do it if the "remainder" (i.e. the original number minus the root2) is greater than the calculated root.

With some algorithms, both values will have been automatically calculated by their end, but this is not the case for the algorithm in the previous post, for two reasons: Firstly, that calculation uses the same register for the High and Low word intermediate products (so both are not available together) and secondly the calculated product may be for the next higher root (i.e. before the least significant "candidate" bit was subsequently deleted). Therefore, the rounding-up algorithm needs to re-calculate the square of the resultant root, and then subtract it from the original number.

Another complication is that if the original number is greater than 65535:0 (Hi:Lo words) then rounding up would take the resultant root from 65535 to 65536, or into the "17th bit", thus giving a (Low) word result of zero. Of course this only applies for the largest (32-bit) numbers, but the code below includes this "exception" for completeness. So here is the additional code which should be inserted immediately before the "Return" of the subroutine above.

Code:

#define roundup
#ifdef roundup		; Optional code to round (up) the result to the nearest integer
; Use "square" for High word of remainder, "mask" for Low word
rounding:
	if numberH > 65534 and numberL > 0 then overflow		; Trap overflow error
	mask = root * root						; Low word
	square = root ** root					; High word
	square = numberH - square				; Calculate Remainder, High word
	if mask > numberL then					; Low word will require a Borrow 
		dec square						; Subtract the Borrow
	endif
	mask = numberL - mask					; Remainder, Low word
	if square > 0 or mask > root then			; Root is max 16 bits
		inc root							; Round up
;		sertxd("+ ")						; Optionally report the rounding
	endif
overflow:									; Skip rounding-up (or carry into a 17th bit) 		
#endif

In most applications it won't be worthwhile to include rounding, because it almost doubles the size (but not execution time) of the subroutine. However, it might be useful in (say) single-word (16-bit) calculations, where the rounding error is proportionally larger and the computation is simpler (single-words with no risk of an overflow).

Cheers, Alan.

New square root algorithm, anybody up for a challenge ?

Buzby

Senior Member

premelec

Senior Member

Pongo

Senior Member

SAborn

Senior Member

Jamster

Senior Member

Buzby

Senior Member

westaust55

Moderator

BillBWann

Member

AllyCat

Senior Member

AllyCat

Senior Member