# A Simple "Double-Word" Division Subroutine (maximum 31 bits by 15 bits)

#### AllyCat

##### Senior Member
PICAXE Basic only supports arithmetic up to 16 bits, which can make some calculations difficult. Often it's possible to keep the calculation within bounds by alternating multiplication and division, etc., but sometimes accuracy suffers. For example, suppose a READADC10 value needs to be scaled or "calibrated" by +1%. That requires multiplication by 101 and division by 100, but multiplication first could create an overflow (10 bits * 7 bits = 17 bits) and division used first loses accuracy.

However, PICAXE Basic does offer "partial" multiplication up to a "Double-Word" (32 bits) using the * and ** operators, so the aim of this subroutine is to provide a "reverse path", i.e. accepting a Double-Word input (and a Divisor) and returning Single-Words for Result and Remainder. Note that this is NOT intended as a full "16-bit maths" routine, more a "Get out of Jail Free card" for "difficult" calculations. Several 32 bit routines have been discussed in this thread, including a full 32 by 32 bit division from hippy, and I hope to document an "improved" version of my full 32 bits by 16 bits version soon.

To keep the code simple and compact (it actually uses less memory than a BINTOASCII word command) a couple of restrictions are necessary. Firstly, the divisor is limited to 15 bits (which automatically implies that the "Remainder" will also fit into a single word). The reason is that PICAXE Basic doesn't support a "Carry/Borrow" flag, so (to keep the code simple) the top bit of the numerator is used for that flag during subtraction, leaving only 15 bits for the divisor (i.e. < 32,768). The other restriction is that the "Result" is also limited to a single (16-bit) Word (i.e. < 65,536). Thus the numerator cannot be more than 31 bits (just over 2 billion) and is further restricted with smaller divisors. For example if dividing by ten, then the maximum numerator is 655,359 (20 bits).

Because of these restrictions, a short (optional) "preamble" code tests for data validity and immediately returns an "Error Code" when appropriate. A detected input error leaves the data unchanged and avoids wasting time calculating an incorrect result. However, since PICAXE Basic doesn't implement any run-time error-handling facilities, the data validation section is mainly for demo/testing and can be omitted, with (as usual) the responsibility for the calculations staying within bounds being that of the software writer.

So here is the code which uses about 37 bytes for the simple (non-validating) subroutine and 7 bytes of RAM, i.e. four bytes for the double-word numerator, two bytes for the divisor word and a single byte for loop counting and the return of the Error Code when appropriate:

Code:
``````#picaxe 20m2			; And most others
; Divide Double-Word (w3:w2) by w1 using binary Long Division
; w2 = Result; w3 = Remainder; w1 unchanged; b0 = 0 if input error
; Max divisor 32767 (15 bits), max result 65535 (16 bits)
; AllyCat  15th June 2012

symbol NUMERATOR 	= 123456789     ; Test Data
symbol DIVISOR  	= 10000
symbol NUMERHI  	= NUMERATOR / \$10000
symbol NUMERLO  	= NUMERATOR & \$0FFFF

symbol topbit = \$8000		; Value of "carry" bit (when shifting left)
symbol numhi = w3			; Numerator Hi, Returns Remainder
symbol numlo = w2			; Numerator Lo, Retuns Result
symbol divis = w1			; Divisor (max 15 bits), Unchanged on Return
symbol pass  = b0			; Loop counter, on return = 0 if data error

Testdiv31: 		 		; Test routine
numhi = NUMERHI 		; Numerator High -> Remainder, max 15 bits (32767)
numlo = NUMERLO 		; Numerator Low -> Result, max 16 bits (65535)
divis = DIVISOR		; Divisor, max 15 bits (32767)
gosub div31v		; Validate input data and divide numerator by divisor
if pass = 0 then
do : loop
endif
sertxd ("Result=",#numlo," Remainder= ",#numhi,cr,lf)	; Print result
goto Testdiv31
div31v:					; Validate data and Divide numerator by divisor
pass = 0				; Pre-set the error marker
if divis => topbit OR numhi >= divis then done		; Data Error so quit
div31:					; Enter here to Divide with NO error check
for pass =  0 to 15		; Repeat for each bit position
numhi = numhi + numhi		; Start to shift numerator left (top bit is lost)
if numlo >= topbit then 	; Carry bit from low word is '1'
numhi = numhi + 1	; Add the carry to hi word
endif
numlo = numlo + numlo		; End of Shift
if numhi < divis then nosub	; Jump if can't subtract
numhi = numhi - divis		; Subtract divisor and ...
numlo = numlo + 1		; Add the flag to result (in low word)
nosub:
next pass
done:
return``````
On Return, the "Remainder" replaces the "High Numerator" word and the "Result" replaces the "Low Numerator" word, but the Divisor is unchanged (i.e. the code behaves rather like a w1 = w1 / w2 assignment). So if it is desired to retain the value of the numerator, then the original word values must be copied to the numhi and numlo variables before entering the subroutine. Of course, the calculation takes considerably longer than the embedded PICAXE division, about 100ms with a 4MHz clock. However, this time is moderately constant and could be made even more consistent with a minor addition to the code, if desired. Thus the routine might "double up" as a PAUSE 100 in some instances.

Cheers, Alan.

#### lbenson

##### Senior Member
Noted. Thanks for this.

#### srnet

##### Senior Member
Very useful.

I was using a routine calculate the distance between two sets of GPS co-ordinates, the calculation involved using the longest of the NS or EW distance thus;

direct Distance = longest/(COSX*1000)

The 'error' in the original routine was 10M from 100M to 999M and 100M from 1000M to 9999M.

Now and error of 100M in 9999M is not a lot, but 100m in 1000M is a lot.

I just tried your code and I get an 10 fold increase in accuracy, so the distance calculation would now be at an error of 1M up to 999M, and 10M up to 9999M, which is more than acceptable.

The code is slightly slower, but not an issue for my application.

I will be revising the code for the GPS locator beacon shortly for new hardware, so I will probably include your code in the routines.

#### AllyCat

##### Senior Member
Hi,

Thanks for the feedback. That's rather greater improvement than I achieved in my "battery tester" application, but then I had already optimised the code as far as possible (using 16 bits divided by 8 bits). So a 10 bit A/D conversion only improved the resolution by the expected factor of about 4. However, the code may be more readable now, since it can use more meaningful scaling factors (normally 1mV per digit).

Perhaps this is an appropriate time to cross-reference my similar "snippet" which performs full 32-bit by 16-bit division. It takes about 50% longer to execute, so the code here will generally be preferable, but it does remove the need for any error testing/protection, except for the obvious division by zero (or any previous under or overflows) and has no limit on the use of small divisors (even with large numerators).

Cheers, Alan.

#### srnet

##### Senior Member
I will take a look at that calc.

Speed of execution is not a particular issue, the rest of the calc takes a while, so its run at 64Mhz, which means the division part should take only 10ms or so.

#### Peter Goldsbury

##### Member
Hi Allycat,

A. In addition to solving the raingauge problem that started this thread, I have used your older 32/16 divide routine for an optical spectrum analyser that processes readadc10 inputs where the range of each can be from 0 to full scale. As a resullt, I had to add an extra line to check for divide by zero. You will see I re-lable the registers to give ww1-4 for in line maths processing etc and wb1-4 for general use.

Code:
``````;*******************************

;*******************************

;DIVIDE - subroutine to divide a 32 bit number by a 16 bit divisor
;Acknowledgement Alan (Allycat)  London on Picaxe forum,
;subroutine takes 61 bytes it typically executes about 180 lines taking around 150ms (at 4 MHz)
; (Testing with code below takes approx 10 msecs on a 18m2 runing at 32 mHz)

; Divide ww1.ww2 by ww3 using "Long Division", result will be in ww4
; be careful as this also uses wb3 and wb4 as part of its process

symbol numlo = ww1
symbol numhi = ww2
symbol numtop = ww2h		;  highest byte of numerator  ( = ww2h)
symbol denom = ww3
symbol result = ww4
symbol pass  = wb3
symbol carry = wb4            ; Could use bit flag instead  (used as zero / non-zero)

divide:				; 52 bytes in subroutine

result = 0			; Clear result register

if denom = 0 then return ; Check for a divide by zero  and if so jump out with zero result
endif

carry = 0			; Clear carry "flag"
for pass = 0 to 16	; Repeat for each bit position
result = result + result         ; Shift left (do BEFORE the final bit is added)
if numhi < denom AND carry = 0 then shift ;  Jump if can't subtract
result = result + 1	             ; Store the databit
numhi = numhi - denom	; Subtract the denominator
shift:
carry = numtop AND \$80	; Carry flag from numerator top bit
numhi = numhi + numhi	; Shift numerator high left
if numlo > \$7FFF then 	; Carry bit from low word
numhi = numhi + 1	; Add carry to high word
endif
numlo = numlo + numlo	; Shift numerator low left
next pass
return``````
B. Once the subroutine is in included in the code application programming is trivial even for unexperienced programmers as the example below that reads battery volts using cailbadc demonstrates. (Using 28X2)

Code:
``````;*********************************
; subroutine to read and store battery voltage
;  ( beware the divide routine here also uses wb3 and wb4)

setfreq m4
setfreq m16
let ww3=ww3*10			; divisor
let ww1= 10240*1024		; low word of numerator in ww1
let ww2= 10240**1024		; high word of numerator in ww2
gosub divide			; return with result in ww4
battvolts = ww4		 	; return with volts x 1000  ( e.g. 4,800 volts)

return``````
C. I had much more difficulty getting my head around your latest and more general 32/16 subroutines with all its different entry points. I really your earlier simple black boxes that work without getting too precious about about the semantics. If there is anythig you suggest I need to change as a result of your later work please let me know.

D. You made the comment about adding and subtracting 32 bit numbers further back. I have no need of it just yet but you might like to round out the ofering with some equally sophisticated subroutines to do that too. Picaxe does not (in the manuals at least) seem to support macro command programming which might have made this easier.

Thanks again for the great contribution you are making to unlease some otherwise very frustrated PICAXE users!!

Last edited by a moderator:

#### AllyCat

##### Senior Member
Hi Peter,

There are two slight improvements in the later code, firstly it combines the numerator and result registers (a well-known trick in assembler versions) to reduce the number of "shift" operations and thus increases speed slightly. Secondly, it now loops 16 times (FOR ... = 0 TO 15); I found it "niggling" that the previous version needed 17 passes (0 TO 16) to process 16 bits correctly.

PICaxe really isn't suited (or intended) for full 32-bit maths, so I still think that the "simplified 31/15 bit" divison routine is the most useful (to recover from minor excesses of a "16 * 16 bit" multiplication) but the "full 32/16" bits routine is useful, for example when "32-bit" results might occur. So I believe that 32-bit additional/subtraction is relatively unnecessary, but was quite surprised that in PICaxe basic it's not "trivial", IMHO rather more difficult than in assembler (because PICaxe doesn't directly support Carry/Borrow/Overflow flags).

Therefore, I have now devised some 32-bit add/subtract subroutines, but admit that I probably would have given up (as I have no particular use for them) were it not for hippy's useful Carry/Borrow routines earlier in this snippets section. My inclination is to post them as an addendum to the 32/16 bits snippets thread because I don't think they're very useful without the "32-bit to denary" conversion routine posted there. Also, I don't really want to "encourage" the general use of 32-bit maths by starting a new thread (since "scaling" data to remain within 16-bits is generally a "preferred" method).

So (subject to any comments/suggestions here) keep a watch on that other thread. Cheers, Alan.

#### ferrymanr

##### Member
I have tried using this routine and it works well as it stands. I have also tried to do a 'gosub Testdiv31' with my values already in numlo, numhi and divis and other symbols defined. If I place a place a 'debug' and 'stop' after the done: label and before 'return' I have the correct result in numhi (remainder) and numlo (result). However I cannot get the routine to return to my calling code. It seems to simply repeat the routine indefinitely. I am confused by the the structure that jumps to other labels in the routine. Can someone tell me how I can call this as a subroutine and return to the calling code.
Thanks
Richard

#### ferrymanr

##### Member
Done it! I woke up after a couple of cups of coffee and some fresh air and re-read the routine. I just had to change 'goto Testdiv31' to 'return'. I was looking in the wrong place for the end of the routine.
Richard

#### AllyCat

##### Senior Member
Hi,

It's more than two years since I wrote this code, but I still use it quite often in my projects, so perhaps it's time for a minor update:

A strategy I have now adopted for nearly all of my PICaxe programs is to reserve b1 and w1 as temporary or "local" variables, also used to pass values to and from subroutines, etc.. So here, I use b1 for the loop counter (and to return an error code if appropriate) and w1 to load the numerator low word and to receive the result. b0 is still available for "global" flags (up to 8 separate bits), if required.

However, this division subroutine requires two more word variables (numerator high and divisor) for which I've used w2 and w3 here, but they could be any word variables, including two unused "system variables" (s_w1, etc.), which would allow the maximum number of normal PICaxe variables to be retained for the main program.

As w1 is bit-addressable, I've now slightly simplified the code by using bit31 directly (for the carry from the low to high word), rather than testing a numerical comparison. To summarise the main features of the code again, the maximum divisor is 15 bits (the 16th is required for a Carry bit), the maximum result must fit in one word (16 bits), so the maximum numerator becomes 31 bits (or correspondingly less with a smaller divisor). Here is my latest version:

Code:
``````; 31 bit by 15 bit (maximum) division subroutine.  AllyCat, updated July 2014.
symbol tempb = b1			; Temporary (local) byte
symbol numlo = w1			; Must be w1 (to use bit flags), also used for result word
symbol numhi = w2			; Or any other word, also used for remainder word
symbol divis = w3			; Or any other word, unchanged on return

div31:        ; Divide numerator (w2:w1) by divisor (w3, max 15 bits), no error checking
for b1 =  0 to 15  				; Repeat for each bit position
numhi = numhi + numhi + bit31		; Start to shift numerator left (top bit lost)
numlo = numlo + numlo    			; Shift low word
if numhi >= divis then    			; Skip if can't subtract
numhi = numhi - divis			; Subtract divisor, then..
inc numlo       		 	  ; Add the flag into result (in low word)
endif
next b1					; Typically 26 bytes, execution time 70ms
return		; Result in numlo (w1), remainder in numhi (w2), divisor (w3) unchanged``````
The above code is very compact, requiring less than 30 bytes of codespace, but is relaitively slow, executing in about 70 ms with a 4MHz clock. Therefore, I have also devised a "faster" version which can execute the loop less than 16 times, the number being passed to the subroutine in b1. The minimum number of loops is determined by the number of bits in the (integer) result. For example, the demonstration code below estimates the PICaxe supply rail voltage to two decimal places by using the CALIBADC10 command. A result of 511 represents 5.11 volts, so we need to accommodate a 10-bit result (10.23 volts) to avoid any risk of an overflow. Therefore, the code only needs to loop 10 times, which takes just under 50ms at 4MHz.

However, by limiting the loop count to 10, the number of left-shifts is reduced, so it's necessary to modify the (initial) numerator value accordingly, i.e. by multiplying by 2^6 (64) in this example. The sample code performs this multiplication, but if the numerator is a pre-defined constant (as here) then the value could be loaded directly. However, beware that the lowest (6) bits in the low numerator word must be loaded with zeros, because they will remain in the word (w1) when it becomes the result register.

Code:
``````; Estimate the PICaxe supply rail to 10 mV resolution using CALIBADC10.   AllyCat, July 2014.
symbol NUMERATOR = 104858					; 1024 * 102.4 (ADC Fullscale * Ref in c-volts)
symbol NUMERATORHI = NUMERATOR / 65536		 		; High Word
symbol NUMERATORLO = NUMERATOR and 65535	 		; Low Word
b1 = NUMERATORLO ** 64						; Carry bits from low word  } Compensate
numhi = NUMERATORHI * 64 + b1		 			; Shift High word left      } for omitted
numlo = NUMERATORLO * 64					; Shift Low word left       } loops (ie 2^6)
b1 = 10								; Maximum Number of bits in integer Result
call div15m
sertxd("Vdd= ")							; Report the supply rail
call show2dp
sertxd(" Volts ")
end

div15m:
do
numhi = numhi + numhi + bit31				; Start to shift numerator left (top bit lost)
numlo = numlo + numlo      				; Shift done
if numhi >= divis then    				; Skip if can't subtract
numhi = numhi - divis				; Subtract divisor, then..
inc numlo			       		; Add the flag into result (in low word)
endif
dec b1		       			; Loop (bit) counter
loop until b1 = 0				; Typically 26 bytes, executes in ~5ms per result bit
return		; Result in numlo (w1), remainder in numhi (w2), divisor (w3) unchanged

show2dp:		; Show value of w1/100 with two decimal places (max 655.35)
b1 = w1 // 100					; Fractional part
w1 = w1 / 100					; Integer part
sertxd(#w1,".")
if b1 < 10 then
sertxd("0")				; Restore suppressed leading zero
endif
sertxd(#b1)
return``````
Generally, the execution time will not be reduced by more than about 40%, because for a result of 8 bits or less, fairly simple and faster algorithms probably can be devised. However, they likely will require more codespace (and development) so there may be occasions when using this code is worthwhile.

Cheers, Alan.