Hi,
Here is an alternative method for measuring the execution times of individual (or multiple) instructions using the Timer1 hardware in recent PICaxe chips. I've only tried M2 devices, but it should also be usable at least for the X2 family. Advantages are that instruction cycles are read directly on a single device, which can thus report the test associated with each measurement, and results are independent of the clock frequency. However, the "Special Function Register" (SFR) addresses for Timer1 vary between PICaxe families (M2, X2, etc.) so the test code must be personalised for each family. The results appear to be "consistent" (i.e. a re-run gives identical values) but, as explained earlier in this thread, even the slightest change to the program code can alter the token/byte boundaries, and thus the execution times, by at least some (maybe many) tens of cycles.
Initially, I attempted a "non-invasive" method, of reading Timer1 before and after each instruction in the test. However, the maths becomes very complicated because the timer can overflow at many points during the measurement. The timer does not simply wrap around to 0 but appears to be reset to 45536 (-20,000) to give a 20ms sub-tick for the (one second) "time" variable. This seems to occur even if DISABLETIME is commanded (and then Timer1 restarted by writing directly to the SFR control register). Therefore, a rather more "brute force" method of resetting Timer1 is used, which of course upsets the value of the PICaxe "time" variable. However, another advantage of resetting Timer1 is that the maximum measurement time is about 60,000 instruction cycles (allowing for an initial "null" execution time), compared with a maximum of about 15,000 cycles if the timer were not reset.
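As a rough check on those limits, here is a small desktop sketch of the arithmetic (Python, not PICaxe Basic). It assumes the reload value of 45536 described above and a null overhead of roughly 4,500 cycles, which is the order of the "Null" figure in the results further down. Both numbers are assumptions for illustration, not datasheet values.

```python
# Desktop model of the Timer1 measurement window (not PICaxe code).
TIMER_MODULUS = 65536      # 16-bit counter
RELOAD_VALUE = 45536       # value the firmware appears to reload after overflow
NULL_OVERHEAD = 4500       # assumed cost of the gosub/poke/peek overhead, in cycles

def cycles_until_overflow(start_value):
    """Counts available before a timer starting at start_value overflows."""
    return TIMER_MODULUS - start_value

# Timer reset to zero before each test (the method used here).
window_reset = cycles_until_overflow(0) - NULL_OVERHEAD

# Timer left free-running: at best one full reload period is available.
window_free = cycles_until_overflow(RELOAD_VALUE) - NULL_OVERHEAD

print(cycles_until_overflow(RELOAD_VALUE))  # 20000 cycles = 20 ms at 1 us per cycle
print(window_reset)                         # 61036
print(window_free)                          # 15500
```

These come out close to the "about 60,000" and "about 15,000" cycle limits quoted above.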
Timer1 uses two one-byte registers which AFAIK cannot be read or written concurrently (in PICaxe Basic), so the program reads the High byte first, then the Low byte and then the High byte again. The two High byte values are then averaged, taking into account their (odd/even) separation and the value of the Low byte. Beware that some bytes in the program are used as part of a Word variable, including High bytes in a "Low" position (to correctly handle overflow/carry). The first reported measurement is of a Null delay (i.e. the subroutine return/call and the SFR poke/peeks) which is then subtracted from subsequent reported values. Here is the basic core program:
Code:
#picaxe 20m2 ; SFRs only tested for M2 series
#no_data ; No EEPROM data
symbol TMR1L = $16 ; Timer 1 Low byte
symbol TMR1H = $17 ; Timer 1 High byte
symbol T1CON = $18 ; Timer 1 Control register
symbol Nul = w5 ; Null time delay
symbol WVHL = w6 ; Word variable
symbol WVLo = b12 ; Word variable Low byte
symbol WVHi = b13 ; Word variable High byte
symbol TMBV = b14 ; Byte variable for reading timer
main:
Nul = 0 ; Initialise time delay
sertxd("Null=")
gosub start ; Start Timer 1
gosub measure ; Read and report timer delay
Nul = WVHL ; Set NUL time delay
sertxd("b0=b0:") ; Report the test instruction
gosub start
b0 = b0 ; Test instruction
gosub measure
stop
start:
pokesfr TMR1L,0 ; Set Timer Lo byte
pokesfr TMR1H,0 ; Zero Timer Hi byte
return
measure:
peeksfr TMR1H,WVLo ; Read Timer Hi byte
peeksfr TMR1L,TMBV ; Read Timer Low byte
peeksfr TMR1H,WVHi ; Read Timer Hi byte again
if TMBV < 128 then inc WVLo endif
WVHL = WVLo + WVHi / 2 ; Average Timer High byte
WVHi = WVLo
WVLo = TMBV ; Timer Low byte
WVHL = WVHL - Nul ; Subtract Nul delay
sertxd (#WVHL," ") ; Report the time delay
return
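The High/Low/High read can be checked off-chip. Below is a small Python model (not PICaxe Basic) of the arithmetic in "measure". The helper `sample` is hypothetical, and so is the assumption that the two High byte reads fall equally far either side of the Low byte read; on a real chip the gaps will only be approximately equal.

```python
def reconstruct(h1, low, h2):
    """Model of the 'measure' arithmetic: estimate the 16-bit count
    at the moment the Low byte was read."""
    if low < 128:
        h1 = (h1 + 1) & 0xFF          # byte increment, wraps like a byte variable
    high = ((h1 + h2) // 2) & 0xFF    # integer average of the two High reads
    return (high << 8) | low

def sample(t, gap):
    """Hypothetical helper: three reads of a free-running 16-bit counter,
    taken 'gap' cycles apart and centred on time t."""
    h1 = ((t - gap) >> 8) & 0xFF
    low = t & 0xFF
    h2 = ((t + gap) >> 8) & 0xFF
    return h1, low, h2

# Either side of a Low-to-High carry, with an assumed 500 cycles between reads.
print(hex(reconstruct(*sample(0x12FF, 500))))   # 0x12ff
print(hex(reconstruct(*sample(0x1300, 500))))   # 0x1300
```

The "if low < 128" step is what resolves the half-count when the two High reads are an odd number apart: a small Low byte means the carry has only just happened, so the average is rounded up rather than down.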
Below is a more complete test harness. It repeats, using different starting values of the timer Low byte, as a "sanity check" that the carry from low to high bytes is handled correctly. At the end, a Null is measured again, which should be within a few tens of instruction cycles of zero (or 65536). With an M2 at nominal 4MHz clock frequency, the (PIC Assembler) instruction cycles are exactly 1us so I use these units as a "shorthand" for Instruction Cycles. However, for an X2 the nominal time is halved, and the minimum execution times for M2 and X2 devices respectively (at maximum 32 and 64MHz clock frequencies) would be divided by 8 and 16.
Code:
#picaxe 20m2 ; SFRs only tested for M2 series
#no_data ; No EEPROM data
#define M2 ; PICaxe family
#ifdef M2
symbol TMR1L = $16 ; Timer 1 Low byte
symbol TMR1H = $17 ; Timer 1 High byte
symbol T1CON = $18 ; Timer 1 Control register
#endif
#ifdef X2
symbol TMR1L = $CE ;)
symbol TMR1H = $CF ;)-Not tested
symbol T1CON = $CD ;)
#endif
symbol Nul = w5 ; Null time delay
symbol WVHL = w6 ; Word variable
symbol WVLo = b12 ; Word variable Low byte
symbol WVHi = b13 ; Word variable High byte
symbol TMBV = b14 ; Byte variable for reading timer
symbol Seed = b15 ; Low seed for timer low byte
symbol Pass = b16 ; Loop counter
main:
for Seed = 0 to 256 step 32 ; Sanity check for data handling
Nul = 0 ; Initialise time delay
sertxd(cr,lf,"Seed=",#Seed) ; Report testing seed value
sertxd(cr,lf,"Null=") ; Report Null instruction period
gosub start ; Start Timer 1
gosub measure ; Read and report timer delay
Nul = WVHL ; Set NUL time delay
sertxd("Swap=") ; Report the test instruction
gosub start
swap b0,b0 ; Test instruction
gosub measure
sertxd(cr,lf,"Gosub+Ret=")
gosub start
gosub ret
gosub measure
sertxd("Gosub(only)=")
gosub start
gosub dummy
gosub measure
sertxd(cr,lf,"IFtrue=")
gosub start
if b0 = b0 then true
true:
gosub measure
sertxd("IFfalse=")
gosub start
if b0 <> b0 then false ; Fall through
false:
gosub measure
sertxd("Readtable=")
gosub start
readtable b0,b0
gosub measure
for Pass = 0 to 3 ; Now Test some multiple values
sertxd(cr,lf,"b0*",#Pass,"=") ; Report the test instruction
gosub start
b0 = b0 * Pass
gosub measure
sertxd("b0/",#Pass,"=")
gosub start
b0 = b0 / Pass
gosub measure
sertxd("OnGoTo",#Pass,"=")
gosub start
on Pass goto nx,nx,nx ; Test instruction (or fall thro')
nx: gosub measure
sertxd("OnGoSub",#Pass,"=")
gosub start
on Pass gosub dummy,dummy,dummy ; Test instruction (or fall thro')
gosub measure
next Pass
sertxd(cr,lf,"Null Check=") ; Check consistency
gosub start
gosub measure
pause 9000
next Seed
dummy:
gosub measure
sertxd("Ret=")
gosub start
ret:
return
start:
b0 = Seed ; Seed value for maths calculations
pokesfr TMR1L,Seed ; Set Timer Lo byte
pokesfr TMR1H,0 ; Zero Timer Hi byte
return
measure:
peeksfr TMR1H,WVLo ; Read Timer Hi byte
peeksfr TMR1L,TMBV ; Read Timer Low byte
peeksfr TMR1H,WVHi ; Read Timer Hi byte again
report:
if TMBV < 128 then inc WVLo endif
WVHL = WVLo + WVHi / 2 ; Average Timer High byte
WVHi = WVLo
WVLo = TMBV ; Timer Low byte
WVHL = WVHL - Nul - Seed ; Subtract Nul and sanity check bytes
if WVHL > 65000 then ; Report slightly negative value
WVHL = 0 - WVHL ; Negate (wraps at 16 bits, i.e. 65536 - WVHL)
sertxd("-")
endif
sertxd (#WVHL," ") ; Report the time delay
return
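Because word variables are unsigned 16-bit, a result a few cycles below zero wraps round to just under 65536, and its true magnitude is 65536 minus the wrapped value. A minimal Python sketch of that wraparound (not PICaxe Basic; the function names and the raw timer values are hypothetical, chosen only to illustrate the arithmetic):

```python
WORD_MASK = 0xFFFF

def word_sub(a, b):
    """Unsigned 16-bit subtraction, wrapping like a word variable."""
    return (a - b) & WORD_MASK

def format_delay(raw, nul, seed, threshold=65000):
    """Subtract the null delay and seed, then show small negatives with a sign."""
    value = word_sub(word_sub(raw, nul), seed)
    if value > threshold:
        return "-" + str(65536 - value)
    return str(value)

print(format_delay(5144, 4531, 32))   # 581
print(format_delay(4560, 4531, 32))   # -3
```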
Here are some typical output results:
Code:
Seed=32
Null=4531 Swap=2833 b0=b0:581 b0=b0+b0:715 b0=b0+b0+b0:1006
Gosub+Ret=3247 Gosub(only)=958 Ret=2223
IFtrue=1273 IFfalse=832 Readtable=665
b0*0=862 b0/0=996 OnGoTo0=1085 OnGoSub0=2878 Ret=2252
b0*1=862 b0/1=1086 OnGoTo1=1307 OnGoSub1=3100 Ret=2252
b0*2=862 b0/2=1086 OnGoTo2=1562 OnGoSub2=3347 Ret=2252
b0*3=862 b0/3=1080 OnGoTo3=1151 OnGoSub3=1965
Null Check=3
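The figures above are instruction cycles. To convert them to real time at other clock settings, here is a sketch in Python (not PICaxe Basic), assuming the usual PIC relationship of four oscillator cycles per instruction cycle. The X2 lines also assume the cycle count is unchanged on that family, which I have not tested.

```python
def cycles_to_us(cycles, clock_mhz):
    """Convert instruction cycles to microseconds (4 oscillator cycles each)."""
    return cycles * 4 / clock_mhz

# Readtable figure from the results above.
print(cycles_to_us(665, 4))    # 665.0    M2 at default 4 MHz
print(cycles_to_us(665, 32))   # 83.125   M2 at 32 MHz
print(cycles_to_us(665, 8))    # 332.5    X2 at default 8 MHz (untested)
print(cycles_to_us(665, 64))   # 41.5625  X2 at 64 MHz (untested)
```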
A feature of this method is that it need take only a few minutes for anyone to measure any particular instruction(s), requiring only the connection of a target PICaxe to the PE. So I have not (yet) prepared a full table of typical execution times. Also, I am not sure what are now the most useful "typical" instruction times to record. The M2 PICaxes seem to have suffered a significant increase in the execution time of their "jump" instructions, typically two or even three times longer than previous versions. So differentiating between the execution times with different values of a constant seems rather insignificant compared, for example, with the "true" (~1250us) and "false" (~825us) paths for an IF. Also, the exact structure of formulae seems rather significant, for example "b0 = 0 + b0" executes far slower than "b0 = b0 + 0".
Here is a summary of a few of the "newer" instructions and some of my more "unexpected" results:
The new (not x8M2s) READTABLE command seems usefully fast at ~660us. The time executing an ON..GOTO depends on the position of the label in the list (~220us for each additional element), so the Basic interpreter appears to be searching through a list rather than using a vector into an "address offset" table. Thus it can be more efficient to omit the last label in an ON..GO.. command and allow the code to fall through for the final value. A GOSUB (alone) requires ~1000us with the corresponding RETURN a surprising ~2200us.
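The per-label cost of ON..GOTO can be read straight off the results listed earlier. A quick Python check of the differences, using only the OnGoTo figures from that one run (so treat the steps as indicative, not exact):

```python
# OnGoTo figures quoted in the results, indexed by the value of Pass.
# Pass = 3 has no matching label, so execution falls through.
on_goto = {0: 1085, 1: 1307, 2: 1562, 3: 1151}

# Extra cycles for each label further along the list.
steps = [on_goto[i + 1] - on_goto[i] for i in range(2)]
print(steps)              # [222, 255]
print(sum(steps) / 2)     # 238.5
```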
SWAP is obviously a pseudo-instruction since it is rather slow (~2800us) and uses around 15 bytes of code. Similarly, ON..GOSUB appears to be the PE combining ON..GOTO, GOSUBs and GOTOs. SELECT..CASE is another rather slow pseudo-command which may be better replaced by an ON..GOTO where applicable. LOOKUP and LOOKDOWN appear to take ~2200us each for 8 elements and ~4200us for 16 elements (e.g. an ASCII-Hex lookup), regardless of the position of the searched element in the list.
Cheers, Alan.