Calculating values for PauseUS on M2 chips

pleitch · Aug 12, 2012

Hi all.

Previously I had just managed to get NEC IR codes working on an 18M2 using the Pause command. Although this worked, it wasn't as reliable as I wanted, so I switched to the PauseUS command; however, this failed to work.

I don't have any means of finding out why this wasn't working other than testing the output times using other PICAXEs. The alternatives would have been to purchase software that uses my computer's mic input, or to purchase dedicated hardware.

I found that PauseUS, when the chip is overclocked to 32 MHz, wasn't pausing for the times I was expecting.

I found a very rough rule of thumb, as follows:
Take the time that you want to pause for (in μs), divide by 1.25, then subtract 176. For instance, if you want to pause for 500 μs then:
Value = 500 / 1.25 - 176
= 400 - 176
= 224

This means that values below 176 are not possible; however, from what I've found, the values aren't reliable below about 200 μs.
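The rule of thumb above can be sketched off-chip as a small calculation. This is only an illustration of the arithmetic in this post: the function name is made up, and the constants 1.25 and 176 are the empirical figures quoted above for an 18M2 at 32 MHz, not documented values.

```python
def pauseus_value(target_us: float) -> int:
    """Rough PauseUS parameter for a wanted pause at 32 MHz.

    Uses the empirical rule of thumb from this post:
    value = target / 1.25 - 176 (hypothetical helper, not a PICAXE API).
    """
    value = round(target_us / 1.25 - 176)
    if value < 0:
        # The rule gives a negative number: the pause is shorter than
        # the fixed overhead the rule assumes, so it cannot be produced.
        raise ValueError("target pause is shorter than the fixed overhead")
    return value


if __name__ == "__main__":
    print(pauseus_value(500))  # worked example from the post: 224
```

The result is only a starting point; the value still needs testing and adjusting on the real chip.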

I've attached a document with some analysis. I've structured it a bit like a scientific report but please don't think that this is in any way scientific. I didn't do "significance" tests, confidence intervals or anything actually scientific. I structured this way so I could repeat it later and so that it would still make sense when I re-read it in the future. However, the graphs are interesting.

Using the rule of thumb, then testing and adjusting through interpolation I got values very close to the NEC standard and the remote control signal now works reliably every time.

hippy · Aug 12, 2012

Having done it a few times, I wouldn't want to try and develop bit-banged IR output code without a logic analyser to allow pause times to be measured accurately and set correctly.

The main challenge is that a PAUSEUS command pauses but also has its own execution time which adds to the overall time taken. With execution times of other commands to take into account it can be hard to tell exactly what pause value to use. The best approach I have found is to choose a number and adjust up or down as required, repeat until happy. This can get complicated if you are unlucky or don't get the order of setting time values right as adjusting one time can shift the code's timing and throw other pause timings out.

Without a logic analyser the best approach is to develop at default operating speed with times set so once speed is increased they will deliver what is required. For example, if you want a bit time of 500us at 32MHz, then develop for a bit time of 4000us at 4MHz.
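As a sketch of that scaling, assuming all timings simply scale in inverse proportion to the clock frequency (the function name and the 4 MHz default are illustrative only):

```python
def development_time_us(target_us: float, target_mhz: float,
                        dev_mhz: float = 4.0) -> float:
    """Time to aim for at the development clock speed so that the
    same code delivers target_us once the clock is raised.

    Assumes every delay scales by the ratio of the two clock speeds.
    """
    return target_us * target_mhz / dev_mhz


if __name__ == "__main__":
    # 500us wanted at 32MHz means developing for 4000us at 4MHz.
    print(development_time_us(500, 32))
```

Times that are eight times longer are far easier to measure with simple tools than the final 32MHz timings.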

When not knowing the specific bit timings required, but trying to match what a remote puts out, always remember to measure what appears on the IR receiver, not the output signal.

westaust55 · Aug 13, 2012

With respect to the minimum time for the PAUSEUS command being 200 usec, this is understandable.
With the tokenised/interpreted BASIC system, each command has to be fetched, interpreted and performed.
The "perform" part requires branching to the correct part of firmware then setting up the variable/value for duration, etc.
On average a BASIC command has an overhead duration of the order of 250 usec at 4 MHz clock speed.
I did some work (and incorporated info by hippy et al) on program space, optimisation and command timing which was previously posted, if you care to search for that thread.

pleitch · Aug 13, 2012

Thanks - this is exactly what I was demonstrating.

I don't have a logic analyser so I set up IR in and measured the remote then I was attempting to match via another picaxe. Using the rule of thumb I came up with is a VERY quick way of getting within about 30 μs of the target, then use interpolation to find the exact value (while still expecting a range of variation).
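The interpolation step can be sketched as a straight-line estimate between two trial values. The measurements in the example are invented for illustration, and the helper name is hypothetical; it also assumes the measured time rises linearly with the PauseUS value between the two trials.

```python
def interpolate_value(v1: int, t1: float, v2: int, t2: float,
                      target_us: float) -> int:
    """Estimate the PauseUS value giving target_us, from two trial
    values (v1, v2) and the times measured for them (t1, t2)."""
    if t1 == t2:
        raise ValueError("the two measured times must differ")
    slope = (v2 - v1) / (t2 - t1)  # PauseUS units per microsecond
    return round(v1 + (target_us - t1) * slope)


if __name__ == "__main__":
    # Hypothetical measurements: value 200 gave 470us, 240 gave 520us.
    print(interpolate_value(200, 470, 240, 520, 500))
```

Because the measured times vary from run to run, the estimate still needs a final check against the receiver.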

Phyrephish · Aug 19, 2012

Thanks pleitch, I was considering doing this myself but now there is no need to re-invent the wheel, thanks again. I have been using hippy's approach to this problem but without a logic analyser per se. It's good to check visually what is actually output by the IR emitter, and I do this by 'recording the IR output' as sound into a sound recording program, then using the sound editing software to zoom in on the waveform and verify the timing duration of the IR pulses, adjusting the timing as required. Problem is I program in the house and the recording computer is in the recording studio, a 25 m walk away. Now I can use your equation and reduce my daily exercise.

Robotrixer · Aug 20, 2012

Hello,

Last month I wrote some time-critical programs (a stopwatch with fast interrupt routines, an IR remote control for "Hexbug").
I had problems with the PICAXE command execution times.

So I solved this problem with a little PICAXE M2 program. It reports the execution time of a command or a block of commands.

It works at 4MHz and 16MHz. The result is shown in the PICAXE terminal.

Have fun!

Werner

hippy · Aug 20, 2012

When it comes to instruction execution timings it is worth reading past discussions of the subject on the forum. It's not as simple as 'measure it', as how things are measured will give a range of values and there is no definitive single answer.

AllyCat · Aug 20, 2012

Hi,

I've been measuring execution times (with some success) by setting/reading Timer1 before and after various instructions (not a trivial task but I hope to document the method soon). The results are quite "consistent", but even the slightest change in an unrelated part of the test program can alter the delays by at least tens of instruction cycles (presumably because of the tokenised bitwise coding used in PICaxe Basic).

For Pauseus I typically measure 629 instruction cycles for Pauseus 0, 719 cycles for Pauseus 1 and then another 10us for each increment. Instruction cycles equate to microseconds with a 4MHz clock and obviously 125ns with a 32MHz clock.

Cheers, Alan.
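The figures in the post above can be turned into a rough model. This is only a sketch built from the typical measurements quoted there (629 cycles for Pauseus 0, 719 for Pauseus 1, 10 more per increment); the function names are made up, and real timings will vary with program layout as noted.

```python
def pauseus_cycles(n: int) -> int:
    """Typical instruction cycles for 'Pauseus n' on an M2,
    using the measured figures quoted in the post above."""
    if n < 0:
        raise ValueError("n must be zero or positive")
    if n == 0:
        return 629
    return 719 + 10 * (n - 1)


def pauseus_time_us(n: int, clock_mhz: float = 4.0) -> float:
    """Convert cycles to microseconds: one instruction cycle is four
    clock periods, so 1us at 4MHz and 125ns at 32MHz."""
    return pauseus_cycles(n) * 4 / clock_mhz


if __name__ == "__main__":
    print(pauseus_time_us(1, 4))   # 719 cycles at 1us each
    print(pauseus_time_us(1, 32))  # the same cycles at 125ns each
```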

hippy · Aug 20, 2012

AllyCat said:
The results are quite "consistent", but even the slightest change in an unrelated part of the test program can alter the delays by at least tens of instruction cycles (presumably because of the tokenised bitwise coding used in PICaxe Basic).

That's usually the cause; a change somewhere can alter the timing of what's being measured and even the timing of the code doing the measurement ( the 'constant overhead' that's subtracted ). It's consistent while static, and averaging and multiple loops of execution allows better calculation of the time, but doesn't give a more accurate value for a specific time.

The best tip is to put the code being tested at the start of the program in a subroutine, then changes in the later part of the program don't alter what you have ...

Code:

Goto Main

CodeToTest:
  :
  Return

Main:
  Start Timing
  Gosub CodeToTest
  Stop Timing
  Report Result

pleitch · Aug 25, 2012

It would be nice if the crew at Revolution Education could measure the time delays for their functions and publish them. I realise this is getting very "cutting edge" - very advanced - but it would be helpful.

Buzby · Aug 25, 2012

pleitch said:
It would be nice if the crew at Revolution Education could measure the time delays for their functions and publish them. I realise this is getting very "cutting edge" - very advanced - but it would be helpful.

It's been covered many times in these forums, a search will soon lead you to the discussions.

The bottom line is - there is no fixed execution time for a PICAXE instruction.

Sometimes the same instruction will take XuS, and other times YuS, and occasionally ZuS.

westaust55 · Aug 26, 2012

pleitch said:
It would be nice if the crew at Revolution Education could measure the time delays for their functions and publish them. I realise this is getting very "cutting edge" - very advanced - but it would be helpful.

If you have not already done so, then have a read through the document I created and published in post 9 in this thread.
http://www.picaxeforum.co.uk/showthread.php?17782-PICAXE-program-Size-Optimisations-and-Speed

Then think about the fact that, unlike early computers like the Commodore 64 et al, each PICAXE command, variable, constant, etc does not take an exact byte to store for many PICAXE chips.
So in addition to the usual "fetch the next byte and determine the BASIC command", which would have a relatively constant time to execute (varying slightly by the time required to scan through a command look-up table),
in the case of the PICAXE, due to the greater compression used for the tokenised BASIC program, the PICAXE interpreter may need to fetch just one byte or may need to fetch two bytes from the user's BASIC program and extract bits from the relevant byte(s) to in effect construct the command token before the command can be executed.
As a result, and as indicated by Buzby above, the time taken to locate and extract a command token (and any following parameters or variables), just to get to the point of starting the execution of the command, varies slightly depending upon where within a byte or bytes in the program area the command token commences.

AllyCat · Aug 27, 2012

Hi westaust,

Thanks for that link (and all the work that the original post must have required) which I seemed to have missed before. Perhaps it should be a "sticky". The data might be slightly out of date now, I believe that the 18M2 was the first of the M2 series, but my recent measurements on 20M2 seem very similar.

Whilst the "average" execution times in M2 devices appear to be about 30% slower, I find the considerably longer times for "jump" instructions to be disturbing, about twice as long for a simple GOTO and almost three times longer for a GOSUB/RETURN, compared with earlier chips. Thus a GOSUB/RETURN is almost one thousand times slower than PIC assembler, and it's not even as if GOSUB supports local variables, which would need to be stored and recovered.

However, my particular dislike is the "pseudo instructions" which are not documented as such, for example SWAP and SELECT .. CASE. SWAP for example consumes about 16 bytes of code space and executes slower than the equivalent LET or POKE/PEEK sequences (except possibly for word variables). I was amazed in another thread that a SELECT .. CASE was taking over 12,000 instruction cycles because it was simply executing a long series of IF .. THENs.

It can be argued that these constructs are to help "novice" users, but they add to the number of commands which (potentially) need to be learned/understood and IMHO it's not very constructive to offer instructions that are less efficient than the "basic" set.

Incidentally, I've now devised a method of measuring execution times based on reading Timer1. It has the advantages of requiring only one PICaxe chip (so can directly report each instruction being measured) and of ignoring any variation of clock frequency (because instruction cycles are read directly). However, I haven't yet decided whether I should give details in a Blog, a Code Snippet thread, a new post in the Active Forum, or maybe add it to the existing thread?

Cheers, Alan.

westaust55 · Aug 28, 2012

@AllyCat/Alan,
I believe that additional information on the same topic/concept would do well to be added to the same thread.
That keeps it all together when the time arises (as it certainly does) to point others to information useful to their "current" quest/project.

hippy · Aug 28, 2012

AllyCat said:
However, my particular dislike is the "pseudo instructions" which are not documented as such, for example SWAP and SELECT .. CASE. SWAP for example consumes about 16 bytes of code space and executes slower than the equivalent LET or POKE/PEEK sequences (except possibly for word variables).

SWAP on an M2 is indeed longer than the equivalent LET sequence and takes longer to execute; however, the LET sequence would not be an exact equivalent. The SWAP is an 'atomic operation' while the LET sequence is not, which can have an impact in multi-tasking programs.

AllyCat said:
It can be argued that these constructs are to help "novice" users, but they add to the number of commands which (potentially) need to be learned/understood and IMHO it's not very constructive to offer instructions that are less efficient than the "basic" set.

The primary purpose of an extended command set is to allow easier program coding and to provide structures which match a program's design and/or the algorithms and processes behind that.

In terms of program formulation; execution efficiency plays second fiddle to the ease of design and representation. Execution efficiency may well be unimportant in the result, and often is. It is not, to me, a convincing argument that commands should be excluded simply for being potentially less efficient.

Often, as with SELECT CASE, the code is no less efficient than its equivalent IF-THEN-ELSE, just easier to write and use. That doesn't mean SELECT CASE or its IF-THEN-ELSE equivalent is the most efficient for a particular program and in some cases different command choices may be.

Ultimately it's about giving a program designer the choice of ways to do things, to facilitate design as they desire it, with optimisation then following if the outcome does not match with any execution requirement. An experienced programmer may bear in mind execution efficiency while coding and incorporate that in their design but most will prefer to code how they want to, in the way that is most natural for them, and deal with efficiency issues later if they do arise.

AllyCat · Sep 6, 2012

Hi,

@Westaust: Thanks, I have now added my contribution to your thread linked above.

@ Hippy: Yes, I was rather being a Devil's Advocate. But the issue with the CASE..SELECT construct (in another thread) was that the relative novice had used it in a fast interrupt routine, where an ON..GOTO was significantly better. But he had no real way of knowing; it wasn't me who wrote earlier in this thread (but I concur) :

pleitch said:
It would be nice if the crew at Revolution Education could measure the time delays for their functions and publish them. ....

But back to the original topic, PAUSEUS does indeed increase the delay by 10us for each increment of its parameter above unity, but there is a considerable static "overhead" of ~700us.

For (somewhat) shorter delays one can use the execution times of a few "No Operation" (NOP) instructions. There are several around 400us, the most obvious being b0 = b0 and perhaps surprisingly RANDOM b0 (not strictly a NOP as it modifies a register). The fastest commands seem to be output/input (e.g. Low 0), also not a true NOP, but one can probably be chosen to have no effect in each specific application.

For slightly longer delays (500us+) there is PAUSE 0 or b0 = b0 + 0 and then PAUSEUS 0 at around 630us. All these examples apply to M2s with a 4MHz clock.

Cheers, Alan.

jlhooper · Sep 6, 2012

Pleitch
FWIW I've been generating NEC code using pulsout followed by pauseus.
Using a digital scope I was able to see the durations, and this is what I've found.
NB. There is no carrier here.
Using 18M2 @ 16MHz.

pulsout pin,225 '560us on
pauseus 110 '560us off
This generates a "0".

pulsout pin,225 '560us on
pauseus 540 '1.66ms off
This generates a "1".

The following are the other timings needed:
pulsout pin,3600 '9ms on
pause 385 '96.2ms off
pause 160 '40.5ms off
pauseus 760 '2.25ms off

The address for NEC if it helps is 0010000100111100
jeff
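The pulsout values above can be cross-checked with a little arithmetic. This sketch assumes the pulsout and pauseus time unit is 10us at 4MHz and scales inversely with clock speed (so 2.5us at 16MHz), which matches the 10us-per-increment figure earlier in the thread. The helper names are made up, and the measured times are the scope readings quoted in the post.

```python
def unit_us(clock_mhz: float) -> float:
    """Assumed pulsout/pauseus time unit: 10us at 4MHz, scaled by clock."""
    return 10 * 4 / clock_mhz


def pulsout_us(value: int, clock_mhz: float = 16.0) -> float:
    """Nominal pulse width for 'pulsout pin,value'."""
    return value * unit_us(clock_mhz)


def implied_overhead_us(measured_us: float, pauseus_value: int,
                        clock_mhz: float = 16.0) -> float:
    """Measured gap minus the nominal pauseus time; whatever is left
    is command overhead around the pause."""
    return measured_us - pauseus_value * unit_us(clock_mhz)


if __name__ == "__main__":
    print(pulsout_us(225))                 # nominal width of the 560us mark
    print(pulsout_us(3600))                # nominal width of the 9ms leader
    print(implied_overhead_us(560, 110))   # overhead in the "0" space
    print(implied_overhead_us(1660, 540))  # overhead in the "1" space
```

The implied overhead is not the same for each gap, which fits the earlier comments that there is no single fixed execution time.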
