PICAXE optimisations

QuIcK

Senior Member
Heya, I have a rather lengthy program. It's running fine, but I'm wondering if I can tweak it so it's more responsive, so I'm optimising for speed here.

First question (and the most important): does a 'smaller' program (i.e. fewer bytes) run faster? Or does it depend on the commands used, or on the number of lines?

For example:
(Some background on the code: the constants (regctrl, reginputs, regoutputs, regpreset) are variables in the full program. This segment takes the 'selected input' (0-3 decimal), 'selected output' (0-2 decimal), 'selected preset' (0-3 decimal) and regctrl (3 toggle-type buttons, e.g. %111, %101 etc.) and packs them into a 16-bit register for outputting to an LED driver.)
CODE 1:
Code:
symbol lamps = w0
symbol temp = w1
symbol temp2 = w2
symbol reginputs = 1
symbol regoutputs = 2
symbol regpreset = 0
symbol regctrl = %010
temp = 1 << reginputs
temp2 = 1 << regoutputs * 2
temp2 = temp2 << 7
temp = temp | temp2
temp2 = 1 << regpreset * 2
temp2 = temp2 << 3
temp = temp | temp2
temp2 = regctrl << 13
temp = temp | temp2
lamps = not temp
CODE 2:
Code:
symbol lamps = w0
symbol temp = w1
symbol reginputs = 1
symbol regoutputs = 2
symbol regpreset = 0
symbol regctrl = %010
lamps = $FFFF
clearbit lamps, reginputs
temp = regoutputs + 8
clearbit lamps, temp
temp = regpreset + 4
clearbit lamps, temp
temp = regctrl << 13
temp = not temp
lamps = lamps & temp
Now, code sample 1 compiles to 49 bytes but has more lines of code.
Code sample 2 compiles to 55 bytes but is shorter.

Which is more efficient?

Any other optimisation tips? Commands to avoid, etc.? Hippy, I've read your website on this and implemented it as much as possible.
 

boriz

Senior Member
It’s far from straightforward. My suggestion is that you time them and see, maybe using a second ‘timer’ PICAXE running a PULSIN and SERTXD program.
 

QuIcK

Senior Member
Hmm, yeah, I guess that's the way to go. I wish I knew assembly, or was any good at it.
Maybe just toggle a pin and use a scope to measure it,
or get the VSM when they release X2 parts...
 

graynomad

Senior Member
Does a 'smaller' program (ie bytes) mean it will run faster
It's usually the other way around. You make a program faster by "unrolling" loops, putting code inline rather than calling subroutines, etc. All of this makes the code larger.

For example
Code:
do
 gosub something
loop while x < 5
is neat but has overheads such as testing the value of x and jumping back to the start of the loop.

Code:
gosub something
gosub something
gosub something
gosub something
gosub something
is uglier and larger but way faster. It still has the overhead of each call and return, though, so

Code:
all the "something" code
all the "something" code
all the "something" code
all the "something" code
all the "something" code
is even uglier and even larger but faster again.
 

QuIcK

Senior Member
Hmm, I was more concerned with specific commands like togglebit, setbit, clearbit, inc, dec, << etc.
Surely, as these are specific commands, the compiler knows what they're for and optimises around them (as it were).

E.g. 'setbit b0, 0' would be more efficient than 'b0 = b0 OR %1', and likewise togglebit vs XOR and clearbit vs AND, even though they achieve the same thing.
 

hippy

Ex-Staff (retired)
Staff member
SETBIT and the like are generally less efficient ( code space and speed ) than doing an explicit operation involving numbers. That's because they are 'macro commands' designed to make the program more writeable, readable and understandable rather than efficient.

SetBit b0, x

becomes ...

b0 = DCD x | b0

So 'SetBit b0, 0' becomes 'b0 = DCD 0 | b0', which is less efficient than 'b0 = 1 | b0', which is in turn less efficient than 'b0 = b0 | 1'.
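To make that expansion concrete, here is a small host-side Python model (not PICAXE code) of what these macros compute. The SetBit line follows the expansion above; the ClearBit and ToggleBit versions are assumed analogues written for illustration, not confirmed compiler output. The model only shows that the results are identical; it says nothing about byte counts or timing on the chip.

```python
# Host-side model, not PICAXE code. setbit() mirrors "b0 = DCD x | b0";
# clearbit() and togglebit() are ASSUMED analogues for illustration only.

def dcd(x: int) -> int:
    """DCD: turn a bit number (0-15) into a one-hot 16-bit value."""
    return (1 << x) & 0xFFFF

def setbit(var: int, x: int) -> int:
    return dcd(x) | var                    # b0 = DCD x | b0

def clearbit(var: int, x: int) -> int:
    return var & (~dcd(x) & 0xFFFF)        # assumed: AND with inverted mask

def togglebit(var: int, x: int) -> int:
    return dcd(x) ^ var                    # assumed: b0 = DCD x ^ b0

# SetBit with a constant bit number gives the same result as a plain OR.
assert setbit(0b1010, 0) == 0b1010 | 1
```

The same value comes out either way, which is why writing the OR with a literal mask directly (as in 'b0 = b0 | 1') can skip the DCD step.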

I haven't had the time to work through your examples, but what you want could possibly be done most efficiently on one line as a traditional LET command with shifts and ORs. Perhaps post details of what the 16-bit output contains and where each block of bits is derived from. The way you are doing it seems long-winded and convoluted in both cases.
 

manuka

Senior Member
Heya, I have a rather lengthy program. It's running fine, but I'm wondering if I can tweak it so it's more responsive, so I'm optimising for speed here.
Aside from praiseworthy code optimisation approaches, a similar quest some time back with an 08M showed that swampy responses can most readily be hurried along simply by doubling the clock speed to 8MHz. Naturally, the time delays for PAUSE/WAIT etc. had to be adjusted to suit. Stan.
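As a rough sketch of that adjustment: on an 08M the PAUSE timing is calibrated for the default 4MHz clock, so at 8MHz each PAUSE lasts about half as long and its argument has to be doubled to keep the same delay. The hypothetical helper below is host-side Python (not a PICAXE command) and simply does that scaling, assuming delay scales inversely with clock speed.

```python
# Host-side helper, not PICAXE code. ASSUMES PAUSE is calibrated at
# calibrated_mhz and that its real duration scales inversely with clock.

def pause_for(desired_ms: int, clock_mhz: int, calibrated_mhz: int = 4) -> int:
    """Return the PAUSE argument giving roughly desired_ms at clock_mhz."""
    arg = desired_ms * clock_mhz // calibrated_mhz
    if arg > 65535:
        raise ValueError("PAUSE argument must fit in a word (0-65535)")
    return arg

print(pause_for(1000, 8))  # 2000
```

So a 'pause 1000' written for 4MHz would become 'pause 2000' once the chip is running at 8MHz.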
 

hippy

Ex-Staff (retired)
Staff member
If I've understood the mapping correctly, then this code should do what you want, and it can be adjusted if not. It uses around 32 bytes and will be faster ...

Code:
'       %ccc-------------  regctrl          3-bits
'       %-----ooo--------  regoutputs  0-2  3-bits
'       %--------pppp----  regpreset   0-3  4-bits
'       %------------iiii  reginputs   0-3  4-bits

lamps = regctrl           << 13
lamps = %0000000100000000 << regoutputs | lamps
lamps = %0000000000010000 << regpreset  | lamps
lamps = %0000000000000001 << reginputs  | lamps ^ $FFFF
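A quick way to confirm that the one-line version and both of QuIcK's versions build the same 16-bit word is to model all three on a PC and compare every input combination. The sketch below is host-side Python, not PICAXE code; each PICAXE line is rewritten with explicit parentheses because PICAXE evaluates strictly left to right with no operator precedence. It checks the bit layout only, not speed on the chip.

```python
# Host-side model, not PICAXE code. Values are masked to 16 bits like a
# word variable; parentheses reproduce PICAXE's left-to-right evaluation.
from itertools import product

MASK = 0xFFFF

def code1(inputs, outputs, preset, ctrl):
    temp = 1 << inputs
    temp2 = (1 << outputs) * 2            # temp2 = 1 << regoutputs * 2
    temp2 = (temp2 << 7) & MASK
    temp = temp | temp2
    temp2 = (1 << preset) * 2             # temp2 = 1 << regpreset * 2
    temp2 = (temp2 << 3) & MASK
    temp = temp | temp2
    temp2 = (ctrl << 13) & MASK
    temp = temp | temp2
    return ~temp & MASK                   # lamps = not temp

def code2(inputs, outputs, preset, ctrl):
    lamps = 0xFFFF
    lamps &= ~(1 << inputs) & MASK        # clearbit lamps, reginputs
    lamps &= ~(1 << (outputs + 8)) & MASK # clearbit lamps, regoutputs + 8
    lamps &= ~(1 << (preset + 4)) & MASK  # clearbit lamps, regpreset + 4
    temp = ~((ctrl << 13) & MASK) & MASK  # temp = not (regctrl << 13)
    return lamps & temp

def one_line_version(inputs, outputs, preset, ctrl):
    lamps = (ctrl << 13) & MASK
    lamps = ((0b0000000100000000 << outputs) & MASK) | lamps
    lamps = ((0b0000000000010000 << preset) & MASK) | lamps
    return (((1 << inputs) | lamps) ^ 0xFFFF) & MASK

# inputs 0-3, outputs 0-2, preset 0-3, ctrl is 3 bits
for combo in product(range(4), range(3), range(4), range(8)):
    assert code1(*combo) == code2(*combo) == one_line_version(*combo)

print(hex(one_line_version(1, 2, 0, 0b010)))  # 0xbbed
```

With the constants from the original post (input 1, output 2, preset 0, regctrl %010) all three give $BBED: bits 1, 4, 10 and 14 are cleared for the active-low lamps.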
 

QuIcK

Senior Member
Wow.
Thanks mate! Hadn't even thought of XOR.
I'm going to blast through the rest of it in a more experimental mood and see what I can make faster. There are 2 other routines that run every execution; if I struggle, I'll give you a shout.
edit:

Actually, how about this code?
It's to read an 8x2 key matrix.
keyaddress drives a demultiplexer chip (e.g. %000 selects pin 0, %010 selects pin 2, etc.).
keyport is assigned as %000rraaa, where rr are the 2 returns from the matrix and aaa is the demux address. The returns are active low, so key presses read as 0 and open keys as 1.

Code:
keys = $FFFF
for keyaddress = 0 to 7
 'address the matrix driver
 keyport = keyaddress
 'filter out the returns
 temp = keyport & %11000
 'temp = temp rev 8
 'shift down
 temp = temp >> 3
 'if a key has been pressed
 'shift them into place (twice as there is 2 of em)
 temp = temp << keyaddress
 temp = temp << keyaddress
 
 'add into key register 
 keys = keys XNOR temp
next keyaddress
The expected output is a 1 where a key is pressed.
Unfortunately, I've already fabbed a PCB for this, so the pin assignments have to stay as they are.
I can't figure out a clever way of going from pins 3 and 4 straight into the key register, especially with active-low returns, when the rest of the port can't be relied on to be in a constant state.
 

QuIcK

Senior Member
Also, I use setbit, clearbit and togglebit quite a lot.
Are these more or less efficient than their Boolean maths counterparts, e.g. AND, OR, XOR?
Or is it more worthwhile concentrating on the structure of my program than on the commands actually used?
 

hippy

Ex-Staff (retired)
Staff member
They can be less efficient when using bit numbers rather than variables ( see post #6 ), but they are only slightly less efficient.

My recommendation is to concentrate on structure and simplicity of code, as that makes it easier to write and understand ( for yourself and others ), and only worry about optimisation towards the end of development or where it becomes necessary. The important thing is getting code written, debugged and working; then you can worry about speed and efficiency. The fastest code in the world is no good if it doesn't do the job :)

As for the key scanning, I'd consider ...

Code:
keys = 0
for keyaddress = 0 to 7
 keyport = keyaddress
 'keep the 2 returns, invert (active low), then shift by 2 * keyaddress
 keys = keyport >> 3 & %11 ^ %11 << keyaddress << keyaddress | keys
next keyaddress
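The packing can be checked off the chip with a host-side Python model (not PICAXE code). fake_port() is a hypothetical stand-in for reading pinsb while the demux drives an address, and it assumes key n sits on address n // 2 with its return on bit 3 + n % 2; the scan keeps bits 3-4, inverts them so a pressed key reads 1, and shifts them by 2 * keyaddress so each address gets its own pair of bits.

```python
# Host-side model, not PICAXE code. fake_port() is HYPOTHETICAL: bits 0-2
# hold the address, bits 3-4 the active-low returns, and bits 5-7 are left
# high to show that the scan masks them off.

def fake_port(pressed_keys):
    """Return a read function for a set of pressed key numbers 0-15.

    Assumed wiring: key n is on demux address n // 2, return bit 3 + n % 2.
    """
    def read(address):
        port = 0b11111000 | address          # idle: both returns high
        for n in pressed_keys:
            if n // 2 == address:
                port &= ~(1 << (3 + n % 2))  # pressed key pulls its return low
        return port
    return read

def scan(read_port):
    keys = 0
    for keyaddress in range(8):
        port = read_port(keyaddress)
        pressed = ((port >> 3) & 0b11) ^ 0b11  # keep returns, invert: 1 = pressed
        keys |= pressed << (2 * keyaddress)    # two keys per address
    return keys

print(bin(scan(fake_port({0, 5, 15}))))  # 0b1000000000100001
```

Shifting by keyaddress alone would not be enough: key 1 (address 0, second return) and key 2 (address 1, first return) would then both land on bit 1.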
 

QuIcK

Senior Member
Thanks hippy for all your help. It's working a lot faster now! I don't think I would have worked it out anywhere near as elegantly as that.

Out of curiosity, why do you put the variable you are working on back in last (e.g. b0 = ... OR b0, as opposed to b0 = b0 OR ...)?

Is this because the PIC does the maths in a register, not affecting the value of b0 (say) until the whole line is finished, meaning you can 'work' on the value you want to shift/process into place first?


Background on the project: I work in a recording studio, and we are building a passive Control Room Monitor Controller (for speakers).

The passive volume control is a relay-based resistor ladder (6-bit, 63 steps) with additional relays for mono sum (of a stereo signal), 4 stereo inputs and 3 stereo outputs.
It came as a PCB from Igor (on the prodigy-diy forums), and he had his own programmed front end for it. However, we had a lovely spare part from an old digital broadcast desk, with chunky, clicky, illuminated buttons. As it came from a digital desk, it already had some interfacing circuits to go from buttons -> PIC -> lamps, which is where some of the odd assignments come from. They also come from the button layout we decided on, which puts the most-used buttons (like mute and presets) within easy reach.

Also, in order to get a PCB made simply and cheaply, all the I/O between the interface and the PIC needs to come up on 1 port.
 

MartinM57

Moderator
...because keys is being built up as the loop progresses, just like in your code, and you have to keep ORing in the previous value or else you'll only end up with the value from when the loop counter is 7.

Another interesting question is what 'keyport = keyaddress' does. You have it in your code, and hippy's code has it as well, but it seems to achieve nothing (subsequent lines could just use 'keyaddress'). I suspect there is some code missing?
 

QuIcK

Senior Member
keyport is a symbol for pinsb.
Addresses 0-7 are output on pins B.0-B.2. Normally I'd mask the value, just to make sure there is no erratic behaviour on the other output pins, but as it comes directly after a loop assignment and is only 0-7, it doesn't need any extra processing.
There is no code missing, just the declarations of
dirsb = %11100111
symbol keyport = pinsb

I understand why it's ORed back in. I was just wondering whether the maths that comes before it on the line is written into the variable as it goes, or held in a register until the line is complete and then put into the variable at the end.

By the looks of things, that's how it works; I just wanted to confirm. Learning moment for me :)
 

hippy

Ex-Staff (retired)
Staff member
Calculations are held in a separate internal 16-bit register and written to the destination variable at the end of calculation.
 

westaust55

Moderator
Think of it this way: if an intermediate value were put back into the variable as the calculation on a single command line progressed, we could not use a program like

b0 = 10 * b2 + b1 * 10 + b0

because b0 would be modified before we reached the last part of the formula (i.e. + b0), which would not be very convenient.
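A host-side Python model (not PICAXE code) makes both points visible: the line is evaluated strictly left to right with no operator precedence, in a separate 16-bit working register, and the destination is only written once the whole line is done. The token-list input format and the let() name are made up for this sketch, and storing only the low byte into a byte variable like b0 is an assumption of the model.

```python
# Host-side model, not PICAXE code: a LET line evaluated strictly left to
# right (no operator precedence) in a separate 16-bit working register.
# The destination variable is only written after the whole line is done.
import operator

OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "&": operator.and_, "|": operator.or_,
    "^": operator.xor, "<<": operator.lshift, ">>": operator.rshift,
}

def let(tokens, variables):
    """Evaluate e.g. [10, "*", "b2", "+", "b1"] left to right, 16-bit wrap."""
    def value(tok):
        return variables[tok] if isinstance(tok, str) else tok
    acc = value(tokens[0]) & 0xFFFF
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        acc = OPS[op](acc, value(operand)) & 0xFFFF
    return acc

regs = {"b0": 3, "b1": 2, "b2": 1}
# b0 = 10 * b2 + b1 * 10 + b0  ->  ((10 * 1 + 2) * 10) + 3
# "+ b0" still reads the old value 3; b0 (a byte) is assumed to keep the low 8 bits.
regs["b0"] = let([10, "*", "b2", "+", "b1", "*", 10, "+", "b0"], regs) & 0xFF
print(regs["b0"])  # 123
```

With b2, b1 and b0 holding the digits 1, 2 and 3, the line builds 123, which only works because the old b0 is read before the result is stored.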
 