Speed difference of PICAXE BASIC vs native code

I've received an unusual Christmas present of a big bag of LEDs - my old school managed to order 1000 3mm ones when they wanted 5mm ones, and the teacher very kindly said I could have them! I thought I might have a go at building an LED cube like this one, but ideally would swap out the AVR microcontroller the author uses for a PICAXE since I'm familiar with the language and am set up to program them.

Rather than reinvent the wheel, I'd like to basically port the AVR C code provided in the instructable to BASIC. The software is built around an interrupt that fires at regular intervals and lights successive 8x8 layers of the cube, at such a speed that persistence of vision makes it appear as if all the layers are lit simultaneously. The processor also calculates the animations on the fly, using a 2-dimensional array in memory as a buffer which is then read by the interrupt routine. This, along with the greater size of the cube, would make it more computationally demanding than the ones others have posted on the forums here - the largest I found was a 5x5x5 cube, and the patterns were hardcoded using lookup tables or similar.

So my worry is that the PICAXE might not be fast enough. The instructable says that, with the AVR clock at 14.7 MHz, the interrupt fires ~10500 times per second, meaning that all 8 layers are redrawn about 1300 times a second. Now if I assume that 25 fps is still sufficient to get the persistence of vision effect, the interrupt rate required is 25*8 = 200 interrupts per second. If I ran the PICAXE at 64 MHz and aimed for the 25 fps refresh rate, the PICAXE could be slower than the AVR by a factor of 4 (difference in clock speed) * 52 (difference in interrupt count per second) = 208 and still keep up.

After all that, the question I'm asking is whether a 200-fold decrease in execution speed from AVR C to PICAXE BASIC is reasonable or whether the interpreted code overhead is more than this. I appreciate it's probably impossible to give a definite answer, but I'm just looking for some gut feelings before I go ahead and start designing circuits and ordering parts. Mine is that it might not be but I'm not ready to give up on the plucky PICAXE quite yet! ;-)

Cheers,

Cal
 

westaust55

Moderator
PICAXE speed is nominally around 250 usec per BASIC command at 4 MHz clock speed.
So at 64 MHz clock speed, that will reduce to about 16 usec per BASIC command.
I and others have posted some data on command overheads in the past.

Overall, speed-wise it will depend on what your program is doing - the number of commands that have to execute per cycle is what will set the frames per second.

I do not think that using interrupts will be particularly useful.
You need to go as fast as possible, and using interrupts takes time to jump to and return from the interrupt subroutine. What is the point of starting a new cycle if the previous one has not been completed?


There have been threads on LED cubes in the past. It may also be worth searching for and reading some of those, if you have not already done so.
 

AllyCat

Senior Member
Hi Cal,

If I ran the PICAXE at 64 MHz and aimed for the 25 fps refresh rate, the PICAXE could be slower than the AVR by a factor of 4 (difference in clock speed) * 52 (difference in interrupt count per second) = 208 and still keep up.

...... After all that, the question I'm asking is whether a 200-fold decrease in execution speed from AVR C to PICAXE BASIC is reasonable or whether the interpreted code overhead is more than this.
Firstly, when a PIC runs at 64 MHz, it only executes 16 million OpCodes per second, but I don't know what the equivalent figure is for an AVR. My "rule of thumb" is that the PICaxe interpreter runs about 400 times slower than compiled code, but that depends on the type of instructions being executed: maths functions will probably be proportionally faster, and RETURNS (from subroutine or interrupt) appear to be remarkably slow.

Many of the details can be found in this thread with some of my updated figures for M2s (X2s are probably similar) in post #12.

Cheers, Alan.
 
I do not think that using interrupts will be particularly useful.
You need to go as fast as possible, and using interrupts takes time to jump to and return from the interrupt subroutine. What is the point of starting a new cycle if the previous one has not been completed?
The reason for using an interrupt is to allow 'multitasking'. No matter what other processing is being done, the cube needs to be regularly updated or the persistence of vision effect will be lost. So the (potentially lengthy) calculating of the next frame needs to be regularly interrupted to draw the next eighth of the current frame. Basically, the approach is "I want the refresh rate to be this; fire an interrupt when required to make this happen" rather than "Draw everything and see what the refresh rate ends up as." This works for the AVR because even at 1300 fps the interrupts only use about 21% of CPU time; if the PICAXE is going to be working flat out anyway then it won't work.

Going on the 16 us per command estimate, I could have (1000000/16)/25 = 2500 BASIC commands per frame in total, i.e. incorporating both calculating and displaying that frame. I think my best bet may be to write some skeleton code to see if those figures turn out to be at all achievable.
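Something along these lines is the shape I have in mind - very much a sketch, with the timer preload, flag mask and the two subroutine names (calcnextframe, drawlayer) all placeholders I'd need to check against the manuals, and a 20X2 assumed:

setfreq m64                         ' 64 MHz (20X2 internal clock assumed)
settimer 60000                      ' PLACEHOLDER preload - sets how often toflag is raised
setintflags %10000000,%10000000     ' interrupt whenever the timer overflow flag is set

main:
    gosub calcnextframe             ' the (potentially lengthy) animation maths, into a buffer
    goto main

interrupt:
    gosub drawlayer                 ' clock out one 8x8 layer and enable its cathode transistor
    toflag = 0                      ' clear the timer overflow flag
    setintflags %10000000,%10000000 ' PICAXE interrupts are one-shot, so re-arm before returning
    return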

RETURNS (from subroutine or interrupt) appear to be remarkably slow.
That is likely to be problematic - my instinct would be to write reusable procedures wherever possible which I could then call, almost as if they were functions. The last time I used a PICAXE it was a 14M with only 256 bytes of program space to play with, so I'm not used to having the luxury of lots of space. Again, I think I need to write some code to see if I can afford to avoid subroutines.

Cal
 

hippy

Ex-Staff (retired)
Attempting to port demanding code from a processor with capacity to spare to one which has less to start with is probably not the best way to go. It would usually be better to design for the capacity one has.

A 5x5x5 cube updated at 20 fps is 2500 one-LED-at-a-time updates per second; 400 us per LED.

If the hardware can be designed to allow 8-bit wide (five LEDs at a time) control, that speeds things up fivefold and gives more time per update; 2 ms per five LEDs.

It will definitely be easier to update a set of five LEDs in 2 ms and have time to spare than doing eight updates of single LEDs in the same time. Additionally the LEDs will be on for five times longer so can be quite a lot brighter.

If you can design the hardware so the 5x5x5 display is actually built as 8x8x2 you can do even better; that's just 16 updates per frame, 3 ms per update at 20 fps.
 

Goeytex

Senior Member
Again, I think I need to write some code to see if I can afford to avoid subroutines.
Yes, that is the only way you will know if the Picaxe is up to the task.

As far as command processing overhead goes, the 16 us figure is theoretical and will likely never be attained. The time in question needs to include the time to jump from one BASIC command to the next as well as the time to execute the BASIC command itself.

For example, with a Picaxe 20X2 operating at 64 MHz it takes ~44.8 us to execute "b0 = b0 + 1", while it takes ~110 us to execute "w0 = w0 / 2". Count on about 100 us to jump to the interrupt subroutine and begin executing the next command.
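If you want figures for the particular commands you will be using, a rough-and-ready method is to wrap the command under test in a long loop between a pin going high and low, then time it with a scope or stopwatch. A minimal sketch, assuming a 20X2 at 64 MHz and using B.0 as an arbitrary marker pin:

setfreq m64                ' 20X2 at 64 MHz

high B.0                   ' start marker for the scope / stopwatch
for w1 = 1 to 10000        ' repeat enough times to measure comfortably
    b0 = b0 + 1            ' command under test
next w1
low B.0                    ' end marker

' elapsed time / 10000 = time per pass; this includes the for/next overhead,
' so time an empty loop as well and subtract it.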
 
I'd seen that thread before but got the impression that the PICAXE was just telling another controller when to change animations etc. I went back and read through the blog properly and it now seems that the PICAXE is actually running the animations successfully, which is encouraging.

A 5x5x5 cube updated at 20 fps is 2500 one-LED-at-a-time updates per second; 400 us per LED.

If the hardware can be designed to allow 8-bit wide (five LEDs at a time) control, that speeds things up fivefold and gives more time per update; 2 ms per five LEDs.

It will definitely be easier to update a set of five LEDs in 2 ms and have time to spare than doing eight updates of single LEDs in the same time. Additionally the LEDs will be on for five times longer so can be quite a lot brighter.

If you can design the hardware so the 5x5x5 display is actually built as 8x8x2 you can do even better; that's just 16 updates per frame, 3 ms per update at 20 fps.
The design currently has everything operated byte-wide: one PICAXE output port is connected to the data inputs of eight shift registers, each register controlling one row of LEDs. Another output port connects to eight transistors, each switching the common cathodes of one layer of LEDs to ground.
So each frame update would consist of clocking eight bytes (where each bit represents the state of an individual LED) out to the shift registers, while the appropriate cathode transistor is switched on to light the first layer. This is then repeated for the other seven layers.
I've attached a schematic which should make the hardware design clearer.

With this arrangement an 8^3 cube will be as fast as a 5^3 one as far as displaying row data is concerned. Only five bytes of data would need to be clocked out but three empty bytes would then need to be transmitted to clear old frame data from the shift registers. The smaller cube will still end up faster though because there are fewer layers to cycle through and the calculation of the animations will be shorter. Making a smaller cube (at least to start with) is definitely an option.
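For what it's worth, the display pass I have in mind looks roughly like this. It is only a sketch: I've assumed a 40X2 so there are enough port pins (port B to the shift register data inputs, port D to the layer transistors, A.0 as the common register clock), kept the 64-byte frame buffer in scratchpad, and left out pin-direction setup and any register latch/output-enable handling:

symbol clockpin = A.0                    ' common clock to all eight shift registers

drawframe:
    ptr = 0                              ' frame buffer: 64 bytes of scratchpad
    for b10 = 0 to 7                     ' eight layers
        outpinsD = 0                     ' all layers off while new data is shifted in
        for b11 = 0 to 7                 ' eight clocks fill all eight registers
            outpinsB = @ptrinc           ' bit n of this byte goes to register n
            pulsout clockpin, 1          ' clock the bit into every register at once
        next b11
        lookup b10, (1,2,4,8,16,32,64,128), b12
        outpinsD = b12                   ' switch on this layer's cathode transistor
    next b10
    return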
 


Jamster

Senior Member
Just a thought, but I recall TFT displays use a transistor and capacitor at every subpixel to hold it on a little longer, meaning they don't rely on POV quite as much and give a 'nicer' display. Could that perhaps be used to reduce the refresh rate needed?
 

geoff07

Senior Member
Or why not use some form of latching driver register, and the picaxe just calculates the changes? Physically small, fewer connections to the picaxe, and they don't even need to flash unless you want to reduce power consumption.
 

AllyCat

Senior Member
Hi,

Don't overlook that we're discussing driving 512 LEDs here! 512 x Transistors + Capacitors, or Flip-Flops (Latches), with all their associated wiring, seems rather excessive "hardware assistance" to reduce the load on the PICaxe by a little.

the [AVR] interrupt fires ~10500 times per second, meaning that all 8 layers are redrawn about 1300 times a second. Now if I assume that 25 fps is still sufficient to get the persistence of vision effect, the interrupt rate required is 25*8 = 200 interrupts per second. If I ran the PICAXE at 64 MHz and aimed for the 25 fps refresh rate, the PICAXE could be slower than the AVR by a factor of 4 (difference in clock speed) * 52 (difference in interrupt count per second) = 208 and still keep up.
Certainly 1300 updates per second is "overkill" but probably only by a factor of about 20. Don't confuse the "(Image) Update Rate" with the "Pixel Strobe (pulse) Rate". Perhaps the best example is that movie film normally runs at 24 "frames" per second (16 and 18 fps were quickly abandoned), but uses a three-bladed shutter which "chops" the light at 72 cycles per second to reduce the perceptible "flicker". Many Americans used to complain that the European TV system CRTs "flickered" (at 50 Hz) compared with their 60 Hz system, and now most monitors (even LCD, not just CRT or Plasma) "scan" at 70+ Hz, and many TVs at 100 Hz.

Personally, I've never tried (since IMHO "cubes" are a rather pointless exercise) but it looks as if a well-designed "linear" interrupt routine could execute in 1 or 2 ms with a 64 MHz clock. Basically just 8 x POKESFR port,@PTRINC : PULSOUT strobepin,1 and a little "housekeeping" might do the job. However, I rather doubt that a PICaxe-based program could "redraw" all those 512 pixels (LEDs) at a "worthwhile" rate. Even if the (human) eye doesn't see any "flicker", it may discern "waves" of motion (although these may either detract from or enhance the subjective effect).
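In outline, something like the fragment below, unrolled for all eight bytes. The SFR address is an assumption to be checked against the underlying PIC's data sheet, strobepin is a placeholder, and the "housekeeping" (selecting the next layer's cathode, wrapping ptr after 64 bytes, re-arming the interrupt) is only indicated in comments:

symbol dataport = $8A              ' ASSUMED pokesfr address of the output latch - check the data sheet
symbol strobepin = A.0             ' placeholder pin for the common shift-register clock

interrupt:
    pokesfr dataport, @ptrinc      ' first byte of the layer straight onto the data port
    pulsout strobepin, 1           ' clock it into all eight shift registers
    pokesfr dataport, @ptrinc      ' second byte
    pulsout strobepin, 1
    ' ...six more identical pokesfr / pulsout pairs...
    ' housekeeping: enable this layer's cathode transistor, wrap ptr back to 0
    ' after the 64th byte, clear toflag and re-arm with setintflags
    return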

Cheers, Alan.
 

Phil Robinson

New Member
Looking at it another way, you only need 64 bytes per cube refresh; at 25 fps that is 1600 loops per second, or 625 us per loop.
Since for each of the 8 bit-planes one byte stays the same for 8 loops, the biggest issue will be which way to scan the bit-planes to hide the slightly longer plane delay versus line delay.
From memory from 30 years ago at uni, the brain discriminates vertical stripes better than horizontal ones, so make the bit-planes horizontal.

Phil
 

BESQUEUT

Senior Member
The processor also calculates the animations on the fly, using a 2-dimensional array in memory as a buffer which is then read by the interrupt routine.
Depending on the animation, you may have to calculate pixel by pixel...
And two-dimensional arrays and calculations are very time-consuming with a Picaxe...
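On a PICAXE the AVR's 2-dimensional buffer has to be emulated by hand, for example as 64 bytes of scratchpad with the index computed each time, so every access costs extra maths commands. A minimal sketch, assuming an X2 part (names are just examples):

symbol layer = b10
symbol row   = b11
symbol index = b12

    index = layer * 8 + row       ' flatten the two coordinates into one scratchpad address
    put index, b0                 ' write one row byte of the frame buffer
    get index, b1                 ' ...and read it back later (e.g. in the display routine)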

Firstly, when a PIC runs at 64 MHz, it only executes 16 million OpCodes per second, but I don't know what the equivalent figure is for an AVR.
For an ATMEL AVR, it is roughly 1 opcode per cycle :rolleyes:

All in all, I see only two options:
- use a BASIC compiler for an ATMEL AVR (an XMEGA would be preferable...)
- use a lot of PICAXEs, for example one per line of pixels plus a master PICAXE (grid computing)
 

BESQUEUT

Senior Member
Or why not use some form of latching driver register, and the picaxe just calculates the changes? Physically small, fewer connections to the picaxe, and they don't even need to flash unless you want to reduce power consumption.
I did use 74HC594s for that. They work well without any flickering, with 8 of them serially connected.
 