Cheap I2C OLED display with 08M2

hippy · Mar 9, 2020

mrm said:
Having invested some time in making a small daughterboard for the oled with an 08m2 smd picaxe and other components so that everything fits into a small window in the corner of a switch box it would be nice to have a smooth screen redraw although in reality waiting for a fraction of a second as part of the display is updated is not critical.

There are two parts to refresh rates; the updating of any bitmap on the PICAXE and blasting that bitmap out to the display.

With the limited RAM of an 08M2 I would imagine that you are at best building a line in memory and then blasting that line to the display, repeating for the rest. That means there will be a vertical delay in refreshing lines, which depends on how quickly you can get the characters bitmapped into memory, and a horizontal delay as each line bitmap is blasted out.

If you can post your code then members may see opportunities to increase the speed and performance of each.

mrm · Mar 9, 2020

WhiteSpace said:
My understanding from having read the various threads on these SSD1306 OLEDs repeatedly over the last few weeks is that the RAM that they use can’t be read from, so if I understand that and AllyCat’s suggestion correctly, using the right hand 8 x 8 as extra storage wouldn’t work. I hesitate a little to chip in on a discussion between two experts, though! I’ve been working over the past couple of weeks on using one of these displays to show 3 or 4 separate pieces of information, each capable of being refreshed separately without needing to rewrite the entire 128 x 64. I haven’t quite finished, but I’ll post an update this evening in case that’s any use in this discussion. Essentially it’s just a question of specifying start column and row for the bit of the display that you want to write to, then inc’ing and dec’ing to build the characters. Doing so doesn’t wipe the rest of the display.

For the SSD1306 it is possible to set the start position to avoid rewriting the whole screen, e.g. 0x21 startcol endcol followed by 0x22 startrow endrow (0x21 is the column address command and 0x22 the page, i.e. row, address command). Create a subroutine to do this and then call it after setting startrow and/or startcol.
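
As a rough illustration of the addressing-window idea, the sketch below builds the command bytes in Python (chosen only so it can be run on a PC; it is not PICAXE code). It assumes the SSD1306 is in horizontal addressing mode, where 0x21 sets the column range and 0x22 sets the page (8-pixel row) range. The function name `window_commands` is hypothetical.

```python
SET_COLUMN_ADDR = 0x21  # followed by start column, end column (0-127)
SET_PAGE_ADDR = 0x22    # followed by start page, end page (0-7)


def window_commands(col_start, col_end, page_start, page_end):
    """Return the command bytes that restrict writes to one window."""
    if not (0 <= col_start <= col_end <= 127):
        raise ValueError("columns must satisfy 0 <= start <= end <= 127")
    if not (0 <= page_start <= page_end <= 7):
        raise ValueError("pages must satisfy 0 <= start <= end <= 7")
    return [SET_COLUMN_ADDR, col_start, col_end,
            SET_PAGE_ADDR, page_start, page_end]


# Example: a window starting at column 68 on page 2, running to the
# right-hand edge and the bottom of a 128 x 64 display.
commands = window_commands(68, 127, 2, 7)
print([hex(b) for b in commands])
```

On a PICAXE the same six bytes would be sent as display commands over I2C; data bytes written afterwards then land only inside that window.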

WhiteSpace · Mar 9, 2020

@mrm just in case it's any use, here is an extract from the code that I have been working on. The project is a remote control for a vehicle, with two pots on the transmitter sending motor control values via IROUT as byte values. The receiver converts them to PWM values, and also checks for motor current (via sense resistors) and battery voltage (via voltage divider), which it displays on the OLED. It's all entirely unnecessary, of course - I could just have an LED that flashes on overcurrent or low voltage, but the OLED display looks very smart (I attach a photo).

This is an intermediate version - I've had a number of versions since then. It displays separately 3 word values (left current, right current and voltage) in different locations on the OLED. The location is set by the following command:

row = y
col = x
Gosub SetPosition

The display maintains pixels until they are overwritten (so it's necessary to clear the display if you're not entirely overwriting a particular position).

I'm running this on a 28X2. I'm just in the process of splitting off the display onto a second Picaxe, because loading the display takes time and slows down the important process of sending controls to the motors promptly. I've made things more difficult by insisting on large characters, which take more bytes per character and therefore longer to load.

Code:

DisplayRCurrentValue: ; displays the current value calculated from the R sense resistor

      row = 0
      col = 68
      gosub SetPosition ; start this part of the display in the top row, half way across

      let b15 = "R" ; label for the right current display
      gosub DisplayCharacter

      row = 2 ; now start the rest of the display two rows down
      col = 68
      gosub SetPosition ; all SetPosition calls for the start of a set of characters
                        ; are placed in the main code, not in the DisplayCharacter
                        ; subroutine, so that the characters can be placed at the
                        ; desired location on the screen

      if RightCurrent > 999 then
            ; Split RightCurrent into digits for display, ignoring the 3rd decimal
            ; place (DIG 0 is not used). Adding "0" converts each digit value (0-9)
            ; to its ASCII character code.
            ;let b11 = RightCurrent DIG 0 + "0"
            let b12 = RightCurrent DIG 1 + "0"
            let b13 = RightCurrent DIG 2 + "0"
            let b14 = RightCurrent DIG 3 + "0"

            let b16 = 0

            do
                  lookup b16, (b14, ".", b13, b12, "A"), b15 ; feed the 3 digits, decimal point and unit
                                                             ; in succession to b15, which points to the
                                                             ; right place in RAM
                  gosub DisplayCharacter
                  inc b16
            loop while b16 <= 4

      else
            ; Below 1000 the leading digit is shown as a fixed "0", so b14 is not displayed
            ;let b11 = RightCurrent DIG 0 + "0"
            let b12 = RightCurrent DIG 1 + "0"
            let b13 = RightCurrent DIG 2 + "0"
            let b14 = RightCurrent DIG 3 + "0"

            let b16 = 0

            do
                  lookup b16, ("0", ".", b13, b12, "A"), b15 ; feed "0.", the 2 digits after the decimal
                                                             ; place and the unit in succession to b15
                  gosub DisplayCharacter
                  inc b16
            loop while b16 <= 4

      endif

      return

The DisplayCharacter subroutine then calls the necessary character bytes from scratchpad and sends them to the display.
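
To make the digit handling in the listing above easier to follow, here is a small Python model of it (not PICAXE code). It assumes the current is held in milliamps, which the source only implies, and the helper names `dig` and `format_current` are made up for the example. It shows why `+ "0"` is needed: it turns a digit value 0-9 into the ASCII code of the matching character.

```python
def dig(value, position):
    """Model of the PICAXE DIG operator: decimal digit at `position` (0 = units)."""
    return (value // 10 ** position) % 10


def format_current(milliamps):
    """Return the five characters shown for a current given in mA (assumed unit)."""
    # Adding ord("0") turns a digit value 0-9 into its ASCII character code,
    # which is what the + "0" in the BASIC listing does.
    hundredths = chr(dig(milliamps, 1) + ord("0"))
    tenths = chr(dig(milliamps, 2) + ord("0"))
    if milliamps > 999:
        units = chr(dig(milliamps, 3) + ord("0"))
    else:
        units = "0"
    return units + "." + tenths + hundredths + "A"


print(format_current(1234))
print(format_current(567))
```

Since DIG 3 is 0 for any value below 1000, the two branches produce the same characters, which suggests the BASIC version could use a single branch.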

mrm · Mar 9, 2020

hippy said:
There are two parts to refresh rates; the updating of any bitmap on the PICAXE and blasting that bitmap out to the display.

With the limited RAM of an 08M2 I would imagine that you are at best building a line in memory and then blasting that line to the display, repeating for the rest. That means there will be a vertical delay in refreshing lines, which depends on how quickly you can get the characters bitmapped into memory, and a horizontal delay as each line bitmap is blasted out.

If you can post your code then members may see opportunities to increase the speed and performance of each.

Thanks. With the 08M2 there is not much memory available to do much more than one line. With the 14M2 there are more possibilities but I do not have the code for that to hand. It makes no sense that pixels for a graphic appear instantly whereas pixels plotted from a font lookup but displayed as a bitmap take longer to appear. It is either a trick of the eye or some other artefact. The displays also retain their charge, which can persist for several minutes even with the display disconnected from power, so it might also be some type of memory effect in the GDDRAM or OLED panel.

These little oleds are a great addition to a project, highly legible and save a huge amount of space so it is a shame that a prebuilt serial module is not available.

mrm · Mar 9, 2020

WhiteSpace said:
@mrm just in case it's any use, here is an extract from the code that I have been working on. The project is a remote control for a vehicle, with two pots on the transmitter sending motor control values via IROUT as byte values. The receiver converts them to PWM values, and also checks for motor current (via sense resistors) and battery voltage (via voltage divider), which it displays on the OLED. It's all entirely unnecessary, of course - I could just have an LED that flashes on overcurrent or low voltage, but the OLED display looks very smart (I attach a photo).

Cheers. Please disregard my earlier post as SetPosition will do just nicely.

I discovered it on the forum the other day after having reinvented the wheel from reams of C code and the Solomon Systech data sheet.

hippy · Mar 10, 2020

mrm said:
It makes no sense that pixels for a graphic appear instantly whereas pixels plotted from a font lookup but displayed as a bitmap take longer to appear. It is either a trick of the eye or some other artefact.

No, it doesn't make any sense. A set of bytes sent at a set rate should appear in the same time no matter whether from a graphic or other bitmap; a bitmap is a bitmap no matter how it is created. The display wouldn't know what it is receiving, and has no means to control the speed of data being sent to it.

The conclusion has to be that the rates are therefore different. Either more bytes are being sent or the time taken to obtain and send each byte takes longer. Maybe both.

AllyCat · Mar 10, 2020

Hi,

mrm said:
These little oleds are a great addition to a project, highly legible and save a huge amount of space so it is a shame that a prebuilt serial module is not available.

Yes, that's exactly what I had in mind; or at least it would be easy enough to fit an 08M2 and the 4-pin I2C display onto a scrap of Veroboard / stripboard in my usual way.

I believe an HSERIN input, interrupt-driving a circular buffer can be good for concatenated bursts up to 19200 baud, regardless of the display code speed. Around a 50 byte circular buffer can still leave 48 bytes for User-Defined Characters (8 x 6 pixels) and all normal variables in an 08M2. However, I would want to offer at least a normal 96 ASCII (printable) character set, ideally leaving the EEPROM for other purposes. Of course there is no TABLE memory in an 08M2, so the character font has to be defined in Program memory:

As discussed earlier in this thread, a two level ON ... GOTO .. tree (much faster than SELECT ... CASE) can jump to (96) individual "character subroutines" (a GOTO returnaddress is faster, but would use about 150 more bytes than RETURN, and is less pleasing). A LOOKUP CharLo (Pixel bytes list) PixByte : RETURN needs about 11 bytes per character but would be very slow and I was horrified that a PEEK (for a 5 byte list) uses about 32 bytes of program memory! I think that's because the embedded PE Macro code has to INCrement the address pointer between each byte and then restore it at the end.

An HI2COUT .... byte list : RETURN (again re-invented from earlier in this thread) seems the only alternative; it should be fast and uses only about 11 bytes per character. But the reason that I wanted to read back from the display RAM is to support also double / treble-sized characters and/or an inverted (B/W) cursor, etc. Thus, one would read back the pixels from the few of these "special" character(s) sent initially to the display, then process the relevant cell block(s) and re-send in the same way as UDCs.

A possible workaround (to not being able to read back the display memory) is to add another I2C slave to receive the pixels! An 8-pin I2C RAM seems a little pricey (except in bulk from China), since we only need to buffer about 6 pixel bytes (but the remainder could be useful for something else) and an EEPROM chip might "wear out" (or perhaps the bytes can be spread over a sufficiently larger memory area). So I'm considering the addition of one of the small clock chips which includes some bytes of "NV" RAM; it might be useful / required anyway, or perhaps not bother to fit a battery / crystal to just use the RAM.

Cheers, Alan.
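
For readers unfamiliar with the circular buffer mentioned in the post above, here is a minimal Python model of one (not PICAXE code; on an 08M2 the same idea would use two byte pointers into RAM). The 50-byte size comes from the post; the class and method names are hypothetical.

```python
class RingBuffer:
    """Fixed-size circular byte buffer (a model, not PICAXE code)."""

    def __init__(self, size=50):
        self.data = [0] * size
        self.size = size
        self.head = 0   # next slot to write (receive side)
        self.tail = 0   # next slot to read (display side)
        self.count = 0  # bytes currently waiting

    def put(self, byte):
        """Store one received byte; return False if the buffer is full."""
        if self.count == self.size:
            return False
        self.data[self.head] = byte
        self.head = (self.head + 1) % self.size
        self.count += 1
        return True

    def get(self):
        """Return the oldest byte, or None if nothing is waiting."""
        if self.count == 0:
            return None
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) % self.size
        self.count -= 1
        return byte


buf = RingBuffer(50)
for ch in b"Hello":
    buf.put(ch)
print(bytes(buf.get() for _ in range(5)))
```

The receive side only ever calls `put` and the display side only ever calls `get`, so a burst can arrive faster than the display code drains it, provided the burst fits in the buffer.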

hippy · Mar 10, 2020

AllyCat said:
mrm said:

These little oleds are a great addition to a project, highly legible and save a huge amount of space so it is a shame that a prebuilt serial module is not available.

Yes, that's exactly what I had in mind; or at least it would be easy enough to fit an 08M2 and the 4-pin I2C display onto a scrap of Veroboard / stripboard in my usual way.

And that's likely one reason why there is no pre-built module available; it would be cheaper for people to build their own or they can use some other PICAXE board. There's no real need for a purpose-built board so would we sell any if they were available ?

And then there's the question as to what it should use. I would favour a 28X2 but that would be considered too costly by some. The issue is that anything less than a 28X2 creates compromises, which will be acceptable to some but not others. And once one gets down to an 08M2 there are more serious compromises involved.

That's not to say that an 08M2 solution isn't valid, just that it wouldn't make sense as a de facto SSD1306 driver product. If one wanted that it would probably be easy enough to use an AXE201 08M2 Surface Mount board. Or veroboard and strip-board.

The more one drops down towards 08M2 the more issues and compromises there are; speed of data transfer, display update rates, character bitmap storage, and functionality and enhancements.

A 20X2 seems a good middle ground but with only 128 bytes of Scratchpad for high-speed serial in that wouldn't allow a full 8 line by 21 character update from one big SEROUT.

The 28X2 is more ideal, with a 1024 Scratch pad which should be able to buffer any sensible amount of data sent to it, to facilitate high-speed data rates, or to hold a full 128x64 bitmap to speed up whole screen update rates.
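
The buffer sizes quoted above can be checked with a few lines of arithmetic, shown here in Python. The 6 x 8 pixel character cell is an assumption taken from the "8 line by 21 character" figure; the Scratchpad sizes are the ones stated in the post.

```python
DISPLAY_WIDTH = 128   # pixels
DISPLAY_HEIGHT = 64   # pixels
CELL_WIDTH = 6        # assumed character cell width in pixels
CELL_HEIGHT = 8       # one SSD1306 page

chars_per_line = DISPLAY_WIDTH // CELL_WIDTH             # 21
text_lines = DISPLAY_HEIGHT // CELL_HEIGHT               # 8
full_text_bytes = chars_per_line * text_lines            # one byte per character
full_bitmap_bytes = DISPLAY_WIDTH * DISPLAY_HEIGHT // 8  # one bit per pixel

for name, scratchpad in (("20X2", 128), ("28X2", 1024)):
    print(name,
          "text fits:", full_text_bytes <= scratchpad,
          "bitmap fits:", full_bitmap_bytes <= scratchpad)
```

A full screen of text needs 168 bytes, which overflows the 128 bytes of the 20X2, while the 1024 bytes of the 28X2 exactly match a full 128x64 bitmap.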

At the end of the day it seems to come down to driver software rather than not having a pre-built module to do it. I would tend to favour 28X2 with Scratchpad used for incoming data buffering, update the display on the fly, with some tricks included to delay updates so they can be burst sent when data has been received. I think that would be a good balance. That would also work with the 20X2 with the sender having to ensure it doesn't overrun the buffer and that could be made easier for the sender.

That would create a framework for the M2's as well, though serial input buffering and character bitmap handling would need to be updated.

So I would say a 28X2 reference design is the place to start. I'll have a think about that.

WhiteSpace · Mar 10, 2020

The 28X2 was more or less where I had got to for the two OLED displays on my project - particularly with the larger fonts. I’m sure that others would be able to condense my inefficient code quite significantly, but having the 1024 scratchpad bytes and 4 slots means that I don’t need to worry too much. I rejected the 20X2 because of the smaller scratchpad. There’s obviously a size penalty, though.

mrm · Mar 10, 2020

AllyCat said:
An HI2COUT .... byte list : RETURN (again re-invented from earlier in this thread) seems the only alternative; it should be fast and uses only about 11 bytes per character. But the reason that I wanted to read back from the display RAM is to support also double / treble-sized characters and/or an inverted (B/W) cursor, etc.. Thus, one would read back the pixels from the few of these "special" character(s) sent initially to the display, then process the relevant cell block(s) and re-send in the same way as UDCs.

This link might help with double sized characters: Big Text for Little Display. Complete heresy but a very interesting web site for good ideas.

Using the timing routine from your recent post, drawing a one-line message from EEPROM takes 49000 somethings (cycles or microseconds?) at 16 MHz, compared to 55000 from a LOOKUP. These timings are still to be checked using Hippy's 20X2 timing program, and are dubious given the visual appearance, although this might be an artefact or perhaps a defect in the display.

hippy · Mar 10, 2020

mrm said:
This link might help with double sized characters: Big Text for Little Display. Complete heresy but a very interesting web site for good ideas.

Interesting algorithm and it does seem to work -

Code:

Test:
  w0 = %1101001
  Gosub ShowB0
  Gosub DoubleB0ToW0
  SerTxd( "-> " )
  Gosub ShowW0
  End

DoubleB0toW0:
  w0 = b0 * 16 | b0 & $0F0F
  w0 = w0 *  4 | w0 & $3333
  w0 = w0 *  2 | w0 & $5555
  w0 = w0 *  2 | w0
  Return

ShowW0:
  SerTxd( #bit15, #bit14, #bit13, #bit12, " " )
  SerTxd( #bit11, #bit10, #bit9,  #bit8,  " " )
ShowB0:
  SerTxd( #bit7,  #bit6,  #bit5,  #bit4,  " " )
  SerTxd( #bit3,  #bit2,  #bit1,  #bit0,  " " )
  Return

I have always done it the brute force way. Not sure which is quicker, though the above will be faster on an X2 and using shifts rather than multiplications. Brute force is limited to using w0 and w1 -

Code:

DoubleB0toW0:
  bit15 = bit7 : bit14 = bit7
  bit13 = bit6 : bit12 = bit6
  bit11 = bit5 : bit10 = bit5
  bit9  = bit4 : bit8  = bit4
  bit7  = bit3 : bit6  = bit3
  bit5  = bit2 : bit4  = bit2
  bit3  = bit1 : bit2  = bit1
  bit1  = bit0
  Return
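
As a cross-check of the two routines above, the following Python model (not PICAXE code) runs both over every byte value. It assumes PICAXE's strict left-to-right expression evaluation, so `b0 * 16 | b0 & $0F0F` is read as `((b0 * 16) | b0) & $0F0F`; the function names are hypothetical.

```python
def double_bits_maths(b):
    """Shift-and-mask bit doubling, evaluated left to right like PICAXE BASIC."""
    w = ((b * 16) | b) & 0x0F0F
    w = ((w * 4) | w) & 0x3333
    w = ((w * 2) | w) & 0x5555
    w = ((w * 2) | w) & 0xFFFF   # keep the result within a 16-bit word
    return w


def double_bits_brute(b):
    """Copy each input bit n into output bits 2n and 2n+1."""
    w = 0
    for n in range(8):
        if b & (1 << n):
            w |= 0b11 << (2 * n)
    return w


# Both methods should agree for every possible byte value.
assert all(double_bits_maths(b) == double_bits_brute(b) for b in range(256))
print(format(double_bits_maths(0b01101001), "016b"))
```

The first three lines of the maths version spread the eight bits out so that each sits in an even position; the last line copies each one into the odd position next to it.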

AllyCat · Mar 10, 2020

Hi,

Yes, apart from some background in the development of Teletext (was that in the 70s or the 80s?) which basically used 7 x 5 fonts (and a "character rounding" algorithm to fill 12 x 16 "double height" or interlaced video cells), I'm well aware that also in PICaxe terms I am mainly re-inventing the wheel. See for example post #20 from this thread in 2008.

IIRC my timing program at 16 MHz should deliver the execution time in either us (i.e. PIC Instruction Cycles / 4) or ms (if a decimal point is shown), depending on the selected mode. At 4 MHz it reads directly in us and Instruction Cycles (same number) up to 60,000.

Cheers, Alan.

hippy · Mar 10, 2020

mrm said:
Using the timing routine from your recent post, drawing a one-line message from EEPROM takes 49000 somethings (cycles or microseconds?) at 16 MHz, compared to 55000 from a LOOKUP. These timings are still to be checked using Hippy's 20X2 timing program, and are dubious given the visual appearance, although this might be an artefact or perhaps a defect in the display.

Could be about right, 50 ms or thereabouts.

I did some timings on a 28X2 and, at 32 MHz using I2CFAST, one can send out one line of characters in 7 ms. So that's the maximum rate one is ever going to achieve when building the bitmap for a line to send from a character buffer. That's using fixed characters, no lookup.

Taking each character from a line buffer, expanding that to a bitmap to send initially took 42.3 ms. With some optimisation that's now down to 36.2 ms. So the lookup overhead is about 1.4 ms per character. That's using READ and READTABLE. That's about 270us per font column of lookup.

I can't see things getting more optimised than that so 36 ms per line works out at about 300 ms per screen. That compares with around 50 ms for blatting a full screen bitmap out of memory, but of course doesn't include setting that bitmap.

By building each line bitmap up one can blat that out in 7ms, but each line would still have a 32 ms gap between them while building each line of bitmap, so full screen refresh isn't any better, is in fact worse. And I'm not sure how useful that would be in an OLED driver taking serial in to drive the display.

Of course, double the speed to 64MHz and that halves everything; 150 ms per full screen update.
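
The figures in this post can be reproduced with some simple arithmetic, sketched here in Python. The measured times are taken from the post; the 21 characters per line and 5 font columns per character are assumptions that appear consistent with the rounded numbers quoted, and the 64 MHz figure assumes execution time scales directly with clock speed.

```python
SEND_ONLY_MS = 7.0        # one line sent from fixed data, no lookup
WITH_LOOKUP_MS = 36.2     # one line built from a character buffer
CHARS_PER_LINE = 21       # assumed: 128 pixels / 6-pixel cells
COLUMNS_PER_CHAR = 5      # assumed font width in columns
LINES_PER_SCREEN = 8

lookup_per_line_ms = WITH_LOOKUP_MS - SEND_ONLY_MS
lookup_per_char_ms = lookup_per_line_ms / CHARS_PER_LINE
lookup_per_column_us = lookup_per_char_ms * 1000 / COLUMNS_PER_CHAR
screen_ms_32mhz = WITH_LOOKUP_MS * LINES_PER_SCREEN
screen_ms_64mhz = screen_ms_32mhz / 2   # assumes time scales with clock

print(f"lookup per character: {lookup_per_char_ms:.2f} ms")
print(f"lookup per column:    {lookup_per_column_us:.0f} us")
print(f"full screen at 32 MHz: {screen_ms_32mhz:.0f} ms")
print(f"full screen at 64 MHz: {screen_ms_64mhz:.0f} ms")
```

These come out close to the rounded values in the post (about 1.4 ms per character, roughly 300 ms per screen at 32 MHz and 150 ms at 64 MHz).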

mrm · Mar 10, 2020

Thanks for all this data and the picaxe basic translation of the doubling routine. Far more elegant than my attempt.

Based on the calculated timings a picaxe is never going to be a great driver for a full strength serial terminal but they are still a useful combination for displaying simple data in a compact package.

WhiteSpace · Mar 11, 2020

If it’s any use to support the results of the calculations above, I ran a couple of fairly unscientific speed tests last night. I used the OLED display on my 28X2 transmitter Picaxe. It displays two sets of bars to indicate the position of left and right motor controls (showing full ahead L and half ahead R in this photo).

The bars are built by loading an entire 6 rows of the display for 24 columns each, plus the index line in the centre. So they involve loading 6 x 48 plus a few, out of the 8 x 128 blocks - a little over a quarter of the screen area. I tested the speed by getting the display to cycle both sets of bars through the entire 52 positions of the motor controls from full reverse to full ahead. I can post the code later if it’s any help. Including a couple of lines of SerTXD to check what was happening, and the IROUT transmitter commands (which transmit each motor control 5 times with a 21ms gap between each transmission and a 5ms gap between L and R groups), the 52 steps in the loop took just about 30 seconds, so around 580ms per programme cycle. When I REM’d out the IROUT, the 52 iterations took less than 8 seconds, so about 150ms essentially to read two variables and plug them into the display code and load (a quarter of) the OLED.

So it’s definitely the IROUT that takes the time with this display - the OLED code works quite quickly. Because it’s possible to point the display to a specific row/column, the key to speed seems to be not to waste time loading the entire screen each time but only to drop individual characters in where needed. Obviously whether that works will depend on what you want to display.
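
For anyone wanting to check the sums behind these figures, here is the arithmetic as a short Python sketch. All inputs are the rough measurements from this post; "less than 8 seconds" is treated as an upper bound of 8 seconds.

```python
STEPS = 52
WITH_IROUT_S = 30.0      # measured loop time including IROUT and SerTxd
WITHOUT_IROUT_S = 8.0    # upper bound; the post says "less than 8 seconds"

per_step_with_ms = WITH_IROUT_S / STEPS * 1000
per_step_without_ms = WITHOUT_IROUT_S / STEPS * 1000
irout_share = 1 - WITHOUT_IROUT_S / WITH_IROUT_S

bar_bytes = 6 * 48       # 6 pages x 48 columns for the two sets of bars
screen_bytes = 8 * 128   # 8 pages x 128 columns for the whole display

print(f"per step with IROUT:    {per_step_with_ms:.0f} ms")
print(f"per step without IROUT: {per_step_without_ms:.0f} ms")
print(f"share of time in IROUT: {irout_share:.0%}")
print(f"fraction of screen loaded: {bar_bytes / screen_bytes:.0%}")
```

On these numbers the IROUT transmissions account for roughly three quarters of each cycle, and the display code loads a little over a quarter of the screen per update.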

I’ll do a similar test on the receiver Picaxe (which is the one displaying current and voltage shown above) this evening. I’ve just bought a resonator for it so I will try it at 64MHz.

I certainly find the bars respond as quickly as I can turn the pots - there’s no obvious lag. But it has taken quite a lot of tweaking to get here. Others with more advanced coding skills will no doubt be able to improve on what I have done to shave off further ms. I hope this is useful.

hippy · Mar 11, 2020

AllyCat said:
I believe an HSERIN input, interrupt-driving a circular buffer can be good for concatenated bursts up to 19200 baud

I tried that on an 18M2 and couldn't get it to work at 4800 when interrupt driven. A polled HSERIN looking for a terminator worked at 4800 but not at 9600. Using SERIN was the same.

That's reading an HSEROUT from a 28X2 so pretty much back-to-back data, a SEROUT might deliver larger inter-byte gaps.

I believe there are some tricks using PEEKSFR which offer an improvement over HSERIN so if someone can remind me what they are I can try that.

lbenson · Mar 11, 2020

hippy said:
I believe there are some tricks using PEEKSFR which offer an improvement over HSERIN so if someone can remind me what they are I can try that.

From my notes: "AllyCat/hippy's fast serial in on M2s"

Code:

Symbol PIR1        = $11  ; $011
Symbol RCREG       = $79  ; $199 (missing from the notes; assumed M2 address, check for your chip)

symbol xb0             = b0 ' SFR bits
Symbol TMR1IF_BIT  = bit0
Symbol TMR2IF_BIT  = bit1
Symbol CCP1IF_BIT  = bit2
Symbol SSPIF_BIT   = bit3
Symbol TXIF_BIT    = bit4
Symbol RCIF_BIT    = bit5
Symbol ADIF_BIT    = bit6
Symbol TMR1GIF_BIT = bit7

SetFreq M32
HSerSetup B9600_32, %1000
bptr=28 ' first byte of non-register ram
Do
  Do
    PeekSfr PIR1, b0
  Loop Until RCIF_BIT = 1
  PeekSfr RCREG, @bPtr
  ' look at @bptr and/or increment bptr
Loop

Just about to see if I can use this to replace ptr in an M2 web server. Worked fine for me on a 14M2 to the degree that I tested it.
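
The logic of that polling loop can be modelled on a PC as below. This is a Python sketch, not PICAXE code: `FakeUart` is a hypothetical stand-in for the PIC's receive hardware, and only the receive flag (bit 5 of PIR1, as in the symbol list above) is modelled.

```python
RCIF_MASK = 1 << 5   # bit 5 of PIR1: a received byte is waiting


class FakeUart:
    """Hypothetical stand-in for the PIC's serial receive registers."""

    def __init__(self, incoming):
        self.incoming = list(incoming)

    def pir1(self):
        # Only the receive flag is modelled; other PIR1 bits read as 0.
        return RCIF_MASK if self.incoming else 0

    def rcreg(self):
        # Reading the receive register removes the byte, clearing the flag.
        return self.incoming.pop(0)


def receive_until(uart, terminator, max_polls=10000):
    """Poll the flag, store each byte, stop after the terminator."""
    buffer = []
    for _ in range(max_polls):
        if uart.pir1() & RCIF_MASK:
            byte = uart.rcreg()
            buffer.append(byte)
            if byte == terminator:
                break
    return bytes(buffer)


print(receive_until(FakeUart(b"TEMP=21\rignored"), ord("\r")))
```

The point of the technique is that the inner loop does almost nothing except test one flag bit, so very little time passes between a byte arriving and it being stored.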

mrm · Mar 11, 2020

Could it be this thread?

Is Hserin really this slow?

see post #20

but perhaps the register byte positions are different for other chips as it does not seem to work.

AllyCat · Mar 11, 2020

Hi,

hippy said:
Interesting algorithm and it does seem to work -
....
I have always done it the brute force way. Not sure which is quicker, though the above will be faster on an X2 and using shifts rather than multiplications. Brute force is limited to using w0 and w1 -

Code:

DoubleB0toW0:
  bit15 = bit7 : bit14 = bit7
  bit13 = bit6 : bit12 = bit6
  bit11 = bit5 : bit10 = bit5
  bit9  = bit4 : bit8  = bit4
  bit7  = bit3 : bit6  = bit3
  bit5  = bit2 : bit4  = bit2
  bit3  = bit1 : bit2  = bit1
  bit1  = bit0
  Return

My Brute Force way was to use an EEPROM lookup table (of 16 bytes), one of the reasons why I didn't want to clutter up my EEPROM with a Font table. Since the PICaxe instructions are broadly similar, the number of program bytes created gives an indication of how quickly the algorithm will execute: The EEPROM code uses about 15 program bytes, the maths algorithm 30 and the "bit-twiddles" 35. Of course with an 08M2, the EEPROM bytes are recovered from the Program memory, so the size is very similar, but a real measurement does still show improved speed. I haven't included validation that the algorithms all work correctly, but they do all produce the same value in W0 (15555).

Code:

#rem      TERMINAL OUTPUT:
Mode= 1   Overhead= +3573 us 
Test1:Maths Algorithm= +5775 us
Test2:EEPROM Table= +3365 us
Test3 Direct BITS= +7077 us
Check= +5 us
PRODUCED BY CODE SNIPPET :
#endrem
; Program to Estimate the Execution speed of most M2 PICaxe Instructions
; AllyCat, 2012, Updated August 2019
#picaxe 08m2       ; Or any other M2 but *CANNOT USE THE SIMULATOR*
#terminal 4800     ; 4800 baud at setfreq m4 (*Also X2s require SFR changes*)
; *Select MODE here*    ; MODE 1 = Number of PIC Instruction cycles (= time in us)
symbol MODE = 1         ; MODE 2 = Measure times with Setfreq M16 (up to 64,000 us)
                        ; MODE 3 = Calculate 4MHz execution times up to 255.99ms
symbol TMR1L     = $16  ; Timer 1 Low byte SFR address
symbol TMR1H     = $17  ; Timer 1 High byte SFR address
symbol T1CON     = $18  ; Control Register: Prescaler = %--nn---- (PS = 2^nn)
symbol tempb    = b23   ; or s_w4 Temporary byte for MODE3, etc.
symbol exec     = w12   ; or s_w5 16-bit execution cycles or time
symbol nul      = w13   ; or s_w6 Overhead for Call/Return,etc.with no instructions
main:
do 
 nul = 0
  sertxd(cr,lf,"Mode= ",#MODE)
  sertxd("   Overhead=")
  gosub start                          ; Start the NUL measurement
  gosub measure                        ; Shows and Returns timer value in us or ms
  nul = exec                           ; Store for subsequent calculations
    gosub runtests            ;* Update any of the target code as required *
  sertxd(cr,lf,"Check=")               ; Re-measure the NUL (or another) value
  gosub start
  gosub measure
  if exec > 50 then
    sertxd(" ***")                    ; Mark inaccurate nul result
  endif
  sertxd(cr,lf)
  pause 5000
loop
  end
runtests:
  sertxd(cr,lf,"Test1:Maths Algorithm=")   ; Report the second test, etc.
     w0 = %01101001 
  gosub start
  w0 = b0 * 16 | b0 & $0F0F
  w0 = w0 *  4 | w0 & $3333
  w0 = w0 *  2 | w0 & $5555
  w0 = w0 *  2 | w0         ; = 30 bytes
  gosub measure
  sertxd(cr,lf,"Test2:EEPROM Table=")
w0 = %01101001
symbol base = 10
data base,(0,3,$0C,$0F,$30,$33,$3C,$3F,$C0,$C3,$CC,$CF,$F0,$F3,$FC,$FF)
  gosub start
    b1 = b0 ** 4096 + base       ; Maybe faster than / 16
    b0 = b0 and 15 + base
    read b0,b0
    read b1,b1            ;  = 14 bytes
  gosub measure
  sertxd(cr,lf,"Test3 Direct BITS=")
w0 = %01101001
  gosub start
  bit15 = bit7 : bit14 = bit7
  bit13 = bit6 : bit12 = bit6
  bit11 = bit5 : bit10 = bit5
  bit9  = bit4 : bit8  = bit4
  bit7  = bit3 : bit6  = bit3
  bit5  = bit2 : bit4  = bit2
  bit3  = bit1 : bit2  = bit1
  bit1  = bit0            ;  = 35 bytes
  gosub measure  
  return
; REQUIRED SUBROUTINES
start:          ; Setup the Mode and Reset Timer 1
  tempb = MODE                       ; At least one variable is needed in IF.THEN
  if tempb > 1 then
    setfreq m16                      ; For modes 2 and 3
    pause 100                        ;* 25 ms to allow 20 ms timer to update *
  endif
  peeksfr T1CON,tempb
  pokesfr T1CON,0                    ; Stop the timer
  pokesfr TMR1H,0                    ; Clear High byte
  pokesfr TMR1L,0   
  pokesfr T1CON,tempb                ; Restore the timer (1 / 33 at 4 / 16 MHz)
  return
measure:       ; Reads Timer 1, calculates and displays timed delay
  pokesfr T1CON,0                    ; Stop the timer to read both bytes
  peeksfr TMR1L,tempb                ; Read Low byte (S_W variables only words) 
  peeksfr TMR1H,exec                 ; Read High byte
  pokesfr T1CON,1                    ; Restart timer (can ignore frequency)
  exec = exec * 256 + tempb          ; Allows use of S_W variable
  exec = exec - nul                 ; Estimate execution time of target code
  setfreq m4                         ; Ensure serial comms are at default again
  if exec > 64000 then               ; Correctly display negative values (near zero)
    sertxd(" -")
    exec = - exec
  else
    sertxd(" +")
  endif
  tempb = MODE
  if tempb < 3 then                  ; Check the MODE
    sertxd (#exec," us ")    
  else                               ; Display in ms to 2 decimal places 
    tempb = exec ** 26214 / 100      ; *4/10 = 26214 / 65536 then /100
    sertxd(#tempb,".")               ; Units of ms
    tempb = exec ** 26214 // 100 / 10
    sertxd(#tempb)                   ; Tenths of ms
    tempb = exec  ** 26214 // 10
    sertxd(#tempb," ms")             ; Hundredths of ms
  endif
  return

As can be seen, the Maths algorithm is about 70% slower than the EEPROM table lookup and the Bit operations about 110% slower (in an M2). I may deal with other issues raised above in subsequent comments, in another post.
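
For anyone wanting to check the three bit-doubling methods on a PC before timing them, here is a minimal Python sketch. It is only a desktop model, not PICAXE code: the function names are made up for illustration, the parentheses spell out the strict left-to-right evaluation that PICAXE uses, and the loop in `double_bits` stands in for the unrolled bit assignments of Test3.

```python
# The sixteen-entry nibble table from the DATA statement in Test2.
NIBBLE_TABLE = [0x00, 0x03, 0x0C, 0x0F, 0x30, 0x33, 0x3C, 0x3F,
                0xC0, 0xC3, 0xCC, 0xCF, 0xF0, 0xF3, 0xFC, 0xFF]

def double_maths(b):
    """Test1: shift-and-mask, written out in PICAXE's left-to-right order."""
    w = ((b * 16) | b) & 0x0F0F   # split the two nibbles into separate bytes
    w = ((w * 4) | w) & 0x3333    # split each nibble into two bit pairs
    w = ((w * 2) | w) & 0x5555    # spread the bits onto even positions
    return ((w * 2) | w) & 0xFFFF # copy each bit into its odd neighbour

def double_table(b):
    """Test2: look up each nibble; the high nibble supplies the high byte."""
    return (NIBBLE_TABLE[b >> 4] << 8) | NIBBLE_TABLE[b & 15]

def double_bits(b):
    """Test3: copy every source bit into two adjacent result bits."""
    w = 0
    for n in range(8):
        if b & (1 << n):
            w |= 3 << (2 * n)
    return w

print(hex(double_maths(0b01101001)))  # prints 0x3cc3
```

All three functions should agree for every byte value, so the timing comparison is between equivalent results.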

Cheers, Alan.

hippy · Mar 11, 2020

lbenson said:
From my notes: "AllyCat/hippy's fast serial in on M2s"

mrm said:
Could it be this thread?

Thanks both. And I can confirm a polling PEEKSFR loop filling a buffer and ending on a terminator works at B19200_32. Not at 38400 though, but "them's silly baud rates".

I'll have a look to see if I can run it in an interrupt at 19200. But that's likely to be tomorrow.

So that does make an M2 more feasible for an SSD1306 etc driver. Which is handy because that's what's on our Grove Interface Board and I know where my 96x96 OLED is for that!

An 08M2 should also be feasible. Its biggest issue being that it doesn't have Table for the font lookup.

So, in theory, "one driver to rule them all", should be possible, configured with just #DEFINE options.

AllyCat · Mar 11, 2020

Hi,

Yes, I'm reasonably confident that an (M2) interrupt-driven buffer is possible at 19,200 baud / 32 MHz, although my particular interest is more in 9600_16 (because it halves the PICaxe power consumption) or even lower at 4800_8 and the default 2400_4. The main problem is that the maximum interrupt latency is not defined / specified for the PICaxe, although some commands (in the main program) are obviously not acceptable; for example the "blocking" or fast timing commands such as PAUSEUS, PULSIN / OUT and SEROUT, etc. The program structure would be much as I outlined in the linked thread above, but I've largely abandoned the attempt to "work around" the PEEKSFR,.... , @BPTRINC bug, it's just too much a Can of Worms.

With an 08M2, the use of Program memory for the font lookup is almost mandatory, but it does appear to have some unexpected benefits. Obviously it can be fast (because the data is directly within the instruction) but also the provision of proportional (variable-width) characters can be "automatic". Personally I'm not a fan of variable character widths for "technical" information, but another advantage is that the number zero (i.e. all unlit pixels) is very efficiently coded in the PICaxe interpreter. This means that the inter-character gap, narrow characters and punctuation, etc. carry very little overhead, for example the <space> character codes into just 3 program bytes (give or take the "noise" introduced by the byte-boundary quandary).

It looks as if reading ASCII characters from a circular buffer and then transmitting their images via HI2C can be handled quite quickly: The execution time of the ON .... GOTO instruction increases with its length (number of labels), so a three-stage decoding tree (with branches of 8 + 4 + 4 for 128 characters) looked marginally faster than 8 + 16 branches. However, the calculation of the additional offset (e.g. char // 16 / 4) takes a significant time, so on average 16 + 8 branches looks optimum, particularly where some of the 16 (e.g. "control" or User Definable character groups) won't be used much (if at all).
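
To make the branch arithmetic concrete, here is a small Python sketch of how a 7-bit character code might be split into indices for the two decoding trees mentioned above. It assumes the first stage uses the most significant bits, which is only one plausible arrangement, and the function names are hypothetical. Note that Python's `//` is integer division and `%` is remainder, whereas in PICAXE BASIC `/` is integer division and `//` is remainder.

```python
def tree_8_4_4(char):
    """Three-stage split: 8 groups of 16, then 4 groups of 4, then 4 characters."""
    first = char // 16           # PICAXE: char / 16       (0..7)
    second = (char % 16) // 4    # PICAXE: char // 16 / 4  (0..3)
    third = char % 4             # PICAXE: char // 4       (0..3)
    return (first, second, third)

def tree_16_8(char):
    """Two-stage split: 16 groups of 8 characters."""
    first = char // 8            # PICAXE: char / 8        (0..15)
    second = char % 8            # PICAXE: char // 8       (0..7)
    return (first, second)

print(tree_8_4_4(ord("A")), tree_16_8(ord("A")))  # prints (4, 0, 1) (8, 1)
```

With the 16 + 8 split the control codes 0 to 31 fall entirely into the first four groups, so those branches can simply be left empty if they are never used.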

Finally, my reminiscence of Teletext "character rounding" above made me Google it. Very soon found this link with lots of detail and font data, so I'll just quote very briefly (it's a 5 x 9 font because it provides for 2 lines of "descenders") :

Code:

 * The character-smoothing algorithm of the SAA5050 and friends is
 * a fairly simple means of expanding a 5x9 pixel character to 10 x 18
 * pixels for use on an interlaced display.  All it does is to detect
 *  2x2 clumps of pixels containing a diagonal line and add a couple of
 * subpixels to it, like this:
 *
 * . #  -> . . # # -> . . # # or # . -> # # . . -> # # . .
 * # .     . . # #    . # # #    . #    # # . .    # # # .
 *         # # . .    # # # .           . . # #    . # # #
 *         # # . .    # # . .           . . # #    . . # #

Of course in those days (yes it was the 70s - 80s) the "algorithm" was performed by customised (on-chip) logic, but it would be interesting to see if it can be coded efficiently for a PICaxe.
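
As a starting point for experiments, here is a minimal Python sketch of the rounding rule as I read it from the quoted diagram: every pixel is doubled to 2 x 2 subpixels, and an unlit pixel gains one corner subpixel when its two neighbours on that corner are lit and the diagonal neighbour is not. The function name and the list-of-rows glyph format are assumptions for illustration, and it is a desktop model rather than anything that would fit in a PICaxe as written.

```python
def smooth(glyph):
    """Expand a glyph (list of rows of 0/1) to double size with diagonal rounding."""
    h = len(glyph)
    w = len(glyph[0])

    def lit(x, y):
        # Pixels outside the glyph count as unlit.
        return 0 <= x < w and 0 <= y < h and glyph[y][x] == 1

    out = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            for sy, dy in ((0, -1), (1, 1)):      # sub-row 0 faces up, 1 faces down
                for sx, dx in ((0, -1), (1, 1)):  # sub-column 0 faces left, 1 faces right
                    if lit(x, y):
                        on = True                 # a lit pixel lights all four subpixels
                    else:
                        # Fill the corner only where a diagonal line passes it.
                        on = (lit(x + dx, y) and lit(x, y + dy)
                              and not lit(x + dx, y + dy))
                    out[2 * y + sy][2 * x + sx] = 1 if on else 0
    return out

for row in smooth([[0, 1], [1, 0]]):
    print("".join("#" if p else "." for p in row))
```

Run on the 2 x 2 diagonal above, this prints the same 4 x 4 pattern as the final stage of the left-hand example in the quoted comment.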

Cheers, Alan.

hippy · Mar 11, 2020

AllyCat said:
I've largely abandoned the attempt to "work around" the PEEKSFR,.... , @BPTRINC bug, it's just too much a Can of Worms.

That's the writing $00 then $xx issue I am guessing. I used this -

PeekSfr RCREG, b0 : @bPtrInc = b0

Code:

Symbol TIMEOUT_COUNT = 100  ; <-- Guessed

ReceiveBuffer:
  bPtr = BUFFER_START
  Low BUSY
  Do
    PeekSfr PIR1, b0
  Loop Until RCIF_BIT = 1
  Do
    If RCIF_BIT = 1 Then
      PeekSfr RCREG, b0 : @bPtrInc = b0
      timeout = TIMEOUT_COUNT
    Else
      timeout = timeout - 1
    End If
    PeekSfr PIR1, b0
  Loop Until timeout = 0
  High BUSY
  Return

AllyCat · Mar 12, 2020

Hi,

Yes, I should have called it the "Double-Incrementing bug": PEEKSFR combined with @BPTRINC does both a Pre-Increment and a Post-Increment of BPTR. Indeed, the "free" INC combined with @BPTR executes a little faster than a separate INC (i.e. BPTR = BPTR + 1). However, since I was using a circular buffer, I decided that it was "safer" to test for a wraparound before a separate INC, thus shortening the critical (longest) path when reloading the pointer. But I managed to recover the "lost" time by positioning the INC immediately before the read of the buffer flag, making more efficient use of the "wraparound" conditional test.

Also, I test the "OverRun" flag whenever a character byte is not available in the buffer. Since the OverRun flag blocks all further reception* of bytes permanently, it does need to be tested and cleared (only if necessary), but obviously the flag must be clear if characters are (still) being received. Finally, because the PICaxe RETURN (+ interrupt latency and re-entry) is relatively slow, I test for incoming serial bits (using the Interrupt on Change flag for the serial input pin), not just for a completed byte (RCIF flag), immediately before exiting the interrupt routine. This effectively increases the buffer timing from two characters to almost three.

* If the OverRun flag does become set, then all the received data might need to be junked (since there will be at least one missing/bad character). However, it may be worth noting that there are still two "valid" characters that can be recovered from the serial input buffer before the bad one(s).
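
The overrun handling described above can be modelled on a PC with a short Python sketch. This is a toy model under stated assumptions, not the interrupt routine itself: `UartModel`, `service` and the list used as a circular buffer are hypothetical names, and the receiver is reduced to a two-byte FIFO with a sticky overrun flag.

```python
from collections import deque

class UartModel:
    """Toy receiver: a two-byte FIFO and an overrun flag that blocks reception."""
    def __init__(self):
        self.fifo = deque()
        self.overrun = False

    def byte_arrives(self, value):
        if self.overrun:
            return                   # reception stays blocked until the flag is cleared
        if len(self.fifo) == 2:
            self.overrun = True      # no room left: this byte is lost
        else:
            self.fifo.append(value)

    def read(self):
        return self.fifo.popleft() if self.fifo else None

    def clear_overrun(self):
        self.overrun = False

def service(uart, ring, head):
    """Drain the FIFO into a circular buffer; return (new head, data_suspect)."""
    suspect = False
    while True:
        value = uart.read()
        if value is None:
            # Only when no byte is waiting: test, and if needed clear, the flag.
            if uart.overrun:
                uart.clear_overrun()
                suspect = True
            return head, suspect
        ring[head] = value
        # Wraparound test first, then a separate increment.
        head = 0 if head == len(ring) - 1 else head + 1
```

If four bytes arrive before `service` runs, the first two are still recovered into the buffer, the other two are lost, and the returned flag tells the caller that the data may need to be junked.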

Cheers, Alan.
