"Character Rounding" for Double-Sized characters on bitmapped displays such as SSD1306

AllyCat · Mar 22, 2020

Hi,

PICaxe is not well-suited to displaying characters on bitmapped displays because it is relatively slow and has limited memory for storing character fonts. However, the tiny OLED displays that use the SSD1306 controller are appealing because they are small and can run on 3.3 volts at low power. The display is organised in rows of pixels one byte (8 bits) high, so it is most convenient to use a font 8 pixels high by 5 pixels wide (plus a one-pixel space). These characters are rather small, so an obvious solution is to double their height and/or width by duplicating pixels vertically and/or horizontally, which avoids the need for a larger font memory.

Doubling the width is very easy: just transmit each byte to the display twice. Double-height is more difficult, because each byte must be expanded to a word in which every bit is duplicated into two adjacent positions. Numerous methods, some quite sophisticated, have been discussed, but at least for a PICaxe a simple lookup table appears to be the most efficient, as shown in post #139 of this long thread. For quadruple-height characters the "mathematical" method needs 32-bit words and the simple "bit-swapping" would take twice as long, so the benefits of the lookup method actually increase.
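
The lookup-table stretch can be sketched on a PC as below (Python rather than PICAXE BASIC, so it can be checked; the function names are hypothetical). The 16-entry table holds the same values as the DATA statement in the June 2020 listing further down: each nibble of a column byte indexes the table, and the two results become the high and low bytes of the double-height word. A bit-by-bit version is included only to show that both give the same word.

```python
# PC sketch, not PICAXE code: double-height "stretch" by nibble lookup (bit 0 = top pixel).
# Entry n is the 4-bit value n with every bit duplicated, e.g. 0b0101 -> 0b00110011.
STRETCH = [0x00, 0x03, 0x0C, 0x0F, 0x30, 0x33, 0x3C, 0x3F,
           0xC0, 0xC3, 0xCC, 0xCF, 0xF0, 0xF3, 0xFC, 0xFF]


def stretch_byte(column: int) -> int:
    """Expand an 8-pixel column byte to a 16-bit word using two table lookups."""
    high = STRETCH[column >> 4]    # upper nibble -> high byte of the word
    low = STRETCH[column & 0x0F]   # lower nibble -> low byte of the word
    return (high << 8) | low


def stretch_bitwise(column: int) -> int:
    """Reference version: duplicate one bit at a time (eight tests per byte)."""
    word = 0
    for bit in range(8):
        if (column >> bit) & 1:
            word |= 0b11 << (2 * bit)
    return word


def double_width(columns):
    """Double the width by sending every column byte twice."""
    return [c for c in columns for _ in range(2)]


print(hex(stretch_byte(0x81)))       # 0xc003
print(double_width([0x7F, 0x08]))    # [127, 127, 8, 8]
```

For quadruple height, the same table could simply be applied again to each byte of the stretched word.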

Some of the doubled characters can look rather "blocky", but there is another defect with diagonal lines that are (effectively) only one pixel wide. Consider a square of 100 unit-sized pixels: a horizontal or vertical line contains 10 pixels and is 10 units long, whilst a diagonal also contains 10 pixels but is about 14 units long (i.e. multiplied by the square root of 2), so it looks dimmer (photos below). Horizontal and vertical lines 2 pixels wide would contain 20 pixels, but if the width of diagonals can be increased to 3 pixels (horizontally and/or vertically), they can contain 30 pixels, giving a more uniform brightness along their greater length. Furthermore, if the "edge" pixels of the diagonal line are suitably staggered, the line has a smoother appearance. Such methods were developed for Teletext (TV) character displays in the 1980s, but were implemented mainly in hardware.
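
The brightness argument can be put as pixels per unit length, using the same 10 x 10 square:

```latex
\begin{aligned}
\text{H/V line, 1 pixel wide:}  &\quad \frac{10}{10} = 1.00 \ \text{pixel per unit length}\\
\text{diagonal, 1 pixel wide:}  &\quad \frac{10}{10\sqrt{2}} \approx \frac{10}{14.14} \approx 0.71\\
\text{H/V line, 2 pixels wide:} &\quad \frac{20}{10} = 2.00\\
\text{diagonal, 3 pixels wide:} &\quad \frac{30}{10\sqrt{2}} \approx 2.12
\end{aligned}
```

So a single-pixel diagonal is about 29% dimmer than a single-pixel horizontal line, whereas the 3-pixel diagonal comes within about 6% of the 2-pixel horizontal and vertical lines.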

The principle of the Teletext "character rounding" (or diagonal smoothing) system can be explained by considering a square of 9 pixels (i.e. 3 x 3). The centre pixel should be set to "1" if two opposite corners are "1" and the other two are "0". This occurs in two of the 16 possible combinations of the four corner pixels, which could be determined with a lookup table, and it also gives us a "clue" that the algorithm should implement a 1-in-8 selection. But collecting the 4 pixel bits into a single nibble (to access the table) would be time-consuming, and calculates only one pixel at a time. Ideally, we want to calculate a full byte (or even word) of individual bits in parallel.
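
The "2 of 16" count is easy to confirm by enumerating the corner permutations. This Python sketch (hypothetical names) builds the 16-entry lookup table mentioned above, indexed by a nibble packed as top-left, top-right, bottom-left, bottom-right; that bit order is an arbitrary choice.

```python
from itertools import product


def centre_lit(tl: int, tr: int, bl: int, br: int) -> bool:
    """Light the centre pixel when one diagonal pair is lit and the other is dark."""
    return (tl == br == 1 and tr == bl == 0) or (tr == bl == 1 and tl == br == 0)


# 16-entry lookup table indexed by the nibble tl:tr:bl:br (tl is the most significant bit).
TABLE = [centre_lit((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1) for n in range(16)]

hits = [corners for corners in product((0, 1), repeat=4) if centre_lit(*corners)]
print(hits)                                 # [(0, 1, 1, 0), (1, 0, 0, 1)]
print([n for n in range(16) if TABLE[n]])   # [6, 9]
```

Packing four separate pixels into an index like this, one centre pixel at a time, is exactly the slow step that a word-parallel method avoids.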

Testing for the equality of two diagonal corners seems the obvious method, but is not efficient: it needs an AND of the pixels (to detect two 1s), a NOR (for two 0s) and then an OR. This must be done for each diagonal, and finally their inequality verified. Much better is to test for the inequality of adjacent (horizontal or vertical) corner pixels, which can be done with a simple EXCLUSIVE-OR. Only three such tests are needed: if the top pair differ, the bottom pair differ and one vertical pair differ, then each diagonal must be uniform and the two diagonals opposite. Furthermore, the horizontal pairs for a whole column can be compared in a single word XOR; a shift and AND then requires both the top and bottom pairs to differ, a shift and XOR tests the vertical inequality, a final AND combines them, and a division by 2 moves the result to the half-pixel position. Shifting (in an M2) can use division or multiplication, but multiplication (or adding a value to itself, if a single bit-shift is required) is slightly faster.
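
The claim that three XOR inequalities can replace the AND / NOR / OR equality tests can be checked exhaustively. In this Python sketch (a, b are the top-left and top-right corners, c, d the bottom pair; names are hypothetical) the XOR form matches the diagonal rule in all 16 cases, and the fourth inequality (b versus d) turns out to be implied:

```python
from itertools import product


def diagonal_rule(a: int, b: int, c: int, d: int) -> int:
    """Equality form. a, b = top-left, top-right; c, d = bottom-left, bottom-right."""
    ad_equal = (a & d) | (1 - (a | d))     # AND (two 1s) OR NOR (two 0s)
    bc_equal = (b & c) | (1 - (b | c))
    return ad_equal & bc_equal & (a ^ b)    # both diagonals uniform, and different


def xor_rule(a: int, b: int, c: int, d: int) -> int:
    """Inequality form: top pair, bottom pair and left-hand pair must all differ."""
    return (a ^ b) & (c ^ d) & (a ^ c)


for a, b, c, d in product((0, 1), repeat=4):
    assert xor_rule(a, b, c, d) == diagonal_rule(a, b, c, d)
    if xor_rule(a, b, c, d):
        assert b != d   # the right-hand pair is then bound to differ as well
print("XOR form matches the diagonal rule for all 16 corner patterns")
```

Because the fourth inequality is implied, the vertical test can use either column: the March listing tests the right-hand column (w2) and the June listing the left-hand one (wx).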

The algorithm coded below is for pixels (stored in Words) which have already been height-doubled. This means that adjacent pairs of bits are identical, which may be less efficient, but it makes it easier to visualise the half-(source-)pixel offset implicit in the rounding or smoothing technique. Note that pixels are also added to the source words, not just to the newly calculated intermediate words. Ultimately it may be worthwhile to adapt the algorithm to use individual (not paired) bits within the words, i.e. before the vertical stretching process. This could be particularly beneficial for quadruple-height characters, and also gives the possibility of processing two pairs of source columns within a single word. In principle this requires only two passes per character, because the outer columns never have pixels added, so only 4 internal bytes (for double-height) need to be calculated before stretching.
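
To see the half-pixel offset, here are the `w0 = ...` lines of the listing below rewritten in Python with explicit parentheses, since PICAXE evaluates expressions strictly left to right. Bit 0 is taken as the top pixel, words are masked to 16 bits, and `extra_pixels` is a hypothetical name. A single "/" diagonal (source row 0 lit on the right, row 1 lit on the left) produces an extra pixel at bits 1 and 2:

```python
MASK16 = 0xFFFF  # PICAXE word variables are 16 bits wide


def extra_pixels(left: int, right: int) -> int:
    """Rounding pixels between two height-doubled columns (bit 0 = top)."""
    h = left ^ right                        # w0 = w1 xor w2: horizontal pairs differ
    t = ((h * 4) & MASK16) & h              # ... and the pair one source row higher differs too
    v = ((right * 4) & MASK16) ^ right      # right column changes between those two rows
    return (v & t) // 2                     # shift down one bit: the half-pixel offset


left = 0b1100    # source row 1 lit (bits 2 and 3)
right = 0b0011   # source row 0 lit (bits 0 and 1)
print(bin(extra_pixels(left, right)))      # 0b110 -> bits 1 and 2
```

Bit 1 is the lower half of source row 0 and bit 2 the upper half of source row 1, so the added pixel straddles the boundary between the two source rows.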

The program takes character bytes indirectly from the "high" area of RAM, stores the processed bytes in an adjacent area, and shows the resulting character via SERTXD commands. It avoids the use of @BPTR {INC} because this may be dedicated to an interrupt routine serving a character-receiving buffer. Note that some care is needed in the design of the font: the algorithm (like similar ones) has a "flaw" in that a hollow diamond (i.e. lit pixels to the N, S, E and W of an unlit pixel) becomes filled at its centre.
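
The diamond flaw can be reproduced with a small PC simulation. This Python sketch (hypothetical helper names) stretches a 5-column character, applies the same XOR steps as the listing, and orders the 10 output columns as the June 2020 version later in the thread does. The test character has lit pixels N, S, E and W of an unlit centre at column 2, row 3:

```python
MASK16 = 0xFFFF


def stretch(column: int) -> int:
    """Double-height: duplicate each of the 8 bits (bit 0 = top pixel)."""
    return sum(0b11 << (2 * i) for i in range(8) if (column >> i) & 1)


def extra_pixels(left: int, right: int) -> int:
    """Rounding pixels between two adjacent stretched columns (same steps as the listing)."""
    h = left ^ right
    t = ((h * 4) & MASK16) & h
    v = ((right * 4) & MASK16) ^ right
    return (v & t) // 2


def round_char(columns):
    """5 source bytes -> 10 rounded columns; every source column appears twice."""
    words = [stretch(c) for c in columns]
    out = [words[0]]                          # first column: nothing to its left
    for left, right in zip(words, words[1:]):
        x = extra_pixels(left, right)
        out += [left | x, right | x]          # RH copy of left, LH copy of right
    out.append(words[-1])                     # last column: nothing to its right
    return out


CENTRE = 0x00C0                               # source row 3 after stretching (bits 6, 7)
diamond = [0x00, 0x08, 0x14, 0x08, 0x00]      # W at column 1, N and S at column 2, E at column 3
out = round_char(diamond)
print(stretch(diamond[2]) & CENTRE)           # 0: the centre is dark in the source
print([hex(w & CENTRE) for w in out[4:6]])    # both copies of column 2 now light the centre
```

Each of the four 2 x 2 blocks around the hole is a valid diagonal, so the four added pixels meet in the middle and fill it.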

Code:

; Character Rounding algorithm for double-height/width column-scanned 5x7 characters (12 x 16 cell)
; This version assumes height is already doubled (i.e. word data). AllyCat March 2020.
#picaxe 08M2
#no_data

symbol CHARST = 100                                ; First (blank) column of character
symbol CHAREND = 110                                ; Start of last column pair (last data column; blank column at 112)
    call loadpix                                        ; Load the source character pixels into upper (indirect) RAM
symbol source = b19                                ; Source pointer
symbol dest = b20                                    ; Destination pointer
    dest = 80                                        ; Set first Destination 

    for source = CHARST to CHAREND step 2    ; 6 iterations, each reading two adjacent Word columns
        peek source, b2,b3,b4,b5                 ; Get two adjacent input columns

        w0 = w1 xor w2                                ; W1 and W2 contain source pixels of adjacent columns
        w0 = w0 * 4 and w0                        ; Require both top and bottom corner squares to be different
        w0 = w2 * 4 xor w2 and w0 / 2         ; Require also upper and lower RHS corner pixels to be different
        w2 = w2 or w0                                ; Add extra pixel(s) to RH column
        w0 = w1 or w0                                ; Add extra pixel(s) to intermediate column (algorithm = 25 bytes)

        poke dest, b0,b1,b4,b5                    ; Write the middle and RHS columns to memory
        dest = dest + 4                            ; Advance pointer
        call show                                    ; Intermediate column in W0
        w0 = w2                                        ; Copy RHS column
        call show
    next                                                ; Next pair of columns
end

show:
    sertxd(cr)
    call show2            ; High byte
    b1 = b0                ; Load low byte and fall into show2
show2:
    sertxd(#bit15,#bit14,#bit13,#bit12,#bit11,#bit10,#bit9,#bit8)
return

loadpix:
w0 = 0                                 : poke 100 , word w0    ; Poke of a WORD constant is a syntax error
w0 = %11111111111111     : poke 102 , word w0
w0 = %00000011000000     : poke 104 , word w0             ; "K"
w0 = %00001100110000     : poke 106 , word w0
w0 = %00110000001100     : poke 108 , word w0
w0 = %11000000000011     : poke 110 , word w0
w0 = 0                                  : poke 112 , word w0
return
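
For checking the listing on a PC, here is a rough Python port of the loop above (not PICAXE code; names are hypothetical). It pads the five "K" columns with blank columns as loadpix does, applies the same three bit operations with explicit parentheses, keeps the intermediate and right-hand column of each pair, and prints each column most-significant bit first like the show routine:

```python
MASK16 = 0xFFFF   # PICAXE word variables are 16 bits wide

# Height-doubled "K" columns from loadpix (addresses 100-112), blank columns included.
K_COLUMNS = [0, 0b11111111111111, 0b00000011000000, 0b00001100110000,
             0b00110000001100, 0b11000000000011, 0]


def round_march(columns):
    """Port of the March 2020 loop: 7 columns -> 6 pairs -> 12 output columns."""
    out = []
    for w1, w2 in zip(columns, columns[1:]):              # peek source, b2,b3,b4,b5
        w0 = w1 ^ w2                                      # w0 = w1 xor w2
        w0 = ((w0 * 4) & MASK16) & w0                     # w0 = w0 * 4 and w0
        w0 = ((((w2 * 4) & MASK16) ^ w2) & w0) // 2       # w0 = w2 * 4 xor w2 and w0 / 2
        out += [w1 | w0, w2 | w0]                         # intermediate column, RH column
    return out


result = round_march(K_COLUMNS)
for word in result:
    print(format(word, "016b"))   # MSB first, as show prints bit15 ... bit0
```

The first and last output columns stay blank, since a blank neighbour never satisfies all three inequalities.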

Cheers, Alan.

AllyCat · Jun 14, 2020

Hi,

I have now adapted the above program to drive a "real" SSD1306 OLED display, so an update to the ideas and decisions outlined above is appropriate.

The original program put "zero" (background) columns on each side of the 5 source columns of data and then performed 6 iterations in a loop, which gave a "neat" structure. However, the zeros in the outer columns never create any "new" data, and in practice only four iterations are required to create the new (interpolated) inner columns, so the routine was a little inefficient. Also, it may be more convenient to pack the character data tightly in memory, without blank (zero) inter-character columns. Therefore, the new version uses only 4 iterations, with small sections of "top and tail" program code (before and after the loop) to handle the outer columns, which are copied unchanged.
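
That the blank outer columns add nothing can be checked on a PC. This Python sketch (hypothetical names; columns already height-doubled) compares a padded six-pair loop with a four-pair loop plus unchanged "top and tail" columns. The four-pair version tests the vertical inequality on the left-hand column, as the June listing does; the six-pair version uses the right-hand column, as in March:

```python
MASK16 = 0xFFFF


def extras(left: int, right: int, vertical: int) -> int:
    """Rounding pixels between two columns; `vertical` is the column given the vertical XOR test."""
    t = left ^ right
    t = ((t * 4) & MASK16) & t
    return ((((vertical * 4) & MASK16) ^ vertical) & t) // 2


def six_pairs(cols):
    """March style: pad with blank columns and round every adjacent pair."""
    padded = [0] + list(cols) + [0]
    out = []
    for left, right in zip(padded, padded[1:]):
        x = extras(left, right, right)          # vertical test on the RH column
        out += [left | x, right | x]
    return out


def four_pairs(cols):
    """June style: first and last columns copied unchanged, 4 inner gaps rounded."""
    out = [cols[0]]
    for left, right in zip(cols, cols[1:]):
        x = extras(left, right, left)           # vertical test on the LH column
        out += [left | x, x | right]
    out.append(cols[-1])
    return out


k = [0x3FFF, 0x00C0, 0x0330, 0x0C0C, 0x3003]   # height-doubled "K" from the March listing
print(six_pairs(k) == [0] + four_pairs(k) + [0])   # True
```

The padded version only adds a blank column at each end, so the 10 inner columns can be produced with 4 iterations.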

The PICaxe "Arithmetic Logic Unit" internal operations are just as efficient when using words (of 16 bits) as bytes. But memory operations are very much byte-oriented, so it is more efficient to import each data column as a single byte and then "stretch" it to a (Double-Height) word within the Rounding algorithm. However, the "vertical stretching" process is also time-consuming, so it is better to perform this before the data is expanded sideways from 5 columns up to 10. Although it appears twice in the listing (in the header and in the loop), a separate subroutine is NOT used, because it would save few, if any, program bytes and add significant timing delays.

Avoiding the use of the @BPTR variable, above, was an unrealistic constraint, because it is a very powerful tool for efficiently handling strings of pixels / bytes. Also, a number of different "character-pixel-handling" (sub)routines might be needed and the BPTR is a very convenient way to pass strings of data into or out of generalised routine(s). And "overlaying" the output bytes directly onto the input bytes (i.e. a Read - Modify - Write process) reduces the number of "pointers" that need to be incremented. However, in this particular application, it cannot be used because the new data bytes need to be interleaved between the original data bytes.

Code:

;  Character Rounding (Diagonal Interpolation) routine for 5 x 8 characters
;  AllyCat, June 2020.
#picaxe 08m2        ; And most others
data 0,(0,3,12,15,48,51,60,63,$C0,$C3,$CC,$CF,$F0,$F3,$FC,$FF)        ; For vertical stretching
symbol index = b0
symbol tempb = b1
symbol tempw = w1              ; Interpolated column
symbol lobyte = b2                ; Of tempw
symbol hibyte = b3
symbol wx = w2                    ; LH column
symbol wxlo = b4
symbol wxhi = b5
symbol wy = w3                    ; RH column
symbol wylo = b6
symbol wyhi = b7
symbol source = b8
symbol dest = b9
symbol col = b10
symbol row = b11
symbol addr = w6                             ; Word required for external serial EEPROM address
symbol OLEDSAD = $78                        ; OLED Slave address

CharacterRounding:
    bptr = 100                                       ; Start of upper row
    source = 105                                    ; to 109 (inclusive)
    dest = 110                                        ; Start of 10 bytes in lower row
    peek source,wxlo                               ; Start "fetch" routine
    wxhi = wxlo / 16                                ; Get the first column byte and expand to a word (in wx)
    wxlo = wxlo AND 15
    read wxhi,wxhi                                   ; Stretch        
    read wxlo,wxlo 
    @bptrinc = wxlo                                ; Upper row, first column (no rounding applicable)
    poke dest,wxhi                                   ; Lower row, first column
    inc dest  
    do                                                    ;  Executed 4 times
        inc source  
        peek source,wylo                            ; RH column
        wyhi = wylo / 16
        wylo = wylo AND 15
        read wylo,wylo
        read wyhi,wyhi 
        tempw = wx xor wy                                 ; Compare pixels of adjacent columns
        tempw = tempw * 4 and tempw           ; Require both top and bottom corner pixels to differ
        tempw = wx * 4 xor wx and tempw / 2     ; Require also upper and lower corner pixels to differ
        wx = wx or tempw                                   ; Add the extra pixels (in tempw) to left column
        poke dest,wxhi                                ; Lower row pixels
        inc dest
        @bptrinc = wxlo                            ; Upper row pixels
        tempw = tempw or wy                        ; Add right-hand column to extra pixels to create new column
        poke dest,hibyte
        inc dest 
        @bptrinc = lobyte 
        wx = wy                                         ; Copy RH column to LH column
    loop until source = 109                        ; Source pointer not post-incremented    
    @bptr = wxlo                                     ; Store the final column (unchanged)
    poke dest,wxhi
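
One detail of the listing is worth checking: the five source bytes at 105-109 lie inside the 100-109 upper-row buffer, yet each is read before @bptrinc reaches its address. The Python sketch below is a PC model, not PICAXE code (the `ram` array and function names are hypothetical); it follows the pointer operations line by line, so that overlap can be verified:

```python
# Same 16 values as the listing's DATA statement (EEPROM addresses 0-15).
STRETCH = [0x00, 0x03, 0x0C, 0x0F, 0x30, 0x33, 0x3C, 0x3F,
           0xC0, 0xC3, 0xCC, 0xCF, 0xF0, 0xF3, 0xFC, 0xFF]


def fetch(ram: bytearray, addr: int) -> int:
    """peek a column byte, split it into nibbles and read both stretched halves."""
    b = ram[addr]
    return (STRETCH[b >> 4] << 8) | STRETCH[b & 15]


def character_rounding(ram: bytearray) -> None:
    """Model of CharacterRounding: source 105-109 -> upper row 100-109, lower row 110-119."""
    bptr, source, dest = 100, 105, 110
    wx = fetch(ram, source)
    ram[bptr] = wx & 0xFF          # @bptrinc = wxlo (first column, no rounding)
    bptr += 1
    ram[dest] = wx >> 8            # poke dest,wxhi
    dest += 1
    while True:                    # do ... loop until source = 109 (4 passes)
        source += 1
        wy = fetch(ram, source)    # read before bptr can overwrite this address
        t = wx ^ wy
        t = ((t * 4) & 0xFFFF) & t
        t = ((((wx * 4) & 0xFFFF) ^ wx) & t) // 2
        wx |= t                    # extra pixels added to the LH column copy
        ram[dest] = wx >> 8
        dest += 1
        ram[bptr] = wx & 0xFF
        bptr += 1
        t |= wy                    # new column = extra pixels + RH column
        ram[dest] = t >> 8
        dest += 1
        ram[bptr] = t & 0xFF
        bptr += 1
        wx = wy
        if source == 109:
            break
    ram[bptr] = wx & 0xFF          # final column, unchanged
    ram[dest] = wx >> 8


ram = bytearray(128)
ram[105:110] = bytes([0x7F, 0x08, 0x14, 0x22, 0x41])   # an undoubled "K"
character_rounding(ram)
print(ram[100:110].hex(), ram[110:120].hex())
```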

If one looks carefully, there are a few "imperfect" pixels, for example one pixel "missing" at the top of the "1" and the "4", which I thought might be a bug in my algorithm. But the same pixels are also missing in the published font ("Bedstead"); the algorithm is known to have a few flaws, such as filling in the "small diamond" mentioned previously. I guess they're just more noticeable in the 20 characters on the little OLED, compared with the 960 characters (40 x 24) of a standard Teletext TV screen. It's also worth remembering that the original font uses only 5 bytes per character, and the Rounding routine adds the equivalent of less than one extra byte per character to the Program / EEPROM size.

Therefore, I devised a simple patch routine which might be useful if programming a larger EEPROM, or for a few High Quality characters. Normally the patch might be configured as an overlay, but here it's arranged so that it can be applied "on the fly" (i.e. to the current character), which might for example be used for just Quad-Height digits on the OLED. The routine simply looks up the selected character and, if it is listed, toggles the indicated pixel(s) in one column (byte). For simplicity, an unlisted character selects the zero entries, which XOR RAM address 0 with 0, i.e. a "No OPeration". This structure can modify only one column per character, but could be duplicated for the few characters that might require further modification(s). So here is the patch, followed by typical code to send the completed data to an SSD1306 display.

Code:

; PATCH THE ROUNDED FONT :
    tempb = 0
    lookdown index,(0,"1","3","4"),tempb    ; Character number (ASCII)
    lookup tempb,(0,103,108,105),bptr        ; Column address (in RAM) upper or lower row 
    lookup tempb,(0,2,4,2),tempb                ; XOR pixels
    @bptr = @bptr xor tempb                     ; Toggle the pixels
; NOW DISPLAY THE CHARACTER ON OLED :
        hi2csetup i2cmaster,OLEDSAD,I2CSPEED,i2cbyte            ; For SSD1306 OLED display
        tempb = col + 11 : dest = row + 1
        hi2cout $0,($21,col,tempb, $22,row,dest)                          ; Set the character cell (12 x 16)
        col = col + 12 and 127                                                       ; Prepare for Next character
        bptr = 100  : addr = $40
        call sendrow12                                      ; Upper row then fall through for Lower row
sendrow12:
        hi2cout addr,(@bptrinc,@bptrinc,@bptrinc,@bptrinc,@bptrinc,_
            @bptrinc,@bptrinc,@bptrinc,@bptrinc,@bptrinc,0,0)  
return
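
The patch and the I2C traffic can also be modelled on a PC. In this Python sketch everything is hypothetical: `PATCHES` mirrors the three LOOKDOWN / LOOKUP tables (character to RAM address and XOR mask), and `oled_frames` lists the bytes each hi2cout sends after the slave address. The first is a command stream (control byte $00) setting a 12-column, 2-page window with the SSD1306 Set Column Address ($21) and Set Page Address ($22) commands, assuming the display is in horizontal addressing mode; each row then follows as a data stream (control byte $40) of 10 rounded bytes plus 2 blank spacing columns.

```python
# Character -> (RAM address, XOR mask), mirroring the LOOKDOWN / LOOKUP tables.
# In the PICAXE code an unlisted character selects entry 0: it XORs address 0 with 0.
PATCHES = {"1": (103, 0x02), "3": (108, 0x04), "4": (105, 0x02)}

OLED_COMMAND = 0x00   # control byte: command stream follows
OLED_DATA = 0x40      # control byte: display data follows


def patch(ram: bytearray, char: str) -> None:
    """Toggle the listed pixel(s); any other character is left untouched."""
    if char in PATCHES:
        addr, mask = PATCHES[char]
        ram[addr] ^= mask


def oled_frames(ram: bytearray, col: int, row: int):
    """Bytes sent by the three hi2cout calls (after the slave address) for one 12 x 16 cell."""
    window = [OLED_COMMAND, 0x21, col, col + 11, 0x22, row, row + 1]   # columns, pages
    upper = [OLED_DATA] + list(ram[100:110]) + [0, 0]   # 10 rounded bytes + 2 blank
    lower = [OLED_DATA] + list(ram[110:120]) + [0, 0]   # next page of the same window
    return [window, upper, lower]


ram = bytearray(128)
ram[100:120] = bytes(range(1, 21))
patch(ram, "1")
print([hex(b) for b in oled_frames(ram, 12, 0)[0]])
```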

Cheers, Alan.