Profound 08M ?

Jeremy Leach

Senior Member
I wondered if an 08M could recite some Shakespeare (as you do), so here's a simple attempt ;) ...

Code:
#picaxe 08m

eeprom 0,("All the world's a stage,And all the men and women merely players,They have their exits and their entrances,And one man in his time plays many parts,His acts being seven ages.")


For b0 = 0 to 173		' the text is 174 characters, held at addresses 0 to 173
	Read b0,b1
	If b1 = "," Then
		SerTxd (CR,LF)	' start a new line at each comma
	Else
		SerTxd (b1)
	Endif
	Pause 10
Next
I've only syntax checked it and it might be something that could be output to an LCD, perhaps a word at a time?

As it stands there are spare bytes to do 'something else' but not sure what ...perhaps accompanying sound?! I guess it really needs a dramatic YouTube clip ... :)

I was actually thinking about text compression and got side-tracked !
 

hippy

Ex-Staff (retired)
There are plenty of complex text compression techniques but most are not suited to the PICAXE's capabilities; what you gain in increased data storage is lost by increased decoding complexity which also means slower operation. Not so bad perhaps with a fast X1 or X2 and data held in I2C Eeprom but a problem with an 08M etc.

Therefore KISS is a good approach; simple encoding equals simple decoding and fast decode times.

One notable thing about characters is they are normally 7-bit stored in 8-bit bytes so that wastes a bit per byte ( 12.5% waste ). One can compact by using unused bits in one byte to hold bits from the next in sequence; that would be 8 characters in 7 bytes so the 174 bytes would become 153, 12% saved. Still quite complicated to do, and it turns out not to be optimal anyway; simpler gives better results.
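As a rough check of the 174 to 153 figure, here is a minimal host-side sketch in Python (not PICAXE code) that packs 7-bit characters back to back. The names `pack7` and `unpack7` are made up for illustration, and the bit order (most significant bit first, zero padding in the last byte) is an assumption; any consistent order would give the same size.

```python
# Host-side sketch only; pack7/unpack7 are hypothetical helper names.
TEXT = ("All the world's a stage,And all the men and women merely players,"
        "They have their exits and their entrances,"
        "And one man in his time plays many parts,His acts being seven ages.")

def pack7(text):
    """Pack 7-bit ASCII codes back to back, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not 7-bit ASCII: %r" % ch)
        acc = (acc << 7) | code
        nbits += 7
        while nbits >= 8:            # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1      # keep only the leftover bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)

def unpack7(data, count):
    """Recover `count` characters from data produced by pack7."""
    acc = 0
    nbits = 0
    chars = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7 and len(chars) < count:
            nbits -= 7
            chars.append(chr((acc >> nbits) & 0x7F))
        acc &= (1 << nbits) - 1
    return "".join(chars)

packed = pack7(TEXT)
print(len(TEXT), "->", len(packed), "bytes")   # 174 -> 153 bytes
```

The shifting and masking needed to undo this on the PICAXE is the "quite complicated" part; the size gain itself is fixed at one bit in eight.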

Real text has a lot of spaces ( more than 1 in 8 characters ) so one can remove each space and set the msb of the character which follows it; 145 bytes, 16% saved. Better and simpler.
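The 'msb means prefixed space' trick can be tried out on a PC before committing it to Eeprom. This is a minimal Python sketch, assuming single spaces only and plain 7-bit text; `fold_spaces` and `unfold_spaces` are hypothetical names, not anything from the PICAXE toolchain.

```python
# Host-side sketch only; fold_spaces/unfold_spaces are hypothetical names.
TEXT = ("All the world's a stage,And all the men and women merely players,"
        "They have their exits and their entrances,"
        "And one man in his time plays many parts,His acts being seven ages.")

def fold_spaces(text):
    """Drop each space and set bit 7 of the character that follows it."""
    out = bytearray()
    pending = False          # True when a space is waiting to be folded
    for ch in text:
        if ch == " ":
            if pending:
                raise ValueError("consecutive spaces need a separate escape")
            pending = True
            continue
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not 7-bit ASCII: %r" % ch)
        out.append(code | 0x80 if pending else code)
        pending = False
    if pending:
        raise ValueError("a trailing space cannot be folded")
    return bytes(out)

def unfold_spaces(data):
    """Inverse of fold_spaces: bit 7 set means 'print a space first'."""
    parts = []
    for b in data:
        if b & 0x80:
            parts.append(" ")
        parts.append(chr(b & 0x7F))
    return "".join(parts)

folded = fold_spaces(TEXT)
print(len(TEXT), "->", len(folded), "bytes")   # 174 -> 145 bytes
```

The passage has 29 spaces, each followed by a letter, which is where the 145 comes from.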

Then there are macro substitutions, which you already do when you add CR/LF after each comma. Non-printable characters can be used to indicate larger text and there are 32 of those ($00-$1F).

Using $00 as a line terminator, applying the 'msb means prefixed space' and macros $01=",CR/LF" and $02="he", that gives ...

Code:
#Picaxe 08M
#Terminal 4800

Eeprom( "All",$F4,$2,$F7,"orld's",$E1,$F3,"tage",$1    )
Eeprom( "And",$E1,"ll",$F4,$2,$ED,"en",$E1,"nd",$F7    )
Eeprom( "omen",$ED,"erely",$F0,"layers",$1,"T",$2,"y"  )
Eeprom( $E8,"ave",$F4,$2,"ir",$E5,"xits",$E1,"nd",$F4  )
Eeprom( $2,"ir",$E5,"ntrances",$1,"And",$EF,"ne",$ED   )
Eeprom( "an",$E9,"n",$E8,"is",$F4,"ime",$F0,"lays",$ED )
Eeprom( "any",$F0,"arts",$1,"His",$E1,"cts",$E2,"eing" )
Eeprom( $F3,"even",$E1,"ges.",$00                      )

Do
  Read b1,b0
  If bit7 = 1 Then
    SerTxd( " " )
    bit7 = 0
  End If
  If b0 <= $1F Then
    If b0 = 0 Then : End                   : End If
    If b0 = 1 Then : SerTxd( ",", CR, LF ) : End If
    If b0 = 2 Then : SerTxd( "he" )        : End If
  Else
    SerTxd( b0 )
  End If
  b1 = b1+1
  Pause 10
Loop
This works best where there are a number of long macro substitutions and only single spaces between words. Multiple spaces can be got round by using another macro with a number following indicating how many spaces.

It seems quite primitive but gives reasonable results and it's the technique I've used when I've had to compress text. A big advantage is that encoding the text isn't very complicated either.
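To illustrate the encoding side, here is a minimal Python sketch that produces the byte values for the Eeprom lines from the plain text, using the same conventions as the code above ($00 terminator, $01 for the comma plus CR/LF, $02 for "he", msb for a preceding space). The function names and the greedy left-to-right matching are assumptions for illustration; `decode` simply mirrors the PICAXE loop so the result can be checked on a PC.

```python
# Host-side sketch only; encode/decode are hypothetical names.
TEXT = ("All the world's a stage,And all the men and women merely players,"
        "They have their exits and their entrances,"
        "And one man in his time plays many parts,His acts being seven ages.")

PATTERNS = {0x01: ",", 0x02: "he"}          # source text replaced by each code
EXPANSIONS = {0x01: ",\r\n", 0x02: "he"}    # what the decoder prints for it

def encode(text):
    out = bytearray()
    i = 0
    space = 0                     # 0x80 when the next byte carries a space
    while i < len(text):
        if text[i] == " ":
            if space:
                raise ValueError("consecutive spaces need another macro")
            space = 0x80
            i += 1
            continue
        for code, pattern in PATTERNS.items():
            if text.startswith(pattern, i):
                out.append(code | space)
                i += len(pattern)
                break
        else:                     # no macro matched: store the character
            c = ord(text[i])
            if c < 0x20 or c > 0x7F:
                raise ValueError("cannot encode %r" % text[i])
            out.append(c | space)
            i += 1
        space = 0
    out.append(0x00)              # line terminator
    return bytes(out)

def decode(data):
    parts = []
    for b in data:
        if b & 0x80:              # bit7 set: print the folded space first
            parts.append(" ")
            b &= 0x7F
        if b == 0x00:
            break
        if b <= 0x1F:
            parts.append(EXPANSIONS.get(b, ""))
        else:
            parts.append(chr(b))
    return "".join(parts)

encoded = encode(TEXT)
print(len(encoded), "bytes:", " ".join("$%02X" % b for b in encoded[:6]))
```

For this passage the sketch gives 141 bytes including the terminator, the same count as the Eeprom lines above.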
 

Jeremy Leach

Senior Member
I thought you might rise to the bait Hippy ;)

I had similar thoughts, although what I tried as a concept in Excel (i.e. without any PICAXE code) is:

I calculated the frequency distribution of characters a-z (and space) for this piece of text. See attached table, where I've sorted the characters by their occurrence. This list of 27 characters is given a new set of character codes, and it's these codes that the PICAXE would store to represent the text string.

There would need to be a table of 27 bytes in EEPROM showing the mapping between the new character code and real character code.

Instead of storing the text string in bytes, it would be stored in nibbles. The first 15 (most frequent) characters in the table, new codes 0 to 14, would only need one nibble. If a character above new code 14 was required then this would need two nibbles (a byte): the first nibble would be value 15, acting as an escape, and the second nibble would give how far the code is above 15 (so code 15 is stored as 15,0 and code 26 as 15,11).

The picaxe code to do this might be quite simple - but this and the overhead of the table of 27 bytes would only make it effective for longer pieces of text.

Even so, this little bit of text can be compressed to around 70% of its raw storage, taking the 27 byte table into account (but not the code).
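The nibble scheme can be sketched in Python to see what sizes it gives. This is only an illustration under stated assumptions: the text is lower-cased, the table is built from whatever symbols actually occur (so it also holds the comma, full stop and apostrophe rather than exactly a-z and space), and all function names are made up.

```python
# Host-side sketch only; all names here are hypothetical.
from collections import Counter

TEXT = ("All the world's a stage,And all the men and women merely players,"
        "They have their exits and their entrances,"
        "And one man in his time plays many parts,His acts being seven ages.")

ESCAPE = 15   # nibble value meaning "a second nibble follows"

def build_table(text):
    """Symbols sorted by falling frequency; position = new character code."""
    table = [sym for sym, _ in Counter(text).most_common()]
    if len(table) > ESCAPE + 16:
        raise ValueError("too many distinct symbols for one escape nibble")
    return table

def encode_nibbles(text, table):
    index = {sym: code for code, sym in enumerate(table)}
    nibbles = []
    for ch in text:
        code = index[ch]
        if code < ESCAPE:
            nibbles.append(code)                      # frequent: one nibble
        else:
            nibbles.extend((ESCAPE, code - ESCAPE))   # rare: escape + offset
    return nibbles

def pack_nibbles(nibbles):
    padded = nibbles + [0] * (len(nibbles) % 2)       # pad to a whole byte
    return bytes((hi << 4) | lo for hi, lo in zip(padded[0::2], padded[1::2]))

def decode_nibbles(data, table, count):
    stream = iter([n for b in data for n in (b >> 4, b & 0x0F)])
    out = []
    while len(out) < count:
        n = next(stream)
        code = n if n < ESCAPE else ESCAPE + next(stream)
        out.append(table[code])
    return "".join(out)

text = TEXT.lower()
table = build_table(text)
packed = pack_nibbles(encode_nibbles(text, table))
print("raw", len(text), "packed", len(packed), "table", len(table))
```

The decoder needs the character count (or a terminator code) because the last byte may hold a padding nibble.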
 


hippy

Ex-Staff (retired)
I think you're on your way to doing Huffman encoding. Keeping it to bytes or nibbles makes it easier for a PICAXE so that is sensible. It's a case of not making the code used to decode larger than what you are saving.

The trick of using a reduced character set and a 'shift-code' to switch between alphabets is how original Teletypes worked ( 5-bit codes with letters and figures shifts ), so you're heading towards a dynamic version of that with 4-bit nibbles.

Another thing to bear in mind is that the decoder can be recursive, with the macro text stored in Eeprom, and the macros may even contain macro calls themselves; that can simplify decoding. I tried it with the earlier code and it increased the size, but then there are only a few macros.

The next jump is to not think of letters but symbols, which can be one or more letters. Analyse them all and you find the 15 most used. Of course the analysis of most common symbols and sub-sets of those ( eg "PICAXE-08M" could also be "{X}-08M" where {X}="PICAXE" ) gets quite complicated.

Use all these tricks in combination and you can get a very efficient 'macro text' system. I've used it for LCD ( particularly menus ) which need to generate particular code sequences. What's ultimately being done is creating an interpreter, whether for program code or data, it's much the same.

It's ultimately a matter of effort versus saving.
 

Jeremy Leach

Senior Member
Interesting. I also hadn't understood Huffman coding and the clever prefix-less codes ...so that's been interesting reading.

I've dabbled with macros for a picaxe menu system too. I don't really have an application for any of this at the moment, so it was just for interest really.
 

papaof2

Senior Member
Interpreters and text compression schemes are often "little languages" with limited syntax and very specific applications.

To really get into little languages, find a copy of the book "Programming Pearls" by Jon Bentley and read the chapter on (what else ;-) "Little Languages". You'll probably find the other chapters interesting also.

For another look at little languages, go here http://www.faqs.org/docs/artu/minilanguageschapter.html
Just remember to take time out to eat today and sleep tonight ;-)

I've written little languages for such things as interactive telephone surveys (read number from file, display name & location, dial number, prompt survey taker with greeting, accept 'answer/busy/no answer/answering machine' response from survey taker, if answered prompt survey taker with first question (then other questions based on called party responses), otherwise record type of failure and go to next number). The little language was the encoding of the questions and responses and where to go based on each response (jump to another question based on response, say "Thank you" and hang up, save responses with date/time, etc). This one was used in political campaigns small (local school board) and large (US Congress). Back in the days of DOS, I wrote a complete political campaign management system in a dBase clone, including laser-printed facsimiles of the standard reporting forms - it also worked in a DOS box under Windows 3.x (and probably up through XP). If you're curious, look here: http://www.jecarter.us/thelabwiz/cam.html

John
 

boriz

Senior Member
(Prolly old idea)

Index by pairs of letters. EG:

Of “All the world's a stage, And all the men and women merely players”

First pair: “Al”
Second “l “
Third “th”
Fourth “e “

Then do a frequency test on all pairs.



Or threes, or fours ?
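A frequency test on pairs is quick to try on a PC. This Python sketch splits the text into non-overlapping groups as in the example above ("Al", "l ", "th", "e ") and counts them; the function name `pairs` and the choice to drop a short leftover group at the end are assumptions for illustration.

```python
# Host-side sketch only; pairs() is a hypothetical helper name.
from collections import Counter

TEXT = ("All the world's a stage,And all the men and women merely players,"
        "They have their exits and their entrances,"
        "And one man in his time plays many parts,His acts being seven ages.")

def pairs(text, size=2):
    """Split text into consecutive, non-overlapping groups of `size`."""
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]

counts = Counter(pairs(TEXT))
for group, n in counts.most_common(5):
    print(repr(group), n)

# The same test on threes, as suggested above
triples = Counter(pairs(TEXT, 3))
```

Because the groups do not overlap, the counts depend on where the split starts, so "th" at an odd position is seen as two different pairs.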
 

Jeremy Leach

Senior Member
Thanks Papaof2 ... I've ordered that book, looks a classic.

And yep Boriz, good idea - although I can see it would only work well with a large volume of text because of the high number of pairs and the big overhead.
 

SilentScreamer

Senior Member
Why not have common pairs and all used letters? Things like "the" "he" "she" "at" are likely to come up often so would justify having. A pair like "zo" is probably useless most of the time (unless you're talking about zoos a lot :p).
 

Rickharris

Senior Member
After looking at this I thought: would it be possible to design and make a mechanical voice? PICAXE controlled of course.

Mmm, a little research suggests it is a complex area, although even back in the 1770's some success was achieved with very crude mechanical systems.

Need to look at this further but it looks like some recognisable voice effects could be achieved with 5 or 6 servo controlled mechanisms in a box (mouth) and a piston for breath, with the PICAXE controlling position and providing the basic voice frequency.
 

BeanieBots

Moderator
Fascinating idea Rick.
Lots has been done in silicon and software to effect speech but I've never seen an attempt at controlling airflow with some "vocal cords" tongue and lips.
Hmm...
 