How is PICAXE code stored on EEPROM?

bpowell

Senior Member
So, I'm doing a little tinkering with my 20x2 ...

I have a very simple program: (36 bytes)

Code:
main:
    high b.1
    sertxd ("Hello World!")
    pause 1000
    low b.1
    pause 1000
goto main

When that transfers to a PICAXE ... it looks like this (from PC to PICAXE)

Code:
00h AAh 00h 01h 00h 02h [5Ah 04h A7h 84h 2Bh E4h 87h 32h B9h B1h CDh
8Eh 6Fh 71h 03h 95h DCh DEh E7h 27h 36h 39h 91h C4h 21h FCh 0Fh A0h
19h 3Fh 81h F4h 32h 00h 5Fh F0h] 00h
(There are a LOT more 00h transferred, but I've truncated that ... I've BRACKETED what I *think* are the relevant 36 bytes)

This is tokenized code ... so things don't align on byte boundaries and such ... but if you take that code, convert it to binary, you can find (for instance) the entire "Hello World!" in there, in order ... (Padded with 0b01110 between each character)
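For anyone who wants to repeat that search, here is a minimal Python sketch. Python and the helper names `to_bits` and `find_text` are my own choices for illustration. It assumes the bytes are read MSB-first and that each character is a plain 8-bit ASCII code separated by the 5-bit pattern 01110, which is simply what this one capture shows.

```python
# Bracketed 36 bytes from the PC-to-PICAXE capture above.
DOWNLOAD_HEX = (
    "5A 04 A7 84 2B E4 87 32 B9 B1 CD "
    "8E 6F 71 03 95 DC DE E7 27 36 39 91 C4 21 FC 0F A0 "
    "19 3F 81 F4 32 00 5F F0"
)
SEPARATOR = "01110"  # 5-bit pattern observed between characters (assumption from this capture)


def to_bits(hex_text):
    """Concatenate the bytes into one MSB-first bit string."""
    return "".join(f"{b:08b}" for b in bytes.fromhex(hex_text))


def find_text(bits, text, separator=SEPARATOR):
    """Return the bit offset where text starts (8-bit ASCII joined by separator), or -1."""
    pattern = separator.join(f"{ord(c):08b}" for c in text)
    return bits.find(pattern)


bits = to_bits(DOWNLOAD_HEX)
offset = find_text(bits, "Hello World!")
print(offset)  # → 44
```

The string starts 44 bits in, i.e. halfway through the E4h byte, which is why it never shows up when the data is read as whole bytes.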

If I push that code to an external EEPROM (#slot 4) the captured download code looks thus ...

Code:
00h AAh [04h] 01h 00h 02h 5Ah 04h A7h 84h 2Bh E4h 87h 32h B9h B1h CDh
8Eh 6Fh 71h 03h 95h DCh DEh E7h 27h 36h 39h 91h C4h 21h FCh 0Fh A0h
19h 3Fh 81h F4h 32h 00h 5Fh F0h 00h

The only change is the BRACKETED byte (which I presume is the slot number ... internal is slot 0, external is slot 4)
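A quick way to confirm that nothing else differs is to diff the two captures byte by byte. This is a small Python sketch; `diff_bytes` is a hypothetical helper name, and only the first ten bytes of each capture are typed in here to keep it short.

```python
# First ten bytes of each capture, copied from the two blocks above.
INTERNAL = bytes.fromhex("00 AA 00 01 00 02 5A 04 A7 84")  # download to internal memory
EXTERNAL = bytes.fromhex("00 AA 04 01 00 02 5A 04 A7 84")  # download to #slot 4


def diff_bytes(a, b):
    """Return (index, byte_in_a, byte_in_b) for every position where a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for index, old, new in diff_bytes(INTERNAL, EXTERNAL):
    print(f"byte {index}: {old:02X}h -> {new:02X}h")
# → byte 2: 00h -> 04h
```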

Now, if I write a program in slot-0 to just read and dump the EEPROM, I get 256 bytes of stuff .... 220 of that is randomized filler I believe ... (The rest of the EEPROM is confirmed blank)

Code:
0: 54h [64h D1h 24h 6Eh 4Ch 86h 6Ah ACh E7h 6Fh FFh 48h 6Ch 42h B3h 65h 1Ch B8h 88h
20: 6Bh 77h 56h B6h CCh 89h E7h 2Ah 65h 8Fh 24h 56h 4Eh 09h 55h 66h E3h 09h] 36h 26h
40: D6h C6h F6h E6h 96h 86h B6h A6h 57h 47h 77h 67h 17h 07h 37h 27h D7h C7h F7h E7h
60: 97h 87h B7h A7h 50h 40h 70h 60h 10h 00h 30h 20h D0h C0h F0h E0h 90h 80h B0h A0h
80: 51h 41h 71h 61h 11h 01h 31h 21h D1h C1h F1h E1h 91h 81h B1h A1h 52h 42h 72h 62h
100: 12h 02h 32h 22h D2h C2h F2h E2h 92h 82h B2h A2h 53h 43h 73h 63h 13h 03h 33h 23h
120: D3h C3h F3h E3h 93h 83h B3h A3h 5Ch 4Ch 7Ch 6Ch 1Ch 0Ch 3Ch 2Ch DCh CCh FCh ECh
140: 9Ch 8Ch BCh ACh 5Dh 4Dh 7Dh 6Dh 1Dh 0Dh 3Dh 2Dh DDh CDh FDh EDh 9Dh 8Dh BDh ADh
160: 5Eh 4Eh 7Eh 6Eh 1Eh 0Eh 3Eh 2Eh DEh CEh FEh EEh 9Eh 8Eh BEh AEh 5Fh 4Fh 7Fh 6Fh
180: 1Fh 0Fh 3Fh 2Fh DFh CFh FFh EFh 9Fh 8Fh BFh AFh 58h 48h 78h 68h 18h 08h 38h 28h
200: D8h C8h F8h E8h 98h 88h B8h A8h 59h 49h 79h 69h 19h 09h 39h 29h D9h C9h F9h E9h
220: 99h 89h B9h A9h 5Ah 4Ah 7Ah 6Ah 1Ah 0Ah 3Ah 2Ah DAh CAh FAh EAh 9Ah 8Ah BAh AAh
240: 5Bh 4Bh 7Bh 6Bh 1Bh 0Bh 3Bh 2Bh DBh CBh FBh EBh 9Bh 8Bh BBh ABh FFh

I've BRACKETED what I *think* is the relevant code ... it's 37 bytes, and the download is only 36 ... so either the 09h or the 64h is some kind of marker.

However .... you'll notice the dump from the EEPROM looks nothing like the transfer from the PC. If I convert the BRACKETED section of the EEPROM dump to binary and comb through it, I can't find "Hello World!" in there anywhere (contiguously that is).

If I send a program to slot_0 with a "Run 4" ... then the program runs as expected (and outputs "Hello World") ... so the data is there, it's just ... I don't know, encoded(?) somehow.

Does anybody have any ideas on how I can decipher the EEPROM dump? Also, why isn't it just a byte-for-byte transfer of the download program from the PC?
 
A few years ago I too started to decode the contents of the code stored in EEPROM. I gave up after I saw how much work it would take! Also, I realised that decoding the EEPROM would be interesting, but not of any real use.

The reason that decoding is so difficult is that each instruction or piece of data is reduced to the minimum number of bits needed to describe it, and those bits are then tacked onto the end of the previous data or instruction. Nothing is aligned to byte boundaries.

I can't remember exactly, but the gist of it is that each element starts with bits that describe its structure, followed by the value. For example, the values '0' and '1' need only 1 bit to differentiate, '2' to '3' need 2 bits, '4' to '7' need 3 bits, '8' to '15' need 4 bits, etc. This means that the 0 - 255 range of a single byte value is encoded as variable bit lengths, shorter for low values and longer for high values. ASCII data is similarly squashed, as are instructions. This process means that references to variables also have different lengths, e.g. 'b0' is shorter than 'b2'. The code representing 'b0 = 1' is a lot shorter than the code representing 'b32 = 255'.
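The bit counts in that example are just the minimum width of each value, which can be checked with a couple of lines of Python. This only illustrates the arithmetic; it is not the actual PICAXE token format, and the leading "structure" bits mentioned above are not modelled. `bits_needed` is a hypothetical helper name.

```python
def bits_needed(value):
    """Minimum number of bits that can hold value (0 still takes one bit)."""
    return max(1, value.bit_length())


for v in (0, 1, 2, 3, 4, 7, 8, 15, 255):
    print(f"{v:3d} needs {bits_needed(v)} bit(s)")
```

A full byte only costs 8 bits at the top of the range (128 to 255), so a scheme along these lines saves space whenever small constants dominate.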

As to why the EEPROM contents are not just a copy of the serial data, I don't know, I never looked at the serial. Maybe the chip does some of the compression on-the-fly.

To stand a chance of decoding the EEPROM you need to start with a really small programme, like 'b0 = 1'. Write the result down as bits, then try 'b0 = 2', 'b0 = 4', etc., recording the results as bits. You will soon find the bit patterns for the values, then move on to using 'b2', 'b4', etc. to get the patterns for variables.

It's not as easy as it sounds!
 
Yeah, the tokenized code is tricky for sure ... but, that should have all been handled by the compiler ... I'd *expect* that the compiler crunches the code down, then sends it to the PICAXE for storage. Having the PICAXE do any further refinement or compression before storing the code seems like a waste of space in the firmware.
 
I've never looked at the serial data. I expect it will be fully compiled, but split into chunks for transfer with handshaking and error handling. It could be possible to decode this, but it will be even more difficult than decoding the EEPROM.

If I was to attempt this task again, I would go about it in a different way.

Use the code in #slot 0 to read the EEPROM and output the binary representation as serial to a terminal.
After each test code is downloaded to #slot 4 the #slot 0 code will run, and if you arrange the screen format correctly it will be easy to see the differences between different test codes.

When you start to get an understanding of the structure, the code in #slot 0 can be modified to split the binary at each token boundary, then output the decoded token.

This would be an incremental process, updating the #slot 0 code with each new discovery. Eventually the #slot 0 code will be a complete decoder!

Still not a quick project, but you will end up understanding PICAXE tokenisation more than anybody outside of Rev-Ed.
 
The serial from the PC to the PICAXE is pretty straightforward ... it basically just sends the BREAK signal, gets happy with the PICAXE's reply, and then spits out the tokenized code. My example above (the second code block) is basically a full transaction (PC to PICAXE) ... I just omitted a bunch of extra 00h at the end ... you can take that, split it to binary, and easily find "Hello World!" in there ... it's NOT aligned to byte boundaries, as token sizes are from 2 - 6 bits ... but the serial string is in there.

I just can't figure out why it changes when saved to EEPROM ... and I'm using that "Hello World!" as my Rosetta stone ... so I'd like to be able to find that string (in order) in the EEPROM data ... and I can't ... it's clearly in there, as the program runs as expected ...

This is a rabbit-hole I have no need to go down whatsoever ... that being said, now that I've got my proto board with a 20x2 and an EEPROM on it, and I've updated my compiler to allow Slot 4 ... I might as well tinker a bit ... I might put some notes on this thread (and welcome any input) as I go.
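One reading that fits the bytes posted in this thread: each EEPROM byte looks like the downloaded byte XORed with its own address and the constant 45h, with the two nibbles then swapped. That would make the "randomized filler" simply scrambled 00h bytes, and the 37 bracketed bytes the 02h that precedes the program plus the 36 program bytes. The Python sketch below checks this against the first 40 bytes of the dump. The constant 45h, the nibble swap and the helper names `swap_nibbles` and `descramble` are inferences from this single dump, not documented Rev-Ed behaviour, so treat it as a hypothesis to test on other programs.

```python
# Download capture from "00h 02h" onward: the two bytes before the bracketed 36,
# the 36 program bytes, and the first trailing 00h.
DOWNLOAD = bytes.fromhex(
    "00 02 "
    "5A 04 A7 84 2B E4 87 32 B9 B1 CD "
    "8E 6F 71 03 95 DC DE E7 27 36 39 91 C4 21 FC 0F A0 "
    "19 3F 81 F4 32 00 5F F0 00"
)

# First 40 bytes of the EEPROM dump (rows "0:" and "20:").
EEPROM = bytes.fromhex(
    "54 64 D1 24 6E 4C 86 6A AC E7 6F FF 48 6C 42 B3 65 1C B8 88 "
    "6B 77 56 B6 CC 89 E7 2A 65 8F 24 56 4E 09 55 66 E3 09 36 26"
)


def swap_nibbles(b):
    """Exchange the high and low 4 bits of a byte."""
    return ((b << 4) | (b >> 4)) & 0xFF


def descramble(dump, key=0x45):
    """Undo the inferred scrambling: swap nibbles, then XOR with address and key."""
    return bytes(swap_nibbles(b) ^ (addr & 0xFF) ^ key for addr, b in enumerate(dump))


decoded = descramble(EEPROM)
print(decoded[:39] == DOWNLOAD)  # → True
print(decoded[39])               # → 0
```

If this holds for other programs, running the whole 256-byte dump through `descramble` should give back the tokenized download, so the "Hello World!" bit search can be run on the result.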
 