Conway's Game of Life, has anyone done it with a PICAXE ?

Buzby

Senior Member
Hi All,

I'm trying to estimate how fast a PICAXE could calculate each generation in Conway's Game of Life.
My early trials seem to show that running at 64MHz calculating 4 generations per second on a 16x16 matrix should be possible.
( This is just calculation time, the actual display mechanism will add some overhead, but I've not decided on the type of display yet. )

Has anyone tried before, and did you have any success ?

Cheers,

Buzby
 

westaust55

Moderator
There was some discussion on the topic in the past:
http://www.picaxeforum.co.uk/showthread.php?13435-Game-of-Life


EDIT:
Searching on the internet, this concept of a 4x4 cell module (16 LEDs) and a controller seems interesting.
Each module with a microcontroller and interconenctable by 4 pins to an adjoining module on all sides to make a larger overall cell array (or cube)
http://www.makershed.com/product_p/mkad3.htm?Click=37845
http://learn.adafruit.com/game-of-life
http://makezine.com/gol/
Uses a 28 pin chip as the controller for each 4 x 4 cell module

Decent image of the same board here: http://www.instructables.com/id/Game-of-Life-Kit/
Uses an ATMEGA 48 but you can reverse engineer the circuit from the photo and start again with a PICAXE is need be.
If you want to reduce the IO count then multiplexing or charlieplxing are options and you have to think about the circuit design before you go too deeply into the coding.
 
Last edited:

PaulRB

Senior Member
I saw that instructable too and wondered if it could be done with a picaxe.

I was thinking 64 leds, rather than just 16, via charlieplexing. A 14m2 would in theory be able to drive 64 leds, but no pins left to communicate with adjacent cells, so perhaps 20m2.

As to updating the grid of cells, many years ago I coded a version that processed 8 or 16 cells in parallel to give a significant speed boost over 1 cell at a time. I did this on a 2MHz 8-bit 6502 processor (a BBC micro, if anyone remembers them!) I had a 128 x 128 or 256 x 256 grid updating around once per second, but I was able to code in assembler. Picaxe m2 would clock 16 times faster, but would be interpreting basic and multiplexing the display also.... an interesting project.

Paul
 

Buzby

Senior Member
There was some discussion on the topic in the past:
http://www.picaxeforum.co.uk/showthread.php?13435-Game-of-Life
Adhering to good forum etiquette, I searched before posting. So how did I miss that ? :confused:
Anyway, it looks like there been a lot of talk, but no action, so I'm going to give it a try.

After researching a few clever algorithms on the Net, and writing to test code, I'm fairly certain that even though PICAXE is slow in comparison to other platforms it should still be possible to build a presentable Life engine.

My plan, still flexible though, is to start with a 16x16 ( i.e 256 LEDs ). I think this can be done with no extra components using just a 28X2.
The idea of linkable modules is also something to consider, I'll keep that in mind.

Paul, your 6502 assembler code methods sound just like the sort of optimisations we did back in the day.
Things have moved on a bit, and now I'm going to try using state-machines, pre-calculated lookup tables, linked lists, sparse grids, etc.

( It's really just a project to try pushing the PICAXE envelope. If I specifically *needed* a fast Life engine I would not use PICAXE ! )

Cheers,

Buzby
 

PaulRB

Senior Member
Hmm... I'm not convinced all those fancy algorithms and data structures will bring any real benefits on a small grid like 16x16, so it will be interesting to know how you get on.

So how are you thinking of building the display? Still thinking of using 8x8 led matrix mapped onto a larger charlieplexing grid, as discussed in that recent thread? Won't you at least need some driver transistors or darlingon arrays to overcome the 20~25mA per pin limit?

http://3.bp.blogspot.com/-EfpnHCitrCA/UUkFyPYwE_I/AAAAAAAAD94/a2k1Z-9Jang/s1600/8x8-22.png

Paul
 
Last edited:

Buzby

Senior Member
Hmm... I'm not convinced all those fancy algorithms and data structures will bring any real benefits on a small grid like 16x16, so it will be interesting to know how you get on.
That's exactly what I want to find out, will a fancy algorithm make a PICAXE fly ?

The display is not my primary aim, it's the techniques used in calculating the results I'm more interested in.
So for testing I'm going to use hserout to a VB app which will be my display.

The grid size makes a huge difference to the speed at which it runs, and not just because there are more LEDs.
For instance, the 'edge' calculations take up significantly more relative time as the grid gets smaller, but the RAM requirements go up as the grid gets bigger.
I've done enough testing at 32x32 to see that the PICAXE could likely calculate a single frame fast enough, but there is not enough RAM to hold the buffer for the next.

I'm just in the process of writing the VB, so when I've finished that we'll have a good test bed to try ideas on.

Cheers,

Buzby
 

PaulRB

Senior Member
I've done enough testing at 32x32 to see that the PICAXE could likely calculate a single frame fast enough, but there is not enough RAM to hold the buffer for the next.

I'm just in the process of writing the VB, so when I've finished that we'll have a good test bed to try ideas on.
Shame, I'm on Ubuntu, so don't think I can share your vb.

If you code carefully, you shouldn't need a complete second buffer. Just enough so that when you overwrite the updated data for a cell or group of cells, you have a copy of the originals to calculate the next batch of cells, if that makes sense.
 

Buzby

Senior Member
This is the VB. It's as rough as a bear's backside, but it does the job.

It uses a 16x16 grid of VB Labels, numbered 0 to 255.
Each Label is Green/White for On/Off, and displays the neighbour count, 0 to 8.

You should be able to translate it into just about any language that can accept characters from a Com port.

The protocol is easy.

If the received character is 255 then it's the start of a new generation, so set the cell pointer to 0.
If it's not 255 then it's a cell. Bit 4 is the On/Off state, bits 0-3 are the neighbour count, 0 to 8.

Code:
Private Sub Command1_Click()

' This is just used to scramble the display in the VB app
' Lets me see if it's actually updating as it should.

' Label the cells 0-255, and build a random pattern
For x = 0 To 255
    Label1(x).ToolTipText = x
    
    y = Int((6 * Rnd) + 1)
    If y > 4 Then
        Label1(x).BackColor = vbGreen
    Else
        Label1(x).BackColor = vbWhite
    End If
Next x


End Sub

Private Sub Command2_Click()
   Dim Instring As String
   Dim InVal As Byte
   
   MSComm1.CommPort = 2
   MSComm1.Settings = "19200,N,8,1"
   MSComm1.InputLen = 1
   MSComm1.PortOpen = True
Do
    ' Get a character from the com port
    Do
       DoEvents
       Buffer$ = MSComm1.Input
    Loop Until Buffer$ <> ""
     
    InVal = Asc(Buffer$)
    
    ' If InVal is 255 then it's the start of a new generation
    If InVal = 255 Then
        CellPointer = 0
        GenCount = GenCount + 1
        Label2.Caption = GenCount
    Else
    
    ' If not 255, then it's a cell
        ' Show On/Off
        CellState = InVal And 16
        If CellState = 0 Then
            Label1(CellPointer).BackColor = vbWhite
        Else
            Label1(CellPointer).BackColor = vbGreen
        End If
        
        ' Show neighbour count
        CellCount = InVal And 15
        Label1(CellPointer).Caption = Chr$(CellCount + 48)
        
        ' Point to next cell
        CellPointer = CellPointer + 1
    End If
   
Loop

   MSComm1.PortOpen = False
   End Sub
Pic_1.png
 

hippy

Ex-Staff (retired)
It is possible to get 4,800 generations per second with a PICAXE. That does however require a 20X2 per cell controlling a single LED but it is as infinitely expandable as your pockets are deep and the National Grid can power.

As it's faster than needed it does allows multiple colours and/or brightness depending on how long the cell has been alive and other fancy tricks. A single PICAXE could handle more than one cell but as always it is a case of balancing costs against performance, balancing simplicity and ease of implementation against complexity.

For a single chip solution, I managed to squeeze 2.4 generations per second of 16x16 out of a 28X2 at 64MHz, excluding display time. I haven't posted my code because I don't want to spoil the fun but I will post it later and it's probably not too much of a spoiler to say my approach was to use scratchpad, one byte per cell. It also ports to the 20X2 with some changes.

One issue with clever algorithms is they can take longer to execute than the time they potentially save so I just went for optimising things as best I could to make execution as fast as possible.

I output using SERTXD like the old text chess board and Star Trek games, using Terminal to check that known Game of Life patterns behaved as expected. The Glider is a useful one for testing that. I took the SERTXD's out and just reported the generation numbers, timed 100 generations to calculate what the generation rate was.

For a real LED display, I'd probably have two PICAXE, the generator simply passing the data to the second which displays it. That allows the generator to output on the fly, not have to store-up data and output it as a frame.

One important question though; at what rate does generation actually become too fast ? There may not be anything actually wrong with just a few generations per second, above that you might not be able to see what is happening.

That's not to say don't try to get the maximum rate because that is a worthy challenge in its own right and would allow a PICAXE to control larger game boards.
 

PaulRB

Senior Member
I've been having a think about how to turn this into more of a stand-alone project.

What I'm thinking is one of those common 128x64 graphic lcd screens. They have a simple 8 bit parallel interface, and you can read the data as well as write, so possibly avoiding having to store the entire frame in picaxe memory.

http://www.ebay.co.uk/itm/128x64-LCD-Module-/111058451247?pt=UK_BOI_Electrical_Test_Measurement_Equipment_ET&hash=item19db997b2f

Start with 16x16 pixels, and over time see if its possible to get up to 64x64 or perhaps even the full 128x64.
 
Last edited:

Buzby

Senior Member
... For a single chip solution, I managed to squeeze 2.4 generations per second of 16x16 out of a 28X2 at 64MHz, excluding display time. I haven't posted my code because I don't want to spoil the fun but I will post it later and it's probably not too much of a spoiler to say my approach was to use scratchpad, one byte per cell. It also ports to the 20X2 with some changes. ...
Hi hippy,

That's exactly what I'm using, one byte per cell.
Each byte holds the current On/Off state and the neighbour count.

Please don't post your code until I've had a go.

Cheers,

Buzby
 

Buzby

Senior Member
... What I'm thinking is one of those common 128x64 graphic lcd screens. They have a simple 8 bit parallel interface, and you can read the data as well as write, so possibly avoiding having to store the entire frame in picaxe memory. ...
Hi Paul,

That's worth a look, once I've got the engine working.

Cheers,

Buzby
 

hippy

Ex-Staff (retired)
What I'm thinking is one of those common 128x64 graphic lcd screens. They have a simple 8 bit parallel interface, and you can read the data as well as write, so possibly avoiding having to store the entire frame in picaxe memory.
That should work but it's really an issue of speed. Doable but not practical.

If it takes around half a second per generation for a 16x16 then that scales to 16 seconds for 128x64. And that's before considering data will be less than optimally accessible. It could require a 20 fold increase or more in processing speed to get back to usable performance which is unlikely achievable.

There are some algorithms which can speed processing up, particularly if the data processing can be parallelised, but the PICAXE isn't really tuned to do that, and neither does it have a lot of memory. I think that no matter which way you try and go there will be some brick wall in the way for handling large board areas with a single chip. Probably the only solution is to parallel the hardware, multiple PICAXE feeding the display.
 

PaulRB

Senior Member
Hi Busby & Hippy.

I have my code working. 16x16 matrix on 20x2 @ 64MHz.

You're not going to believe this, but I think I'm getting between 15 and 20 generations per second!

Can I check my results against one of yours please? I was thinking if you gave me a starting grid, I will run for (say) 500 generations and give you back the result to check against yours? I will also time 500 generations more accurately.

I'm finding the Axepad terminal window very frustrating. It wasn't designed with this kind of thing in mind. I can't select 76800 baud, it only goes up to 38400. I can't see all 16 lines my code is outputting.

Any ideas how I could hook some other ubuntu app to receive the data, ideally with full VT code support, so I can send codes to move the cursor back up 16 lines to overwrite the same data on screen.

Paul
 

Buzby

Senior Member
Hi Paul,

15 to 20 gens per sec seems very high, but maybe you've invented a new algorithm !.

I can't help you with Ubunto or anything like that, but a quick test is to load the grid with a Glider.

Every 4 gens the pattern should be the same but shifted, and if you have wrap around, then every 64 gens the Glider should be back on it's starting position.

Cheers,

Buzby
 

hippy

Ex-Staff (retired)
I ran my code at EM32 on the 28X2 as I have an 8MHz crystal fitted, used 38400 baud and simply halved the time to get what it would take at 64MHz.

Mine's 82 seconds for 100 generations, so 41 seconds at 64MHz 410ms per generation. I fixed a bug which probably added some microseconds but haven't timed it since.

Faster than 2 generations per seconds I would say was good, 20 impressive. I ran two nested FOR-NEXT loops for row and column and did every cell, takes the same time whether alive or empty, about 1.6ms per cell. There might be some tricks to reducing the number of cells to check which would speed things up.

If you are having problems with serial, simply run it slower to check the algorithm is working and perhaps only print results for the first few rows.

For testing, the following shows my output ( first five rows ) for a glider...

Code:
0
--#-------------
#-#-------------
-##-------------
----------------
----------------

1
-#--------------
--##------------
-##-------------
----------------
----------------

2
--#-------------
---#------------
-###------------
----------------
----------------

3
----------------
-#-#------------
--##------------
--#-------------
----------------

4
----------------
---#------------
-#-#------------
--##------------
----------------
 
Last edited:

PaulRB

Senior Member
...maybe you've invented a new algorithm !.
Same one I wrote back in the late 80's as far as I can remember.
every 64 gens the Glider should be back on it's starting position.
Good tip, will try that. I have indeed used a glider, and it seems to be gliding and wrapping OK.

With the help of my wife, I have figured out how to get the output onto a proper terminal:
Code:
stty -F /dev/ttyUSB0 38400
cat /dev/ttyUSB0
Now I can try 76800 baud and the VT codes to more cursor on screen!
 

hippy

Ex-Staff (retired)
Same one I wrote back in the late 80's as far as I can remember.
You may well have a better and faster algorithm than me; mine's very much 'brute force'. It would be interesting to hear the approach you have taken if you have a few minutes to spare.
 

PaulRB

Senior Member
OK, here's a video. Matrix generation looking OK. Speed here is limited by updating the PC with data. I couldn't get 76800 baud to work - stty command would not accept it.

Busby, do you want me to give you some pointers, or just post my code? Like Hippy said, I don't want to spoil the fun. I am, after all, simply repeating something I did years ago, when I was in the "prime of youth", or as close as I ever got to it...

 
Last edited:

PaulRB

Senior Member
And another vid with the PC updates only once every 20 generations:

[video=youtube_share;8jCs27TqVEs]http://youtu.be/8jCs27TqVEs[/video]
 

Buzby

Senior Member
... Busby, do you want me to give you some pointers, or just post my code? Like Hippy said, I don't want to spoil the fun. ...
Hi Paul,

Don't post yet, I want to see how fast mine will be. ( It's got a few bugs at the moment, so results won't be here tonight. )

Cheers,

Buzby
 

PaulRB

Senior Member
No surprise, my lcd idea has already been done...

[video]https://www.google.co.uk/url?sa=t&source=web&cd=1&ved=0CC0QtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DWw_ 8emjJ4Ws&ei=Epe-UZBPw5bQBcjbgegL&usg=AFQjCNEU_q2A4qUkmMHoOZ6Y_Xr6ikackw&sig2=1hv7yOVP3Ozdz9lDaWC1kw[/video]
 

PaulRB

Senior Member
With a few minor tweaks and some more careful measurement, I am getting generation rates between 18 per second and 30 per second. 30 per second is when there's not much going on - e.g. a single glider going about its business.

I won't publish the code until Buzby and Hippy say its OK, but it would be good to get someone to look over it for potential speed-ups, because I have run out of ideas for the moment.

At the moment, the code is optimised for 16x16, and going to 32x32 will mean some loss of speed over and above the increase in grid size. Hard to say how much more. However, going from 32x32 to to 64x64 should only slow down in line with the increase in grid size. So lets say 6 times slower for 32x32 and a further 4 times slower for 64x64. This would give a generation rate of around 1 per second at 64x64. Which is about the same speed I got on my 2MHz 6502 all those years ago. This means that the Picaxe basic interpreter is absorbing the 32 x increase in clock speed. Oh well...

As for updating a GLCD as well, how much slow-down that will cause is also hard to say. With an 8-bit parallel interface, bit-mapped pixels and given the fact that my matrix data is already packed into bits, it might not be too bad. Only way to find out for sure, so I will order one. Anyone got any advice on what to look for? Otherwise will probably go with the one l posted the link to earlier.

Paul
 

Buzby

Senior Member
Hi Paul,

That's sounding fine. I'd really like to see your code, but not till after I've got mine working !.

( At work today, so no bug fixing till tonight. )

Cheers,

buzby
 

PaulRB

Senior Member
At work today, so no bug fixing till tonight. )
I should be too, but I did something to my dodgy lumbar disc at the weekend, so can't sit for long or walk very far these last couple of days!

As to the speed, I confess I am a bit disappointed that 64x64, never mind 128x64, looks like it will be a little too slow to be interesting on a glcd. Hippy was right about that, although I think he must be a surprised how close I might get! Another 2x or 4x performance improvement would be perfect, rather than the 20x he calculated. But I'm pretty sure I'm not going to get that from a Picaxe, unless one of your fancy algorithms turns out to be a winner at those larger grid sizes.
 

boriz

Senior Member
The fastest possible method is a big lookup table. EG:

For an 8x8 grid, that's 64 cells, 64 bits. You read all the cells as contiguous memory into a single 64 bit variable, then use that variable as an index looking into an array of 64 bit data, where the data is the new 64 bit cell pattern. So all possible 8x8 patterns are stored in the array and you just reference one directly according to the bit pattern of the previous generation.

The pseudocode would be simply
Code:
Do
    newgen=array(oldgen)
    oldgen=newgen
loop
The downside of course is that the array needs 18,446,744,073,709,551,616 (about 18 and a half quintillion) cells. 4 times this in memory bytes. And that's just for an 8x8 grid!

There might be a compromise available though. With a smaller array and the grid being split into blocks, processing one block at at time. 4x4, (16 bits) for example, would only require an array of 131072 memory bytes.
 

Buzby

Senior Member
For those of you that want to see what tricks the Game of Life can play, take a look at this app : http://golly.sourceforge.net/
The animated header at the top of the web page is produced by Life patterns which generate the moving text.

( I'm just about to start debugging my Life engine again, don't hold your breath. )
 

boriz

Senior Member
Here's one I prepared earlier for PC with 320x240 grid. It displays FPS and I'm getting about 470-500 FPS.

If you can't see the full instruction line at the top of the screen, it says '<SPACE> to start'.

http://www.zen86415.zen.co.uk/QuickLife.zip

ZIP includes README, source code and executable. I compiled it myself (in 2004). It's safe. But if you're nervous, get a hold of the compiler and compile it yourself.

It uses a simple brute force method, but because Ibasic has an inline assembler, I managed to speed it up quite a lot. The program was actually written to showcase what a bit of inline assembler can do for you.
 

hippy

Ex-Staff (retired)
I changed from byte per cell to 16 cells as bits per word and managed to get just over 11 generations per second for 16x16.

I'm still using a brute force approach, counting neighbours for every cell, and I don't think I'm going to get any further increase in speed without changing algorithm. If I can figure out a 'byte per cell, keeping count of neighbours' algorithm I might try that but otherwise that's me done. It was an interesting challenge and I over estimated the time bit twiddling would take.
 

PaulRB

Senior Member
I changed from byte per cell to 16 cells as bits per word and managed to get just over 11 generations per second for 16x16.
Hippy, are you still running at 32Mhz? If so, you have matched if not slightly exceeded my code's speed!

We should exchange code to see if we can combine best of both. Buzby does not want to see our code yet, he is still debugging his attempt.
 

Buzby

Senior Member
At last, my algorithm works !

Not as fast as I wanted, gives about 8 per sec with just a glider ( excluding display time ), but there is plenty room for simple optimisation.
( The code still has lots of debug stuff in, but that is #defined out at runtime. )

My next step is to make it inherently faster by adding another twist to the algorithm.
This will take a day or two to code,

Cheers,

Buzby

Code:
#picaxe 28X2

#Slot 0

#no_table

#no_data

' #define debug

pause 4000

sertxd ("Conway LIFE v 5.bas",cr,lf)

#ifdef debug
 setfreq m16
 hsersetup B57600_16, %10
#else_if
 setfreq em64
 hsersetup B57600_64, %10
#endif 
 
 
Symbol x    		= b0 ' w0
Symbol ccell      = b1 ' w0
Symbol ncell      = b2 ' w1
Symbol chng       = b3 ' w1
Symbol ptr_2      = w2 ' b4 b5
Symbol LoopCount  = b6 ' w3 
Symbol y          = b7 ' w3

Symbol x2     = b8	  ' w4
Symbol y2     = b9     ' w4

Symbol ptr_s   = w5
Symbol ptr_e   = w6
Symbol AltBuff = w7
Symbol cpr     = w8  

Symbol GenCnt  = w9


High C.2

' Empty scratchpad
for ptr = 0 to 255
	   @ptr = 0 
next

#rem 
' Build a Dot
' ---------------
' Live cells
put 35, $10
' Neighbour cells
put 18, $04
put 19, $04
put 20, $04
put 34, $04
put 36, $04
put 50, $04
put 51, $04
put 52, $04
put 35, $03
 
#endrem
#rem
' Build a Dot
' ---------------
' Live cells
put 167, $10
' Neighbour cells
put 150, $04
put 151, $04
put 152, $04
put 166, $04
put 168, $04
put 182, $04
put 183, $04
put 184, $04
#endrem




#rem
put  74, $03
put  76, $03
put  58, $03
put  60, $03
put  91, $03
#endrem



#rem 

 
' Build a Blinker vert
' ---------------
' Live cells
put 35, $11
put 51, $12
put 67, $11
' Neighbour cells
put 18, $01
put 19, $01
put 20, $01
put 36, $02
put 52, $03
put 68, $02
put 84, $01
put 83, $01
put 82, $01
put 66, $02
put 50, $03
put 34, $02
 
' Build a Blinker hoz
' ---------------
' Live cells
put 182, $11
put 183, $12
put 184, $11
' Neighbour cells
put 197, $01
put 181, $01
put 165, $01
put 166, $02
put 167, $03
put 168, $02
put 169, $01
put 185, $01
put 201, $01
put 200, $02
put 199, $03
put 198, $02
 
 
#rem 
  
 
put 57, $03
put 58, $03 
 
 
 

put 219, $03 
 
 
#endrem

 
' Build a Glider
' --------------

' Live cells
put 49,17
put 50,19
put 51,18
put 35,19
put 18,17
' Neighbour cells
put  1,1
put  2,1
put  3,1
put 17,1
put 19,2
put 20,1
put 32,1
put 33,3
put 34,5
put 36,2
put 48,1
put 52,2
put 64,1
put 65,2
put 66,3
put 67,2
put 68,1




' Copy buffer to alternate buffer

for LoopCount = 0 to 255
	ptr = LoopCount
	ptr_2 = LoopCount + 256
	put ptr_2, @ptr
next

AltBuff = 256

' Show starting pattern
gosub DisplaySCR



do ' Main loop

' Calculate a generation
' ----------------------

ptr = 0

' Scan current buffer looking for live cells or cells with neighbors	

For y = 0 to 15
For x = 0 to 15

	' Get this generation
	ccell = @ptr
   
   ' Test this generation 
	If ccell <> 0 then
	
		' Lookup next generation state   
		'               00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F  10  11  12  13  14  15  16  17  18
      Lookup ccell, ($00,$01,$02,$13,$04,$05,$06,$07,$08,$00,$00,$00,$  00,$00,$00,$00,$00,$01,$12,$13,$04,$05,$06,$07,$08  ), ncell
           
      ' Does it need to change ?
      chng = ccell XOR ncell AND $10
  
      If chng > 0 then
  
         ' Make pointer to buffer
	   	ptr_2 = ptr + 256
	   	
	   	' ( Next optimisation, pre-calculate offset to neighbours. )
	   	' Make addresses of neighbours
	   	
	   	
	   	

      	If bit20 > 0 then    ' Bit 4 of ncell, 1 = Birth, 0 = Death
      	   
      	   get ptr_2, ncell  ' Set cell On
      		ncell = ncell OR $10
	   		put ptr_2, ncell      	
           
      		Gosub Birth			' Update neighbours
      			
      	else
      	
      	   get ptr_2, ncell	' Set cell Off
      		ncell = ncell AND $0F
	   		put ptr_2, ncell   
	   		
      	   Gosub Death			' Update neighbours
 	     			
      	endif ' End of Birth/Death
        	
      endif ' End of changed ?

	endif	' End of cell non-zero
	
	inc ptr
	
Next x  

Next y  	
	

   ' Show end result
	gosub DisplaySCR

' Copy buffer, 

for ptr = 0 to 255
	ptr_2 = ptr + 256
   get ptr_2, @ptr
next

	toggle C.2
		
inc GenCnt	
loop until GenCnt = 100


end



' Display scratchpads
' -------------------
DisplaySCR:

   w20 = ptr
	hserout 0,(128)
	for ptr = 0 to 255
		hserout 0,(@ptr)
	next
	ptr = w20
Return




' Increment neighbours 
' --------------------
Birth:

#ifdef debug
	sertxd ("Birth at x=",#x," y=",#y,cr,lf)
#endif



   cpr = ptr 						' Remember current cell ptr

   ' top left						' Increment neighbours (~ In alternate buffer ~ )
   x2 = x - 1 AND 15    	
 	y2 = y - 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   inc @ptr
#ifdef debug
	sertxd ("tl x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif

	' top centre
 	y2 = y - 1 AND 15 
	ptr = y2 * 16 + x + AltBuff
   inc @ptr
#ifdef debug
	sertxd ("tc x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif

   ' top right
   x2 = x + 1 AND 15    	
 	y2 = y - 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   inc @ptr
   
#ifdef debug
	sertxd ("tr x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif

   ' mid left
   x2 = x - 1 AND 15    	
	ptr = y * 16 + x2 + AltBuff 
   inc @ptr
   
#ifdef debug
	sertxd ("ml x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif

   ' mid right
   x2 = x + 1 AND 15    	
	ptr = y * 16 + x2 + AltBuff
   inc @ptr

#ifdef debug
	sertxd ("mr x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif



   ' bottom centre
 	y2 = y + 1 AND 15 
	ptr = y2 * 16 + x + AltBuff
   inc @ptr
#ifdef debug
	sertxd ("bc x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif

   ' bottom left
   x2 = x - 1 AND 15    	
 	y2 = y + 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   inc @ptr
#ifdef debug
	sertxd ("bl x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif

    ' bottom right
   x2 = x + 1 AND 15    	
 	y2 = y + 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   inc @ptr  
#ifdef debug
	sertxd ("br x=",#x," y=",#y," ptr=",#ptr," val=",#@ptr,cr,lf)
#endif
   
   ptr = cpr			' Restore cell ptr

Return




' Decrement neighbours 
' --------------------
Death:
#ifdef debug
	sertxd ("Death at x=",#x," y=",#y,cr,lf)
#endif
   cpr = ptr 						' Remember current cell ptr
	' Decrement neighbours (~ In alternate buffer ~ )
   

   ' top left					
   x2 = x - 1 AND 15    	
 	y2 = y - 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   dec @ptr
   
   ' mid left
   x2 = x - 1 AND 15    	
	ptr = y * 16 + x2 + AltBuff 
	dec @ptr

   ' bottom left
   x2 = x - 1 AND 15    	
 	y2 = y + 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   dec @ptr
 
   ' bottom centre
 	y2 = y + 1 AND 15 
	ptr = y2 * 16 + x + AltBuff
   dec @ptr

	' top centre
 	y2 = y - 1 AND 15 
	ptr = y2 * 16 + x + AltBuff
   dec @ptr

   ' top right
   x2 = x + 1 AND 15    	
 	y2 = y - 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   dec @ptr

   ' mid right
   x2 = x + 1 AND 15    	
	ptr = y * 16 + x2 + AltBuff
   dec @ptr

   ' bottom right
   x2 = x + 1 AND 15    	
 	y2 = y + 1 AND 15 
	ptr = y2 * 16 + x2 + AltBuff
   dec @ptr

	ptr = cpr			' Restore cell ptr
	
Return

 
Last edited:

PaulRB

Senior Member
Nice work Buzby!

Suggestion: I've heard it said that LOOKUP is slow. If true, you could put that data into EEPROM or TABLE and look it up from there with READ.

Think I know where you're going with "pre-calculate neighbour offsets" but be careful not to limit yourself to a particular grid size, unless you've no interest going beyond 16x16.

I will post my code shortly.

Paul
 

hippy

Ex-Staff (retired)
Good work. Your algorithm also shows me what I was missing in my 'keeping track of neighbours' code :)

Swap the LOOKUP for a READ as PaulRB says and that should give a massive improvement.

If you use 'ptr' to index as %yyyyxxxx, have y run as '$00 to $F0 Step $10', you can get some good optimisations in the increment / decrement neighbour functions and elsewhere. You don't need to repeatedly 'x2 = x - 1 AND 15' etc, just do once at the top of the code, add 'AltBuff' early on with that, then you can use something like eight of these ...

ptr = x1 | y2 : @ptr = @ptr + 1

Change all multiplications to shifts.

The main gain is in only tracking changes, so further optimisations will have less benefits, but every little helps.
 

Buzby

Senior Member
Hi hippy & Paul,

Thanks for the praise, it was a bit of a struggle to get the algorithm sorted, so a few words of encouragement help a lot !.

The purpose of the code so far was to prove the idea worked, the next step is to make the speed up mods you both suggested.

After that I've got a 'tracking' idea, but as hippy pointed out earlier, the advantages of 'clever code' may be lost on such a small grid.

I doubt that a 28x2 will get to run LIFE and drive a 16x16 LED matrix at the same time, but you never know !.

Cheers,

Buzby
 

PaulRB

Senior Member
I doubt that a 28x2 will get to run LIFE and drive a 16x16 LED matrix at the same time, but you never know !.
Yes, I think that will indeed be possible. Use a regular interrupt from the timer to do the display multiplexing in the interrupt routine, while the main loop calculates the generations. (Care needed to avoid clashing over use of variables between the two.) Even if the interrupt routine takes up half the time, you will still get 4 generations per sec, which as we said earlier, is enough to make an interesting animation.

Hippy, did you miss my question at the end of the previous page re your latest timings?
 

PaulRB

Senior Member
OK then, here's my code.

I've put a reasonable splash of commenting in, but I expect parts will still seem impenetrable - let me know. Also any suggestions welcome!

Code:
#picaxe 20x2

' Game Of Life 16x16
' PaulRB
' Jun 2013

' Cell data locations in ram
symbol Row00 = 72
symbol Row01 = 74
symbol Row02 = 76
symbol Row03 = 78
symbol Row04 = 80
symbol Row05 = 82
symbol Row06 = 84
symbol Row07 = 86
symbol Row08 = 88
symbol Row09 = 90
symbol Row10 = 92
symbol Row11 = 94
symbol Row12 = 96
symbol Row13 = 98
symbol Row14 = 100
symbol Row15 = 102
symbol Row00copy = 104

'Variables holding data on neighbouring cells
symbol NeighbourN = w14
symbol NeighbourNW = w15
symbol NeighbourNE = w16
symbol CurrCells = w17
symbol NeighbourW = w18
symbol NeighbourE = w19
symbol NeighbourS = w20
symbol NeighbourSW = w21
symbol NeighbourSE = w22

'Variables used in calculating new cells
symbol tot1 = w8
symbol carry = w9
symbol tot2 = w10
symbol tot4 = w12

'General variables
symbol row = b4
symbol RowS = b5
symbol gen = b6 
symbol generation = w23

main:

	setfreq m32
	gosub initMatrix
	sertxd (cr, lf)
	gosub outputMatrix

	do

		for gen = 1 to 1 ' increase to 250 for accurate timing of generations
			gosub generateMatrix
		next
		gosub outputMatrix
		
	loop

outputMatrix:

	'Send matrix data for display on PC
	'To view on terminal window on Ubuntu/Linux:
	'	stty -F /dev/ttyUSB0 38400
	'	cat /dev/ttyUSB0
	
	sertxd ("Gen ", #generation, "     ",cr, lf)
	for row = Row00 to Row15 step 2
		peek row, word w0
		sertxd (#bit15, #bit14, #bit13, #bit12,#bit11, #bit10, #bit9, #bit8,#bit7, #bit6, #bit5, #bit4,#bit3, #bit2, #bit1, #bit0, cr, lf)
	next
	'Move cursor back up on PC screen to over-write next time
	sertxd (27, "[17A")
	
	return
	
initMatrix:

	'Set up initial cells in matrix
	w0 = %0000000000000000 : poke Row00, word w0
	w0 = %0000000000000000 : poke Row01, word w0
	w0 = %0000000000000000 : poke Row02, word w0
	w0 = %0000000000000000 : poke Row03, word w0
	w0 = %0000000000000000 : poke Row04, word w0
	w0 = %0000000000000000 : poke Row05, word w0
	w0 = %0000000000000000 : poke Row06, word w0
	w0 = %0000000000000000 : poke Row07, word w0
	w0 = %0000000000000000 : poke Row08, word w0
	w0 = %0000000000000000 : poke Row09, word w0
	w0 = %0000000000000000 : poke Row10, word w0
	w0 = %0000000000000000 : poke Row11, word w0
	w0 = %0000000000000000 : poke Row12, word w0
	w0 = %0000000011100000 : poke Row13, word w0
	w0 = %0000000000100000 : poke Row14, word w0
	w0 = %0000000001000000 : poke Row15, word w0
	return
	
generateMatrix:

	setfreq m64

	'set up N, NW, NE, W & E neighbour data
	peek Row15, word w0
	NeighbourN = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourNW = w0
	w0 = NeighbourN
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourNE = w0
	
	peek Row00, word w0
	poke Row00copy, word w0 	'copy row 0 to location after row 15 to remove need for wrap-around code in the loop	
	CurrCells = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourW = w0
	w0 = CurrCells
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourE = w0
	
	'Process each row
	for row = Row00 to Row15 step 2
		
		'Pick up new S, SW & SE neighbours
		rowS = row + 2
		peek rowS, word NeighbourS
		w0 = NeighbourS
		bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourSW = w0
		w0 = NeighbourS
		bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourSE = w0
		
		'Any live cells at all in this region?
		w0 = CurrCells | NeighbourN | NeighbourS '  | NeighbourNW | NeighbourNE | NeighbourW | NeighbourE | NeighbourSW | NeighbourSE  (not needed for 16x16 grid)

		if w0 > 0 then
		
			'Count the live neighbours (in parallel) for the 16 current cells
			'However, if total goes over 3, we don't care (see below), so counting stops at 4
			tot1 = NeighbourN
			tot2 = tot1 & NeighbourNW : tot1 = tot1 ^ NeighbourNW
			carry = tot1 & NeighbourNE : tot1 = tot1 ^ NeighbourNE : tot4 = tot2 & carry : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourW : tot1 = tot1 ^ NeighbourW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourE : tot1 = tot1 ^ NeighbourE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourS : tot1 = tot1 ^ NeighbourS : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSW : tot1 = tot1 ^ NeighbourSW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSE : tot1 = tot1 ^ NeighbourSE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
		
			'Calculate the updated cells:
			' <2 or >3 neighbours, cell dies
			' =2 neighbours, cell continues to live
			' =3 neighbours, new cell born
			w0 = CurrCells | tot1 & tot2 &/ tot4
			poke row, word w0
			
		end if
		
		'Current cells (before update), E , W, SE, SW and S neighbours become new N, NW, NE, E, W neighbours and current cells for next loop
		NeighbourN = CurrCells : NeighbourNW = NeighbourW : NeighbourNE = NeighbourE
		NeighbourE = NeighbourSE : NeighbourW = NeighbourSW : CurrCells = NeighbourS 
	Next
	
	inc generation 
	
	setfreq m32
	
	return

end
 

hippy

Ex-Staff (retired)
Hippy, did you miss my question at the end of the previous page re your latest timings?
I did. Sorry about that. I'm still running at 32MHz but adjusting frame rates as if it were at 64MHz, so 11 generations per second if it were 64MHz.

I've attached my code which are both brute force methods, for cell per byte and 16 bit cells per word.

The later uses a trick available on the 28X2 of using the dirsD variable 'which doesn't exist' [sic] as a temporary byte because it allows bit access using dirsD.X variables. The 'flags' variable is used for the same reason. This means that the three sets of word row data ( top, mid, bot ) is instantly accessible as 48 bit variables and saves having to keep loading data into w0 or w1.

The loop is unrolled so the handling for each bit in a row is exactly the same but using different bits each time.
 

Attachments

Last edited:

PaulRB

Senior Member
The later uses a trick available on the 28X2 of using the dirsD variable 'which doesn't exist' [sic] as a temporary byte because it allows bit access using dirsD.X variables. The 'flags' variable is used for the same reason. This means that the three sets of word row data ( top, mid, bot ) is instantly accessible as 48 bit variables and saves having to keep loading data into w0 or w1.
Nice one! But only for 28x2?
 

Buzby

Senior Member
Hi Paul,

I've modded your output code to drive my VB, it looks very good !.

One thing I've got a mental block about, I can never think of PICAXE as a 16bit microprocessor !.
When that penny drops, your method come clearer. It's coded as if it was in assembler on a 16 bit CPU.
My thought processes worked the other way, trying to make a PICAXE use algorithms more suited to a high level language.

Regarding driving LEDs, I've not got a 16x16 module yet, and the only big matrix of LEDs I've got are on my orrery.
( Now there's a thought, Circular Life !. The rules would need to be different for each circle if it was to make any sense. )

I've not looked at hippy's code yet, I'm going to make the mods to mine first.

As it is, I think we've proved that a PICAXE is a viable solution for a 'hang on the wall' Life ornament.

Cheers,

Buzby

 

PaulRB

Senior Member
I've ordered my 128x64 lcd. Now I'm thinking of how to get the most out of it. I wonder if I can get the 20x2 to calculate a 32x16 grid and update the display with blocks of 4x4 pixels for each cell...
 
Top