Conway's Game of Life, has anyone done it with a PICAXE ?

westaust55

Moderator
@Buzby,

Having moved the data from LOOKUP commands to EEPROM as recommended by hippy you have seen a speed improvement.
Within the PIC microcontroller, EEPROM is accessed indirectly whereas RAM is accessed directly, so moving the data into RAM may give a further speed improvement.
Try copying the data from EEPROM to Scratchpad (RAM) at the beginning of the program and thereafter accessing the data using the scratchpad memory pointer (ptr for setting location) and @ptr for data transfer to see if further improvement can be achieved.
 

Buzby

Senior Member
... Try copying the data from EEPROM to Scratchpad (RAM) at the beginning of the program and thereafter accessing the data using the scratchpad memory pointer (ptr for setting location) and @ptr for data transfer ...
Hi Westie,

If I copy the EEPROM to scratchpad I'm sure each individual access will be slightly faster, and as I have an access for every cell the cumulative saving may be noticeable.
However, I would need to set the value of ptr before each access. This adds an overhead, but is it less than the access time of the EEPROM ?

I.e, is :

ptr = ccell
ncell = @ptr

quicker than :

read ccell, ncell

It's at times like this that the only way to find out is to try it !.

Cheers,

Buzby
 

hippy

Ex-Staff (retired)
It's at times like this that the only way to find out is to try it !
Indeed and it's interesting that we all have different algorithms and variants on those algorithms. Double-buffering or single buffer is a difficult choice; you may win in one respect but lose in another. The only way to really tell which is best is to try them.

I got my 'keep count of the neighbours' code going but did not get any better results than my bit per cell version. I replaced my IF conditions which determine next cell state with a READ and went from 10 to 14 generations per second at 64MHz; doesn't sound a lot but that's 40% more generations per second, or nearly 30% less time per generation.

I don't use double buffering, just set $80 in the cell to say it should toggle in the next generation. That let me do a simple -

For ptr = $00 To $FF
Read @ptr, @ptr
Next

That's fast but then doing the toggle makes things slower. Double-buffering makes that more complicated and slower -

For ptr = $00 To $FF
Read @ptr, bTmp
ptr = ptr + $100
@ptr = bTmp
ptr = ptr - $100
Next

But, if there is nothing else to be done when alternating buffers, it very likely is a gain.
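
A rough model of the two loops above, written in Python so it can be run on a PC rather than on the chip. Only the $80 "toggle next generation" flag comes from the post; the bit layout (bit 0 = alive) and the contents of the lookup table are assumptions made for illustration, not the actual EEPROM data.

```python
# Assumed layout: bit 0 = cell alive, bit 7 ($80) = toggle on the next pass.
ALIVE = 0x01
TOGGLE = 0x80

# 256-entry lookup standing in for the EEPROM table behind "Read @ptr, @ptr".
# An entry whose toggle flag is set flips the alive bit and clears the flag.
TABLE = [((v ^ ALIVE) & ~TOGGLE & 0xFF) if v & TOGGLE else v for v in range(256)]


def apply_in_place(cells):
    """Single buffer: every byte is replaced by its table entry."""
    for i in range(len(cells)):
        cells[i] = TABLE[cells[i]]


def apply_double_buffered(src, dst):
    """Double buffer: same lookup, but the result lands in a second array."""
    for i in range(len(src)):
        dst[i] = TABLE[src[i]]


cells = [0x00, 0x01, 0x80, 0x81]
apply_in_place(cells)
print(cells)
```

The double-buffered version does the same lookup but has to address two arrays per cell, which is the extra pointer arithmetic visible in the second PICAXE loop.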
 

PaulRB

Senior Member
Half the price !.
Yes, but no driver chips. The one I found has serial to parallel drivers, which would save time designing & constructing. Depends how much you want done for you...

And they can be daisy-chained to form a 32x32 grid, with no extra pins needed.
 

Buzby

Senior Member
... Half the price !. ... Yes, but no driver chips ...
Oops !. That's my problem of not reading the page fully when surfing on a phone !.
Back on a PC now, I can see what it really is. It's 4 separate 8x8 modules, each with its own 74HC595.

Definitely an easier way than roll-your-own Multi or Charlieplexing, but where's the fun in that !.

Keep pushing the envelope !.

Cheers,

Buzby
 

PaulRB

Senior Member
It's 4 separate 8x8 modules, each with its own 74HC595.
Hmmm.. no, actually there's only 2 x 595s. This means a multiplex ratio of 1:16. That might not be very bright. 1:8 would be better. Also I think I spotted 1K series resistors. Maybe it would be ok, after all, it's for indoor use.
 

PaulRB

Senior Member
My glcd arrived today. Got as far as soldering header pins to it and wiring up to the 20x2 on my breadboard. Code changes will have to wait...

20130619_232224.jpg
 

PaulRB

Senior Member
I now have the GLCD working, in a very basic way. The 16x16 life matrix is simply displayed on the top-left 16x16 pixels. I have slowed the clock speed to 16MHz to bring the update rate down to around 5 per second.

[video=youtube_share;tWN65upsue0]http://youtu.be/tWN65upsue0[/video]

What I've learned getting this to work:

  1. The GLCD will run on 3.8V (3 x AA NiMH), but it needs 5V (4 x AA NiMH) to get enough contrast. At 3.8V all you see are faint shadowy images.
  2. With the 16x2 character LCDs I have, the contrast pot's terminals need to be connected to +V and 0V. With the GLCD, they need to be connected to the Vee output pin and 0V.
  3. The "Y" register controls the left-right position and the "X" register the top-bottom position, the opposite to what you would expect. This must be to make displaying text easier.
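
As a quick reference for point 3, here is a small Python sketch of the KS0108 instruction bytes that the listings in this thread send with RS low. The function names are made up for illustration, and the 0-63 column range assumes the usual module where each chip select covers a 64-column half of the screen.

```python
# KS0108 instruction bytes (sent with RS low), matching the values in the listings.
DISPLAY_ON = 0b00111111


def set_start_line(line):
    """Display start line (the 'scroll position'), 0-63."""
    return 0b11000000 | (line & 0x3F)


def set_y_address(column):
    """'Y' register: left-right column within one controller half, 0-63."""
    return 0b01000000 | (column & 0x3F)


def set_x_page(page):
    """'X' register: top-bottom page of 8 pixel rows, 0-7."""
    return 0b10111000 | (page & 0x07)


print(format(set_y_address(0), "08b"), format(set_x_page(1), "08b"))
```

Each data byte written afterwards fills one column of 8 vertical pixels and the Y address steps on by one, which is why the "X" register ends up selecting the vertical position.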

Code:
#picaxe 20x2

' Game Of Life 16x16
' PaulRB
' Jun 2013

' Cell data locations in ram
symbol Row00 = 72
symbol Row00b = 73
symbol Row01 = 74
symbol Row02 = 76
symbol Row03 = 78
symbol Row04 = 80
symbol Row05 = 82
symbol Row06 = 84
symbol Row07 = 86
symbol Row08 = 88
symbol Row09 = 90
symbol Row10 = 92
symbol Row11 = 94
symbol Row12 = 96
symbol Row13 = 98
symbol Row14 = 100
symbol Row15 = 102
symbol Row15b = 103
symbol Row00copy = 104

'Variables holding data on neighbouring cells
symbol NeighbourN = w14
symbol NeighbourNW = w15
symbol NeighbourNE = w16
symbol CurrCells = w17
symbol NeighbourW = w18
symbol NeighbourE = w19
symbol NeighbourS = w20
symbol NeighbourSW = w21
symbol NeighbourSE = w22

'Variables used in calculating new cells
symbol tot1 = w8
symbol carry = w9
symbol tot2 = w10
symbol tot4 = w12

'General variables
symbol row = b4
symbol RowS = b5
symbol gen = b6 
symbol generation = w23

'Pins controlling KS0108 Graphic LCD
symbol GLCD_RS = C.0 ' H = data, L = instruction
symbol GLCD_E = C.1
symbol GLCD_DATA = pinsB
symbol GLCD_CS1 = C.2
symbol GLCD_CS2 = C.3


main:

	setfreq m16
	gosub initMatrix
	
	dirsB = %11111111
	high GLCD_CS1
	low GLCD_CS2

	low GLCD_RS
	GLCD_DATA = %00111111 : pulsout GLCD_E, 1 ' Enable display
	GLCD_DATA = %11000000 : pulsout GLCD_E, 1 ' Scroll Position = 0
	gosub outputMatrix

	do

		for gen = 1 to 1 ' increase to 250 for accurate timing of generations
			gosub generateMatrix
		next
		gosub outputMatrix
		
	loop

outputMatrix:

	'Send matrix data for display on  GLCD
	
	low GLCD_RS
	GLCD_DATA = %01000000 : pulsout GLCD_E, 1 ' Y = 0
	GLCD_DATA = %10111000 : pulsout GLCD_E, 1 ' X = 0
	high GLCD_RS
	for row = Row00 to Row15 step 2
		peek row, GLCD_DATA : pulsout GLCD_E, 1
	next

	low GLCD_RS
	GLCD_DATA = %01000000 : pulsout GLCD_E, 1 ' Y = 0
	GLCD_DATA = %10111001 : pulsout GLCD_E, 1 ' X = 1
	high GLCD_RS
	for row = Row00b to Row15b step 2
		peek row, GLCD_DATA : pulsout GLCD_E, 1
	next
	
	return
	
initMatrix:

	'Set up initial cells in matrix
	w0 = %0000000000000000 : poke Row00, word w0
	w0 = %0000000000000000 : poke Row01, word w0
	w0 = %0000000000000000 : poke Row02, word w0
	w0 = %0000000000000000 : poke Row03, word w0
	w0 = %0000000000000000 : poke Row04, word w0
	w0 = %0000000000000000 : poke Row05, word w0
	w0 = %0000000000111000 : poke Row06, word w0
	w0 = %0000000000001000 : poke Row07, word w0
	w0 = %0000000000010000 : poke Row08, word w0
	w0 = %0000000000000000 : poke Row09, word w0
	w0 = %0000000000000000 : poke Row10, word w0
	w0 = %0011100000000000 : poke Row11, word w0
	w0 = %0010000000000000 : poke Row12, word w0
	w0 = %0001000000000000 : poke Row13, word w0
	w0 = %0000000000000000 : poke Row14, word w0
	w0 = %0000000000000000 : poke Row15, word w0
	return
	
generateMatrix:

	'set up N, NW, NE, W & E neighbour data
	peek Row15, word w0
	NeighbourN = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourNW = w0
	w0 = NeighbourN
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourNE = w0
	
	peek Row00, word w0
	poke Row00copy, word w0 	'copy row 0 to location after row 15 to remove need for wrap-around code in the loop	
	CurrCells = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourW = w0
	w0 = CurrCells
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourE = w0
	
	'Process each row
	for row = Row00 to Row15 step 2
		
		'Pick up new S, SW & SE neighbours
		rowS = row + 2
		peek rowS, word NeighbourS
		w0 = NeighbourS
		bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourSW = w0
		w0 = NeighbourS
		bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourSE = w0
		
		'Any live cells at all in this region?
		w0 = CurrCells | NeighbourN | NeighbourS '  | NeighbourNW | NeighbourNE | NeighbourW | NeighbourE | NeighbourSW | NeighbourSE  (not needed for 16x16 grid)

		if w0 > 0 then
		
			'Count the live neighbours (in parallel) for the 16 current cells
			'However, if total goes over 3, we don't care (see below), so counting stops at 4
			tot1 = NeighbourN
			tot2 = tot1 & NeighbourNW : tot1 = tot1 ^ NeighbourNW
			carry = tot1 & NeighbourNE : tot1 = tot1 ^ NeighbourNE : tot4 = tot2 & carry : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourW : tot1 = tot1 ^ NeighbourW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourE : tot1 = tot1 ^ NeighbourE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourS : tot1 = tot1 ^ NeighbourS : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSW : tot1 = tot1 ^ NeighbourSW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSE : tot1 = tot1 ^ NeighbourSE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
		
			'Calculate the updated cells:
			' <2 or >3 neighbours, cell dies
			' =2 neighbours, cell continues to live
			' =3 neighbours, new cell born
			w0 = CurrCells | tot1 & tot2 &/ tot4
			poke row, word w0
			
		end if
		
		'Shift the window down one row: current cells (before update), W and E become the new N, NW and NE; S, SW and SE become the new current cells, W and E
		NeighbourN = CurrCells : NeighbourNW = NeighbourW : NeighbourNE = NeighbourE
		NeighbourE = NeighbourSE : NeighbourW = NeighbourSW : CurrCells = NeighbourS 
	Next
	
	inc generation 
	
	return

end
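
For anyone wanting to check the parallel neighbour count away from the chip, the generateMatrix logic can be modelled in Python roughly as below. This is a sketch, not a translation: the function names are invented, the rows are held in a plain list instead of RAM, and a new list is returned rather than updating in place. The counter updates and the final expression follow the listing, with PICAXE's left-to-right evaluation written out using brackets.

```python
WIDTH = 16
MASK = 0xFFFF


def rot_right(w):
    """Rotate a 16-bit row one place right, wrapping bit 0 round to bit 15."""
    return ((w >> 1) | (w << (WIDTH - 1))) & MASK


def rot_left(w):
    """Rotate a 16-bit row one place left, wrapping bit 15 round to bit 0."""
    return ((w << 1) | (w >> (WIDTH - 1))) & MASK


def next_generation(rows):
    """rows: list of 16-bit ints, one per row. Returns the next generation (edges wrap)."""
    n = len(rows)
    out = []
    for r in range(n):
        north, curr, south = rows[r - 1], rows[r], rows[(r + 1) % n]
        neighbours = [
            north, rot_right(north), rot_left(north),
            rot_right(curr), rot_left(curr),
            south, rot_right(south), rot_left(south),
        ]
        # Vertical counters: bit i of tot1/tot2 is the 1s/2s digit of the
        # neighbour count for cell i; tot4 is a sticky "4 or more" flag.
        tot1 = tot2 = tot4 = 0
        for nb in neighbours:
            carry = tot1 & nb
            tot1 ^= nb
            tot4 |= tot2 & carry
            tot2 ^= carry
        # Alive next time if count is 3, or count is 2 and already alive.
        out.append((curr | tot1) & tot2 & ~tot4 & MASK)
    return out


grid = [0] * 16
grid[5] = 0b0000000000111000  # horizontal blinker
grid = next_generation(grid)
print([format(row, "016b") for row in grid[4:7]])
```

All 16 cells of a row are handled by the same handful of AND/XOR/OR operations, which is where the speed comes from compared with counting neighbours one cell at a time.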
 

PaulRB

Senior Member
I have now upped the grid size to 16x32 and increased it to fill the LCD screen. Back up to 64MHz. Screen update probably taking as long as the matrix generation.

[video=youtube_share;6ctRqBHvPEQ]http://youtu.be/6ctRqBHvPEQ[/video]

Can anyone suggest any improvements to the latest code please:

Code:
#picaxe 20x2

' Conway's Game Of Life 16x32
' PaulRB
' Jun 2013

' Cell data locations in scratchpad
symbol Row00 = 0
symbol Row00b = 1
symbol Row01 = 2
symbol Row02 = 4
symbol Row03 = 6
symbol Row04 = 8
symbol Row05 = 10
symbol Row06 = 12
symbol Row07 = 14
symbol Row08 = 16
symbol Row09 = 18
symbol Row10 = 20
symbol Row11 = 22
symbol Row12 = 24
symbol Row13 = 26
symbol Row14 = 28
symbol Row15 = 30
symbol Row15b = 31
symbol Row16 = 32
symbol Row16b = 33
symbol Row17= 34
symbol Row18 = 36
symbol Row19 = 38
symbol Row20 = 40
symbol Row21 = 42
symbol Row22 = 44
symbol Row23 = 46
symbol Row24 = 48
symbol Row25 = 50
symbol Row26 = 52
symbol Row27 = 54
symbol Row28 = 56
symbol Row29 = 58
symbol Row30 = 60
symbol Row31 = 62
symbol Row31b = 63
symbol Row00copy = 64

'Variables holding data on neighbouring cells
symbol NeighbourN = w14
symbol NeighbourNW = w15
symbol NeighbourNE = w16
symbol CurrCells = w17
symbol NeighbourW = w18
symbol NeighbourE = w19
symbol NeighbourS = w20
symbol NeighbourSW = w21
symbol NeighbourSE = w22

'Variables used in calculating new cells
symbol tot1 = w8
symbol carry = w9
symbol tot2 = w10
symbol tot4 = w12

'General variables
symbol row = b4
symbol RowS = b5
symbol x = b6 
symbol y = b7

'Pins controlling KS0108 Graphic LCD
symbol GLCD_RS = C.0 ' H = data, L = instruction
symbol GLCD_E = C.1
symbol GLCD_DATA = pinsB
symbol GLCD_D0 = pinB.0
symbol GLCD_D1 = pinB.1
symbol GLCD_D2 = pinB.2
symbol GLCD_D3 = pinB.3
symbol GLCD_D4 = pinB.4
symbol GLCD_D5 = pinB.5
symbol GLCD_D6 = pinB.6
symbol GLCD_D7 = pinB.7
symbol GLCD_CS1 = C.2
symbol GLCD_CS2 = C.3


main:

	setfreq m64
	gosub initMatrix
	
	dirsB = %11111111
	high GLCD_CS1
	high GLCD_CS2

	low GLCD_RS
	GLCD_DATA = %00111111 : pulsout GLCD_E, 1 ' Enable display
	GLCD_DATA = %11000000 : pulsout GLCD_E, 1 ' Scroll Position = 0

	do
		gosub generateMatrix
		gosub outputMatrix
	loop

outputMatrix:

	'Send matrix data for display on  GLCD
	for x = 0 to 7

		high GLCD_CS1 : high GLCD_CS2
		low GLCD_RS
		GLCD_DATA = %01000000 : pulsout GLCD_E, 1 ' Y = 0
		GLCD_DATA = %10111000  | x : pulsout GLCD_E, 1 ' set X
		high GLCD_RS

		high GLCD_CS1 : low GLCD_CS2
		for row = Row00 to Row31 step 2
			get row, word w0
			w0 = w0 >> x
			w0 = w0 >> x
			GLCD_DATA = 0
			GLCD_D0 = bit0 : GLCD_D1 = bit0 : GLCD_D2 = bit0 : GLCD_D3 = bit0
			GLCD_D4 = bit1: GLCD_D5 = bit1 : GLCD_D6 = bit1 : GLCD_D7 = bit1
			pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1
			if row = Row15 then
				low GLCD_CS1 : high GLCD_CS2
			endif
		next
		
	next

	return
	
initMatrix:

	'Set up initial cells in matrix
	w0 = %0000000000000000 : put Row00, word w0
	w0 = %0000000000000000 : put Row01, word w0
	w0 = %0000000000000000 : put Row02, word w0
	w0 = %0000000000000000 : put Row03, word w0
	w0 = %0000000000000000 : put Row04, word w0
	w0 = %0000000000000000 : put Row05, word w0
	w0 = %0000000000111000 : put Row06, word w0
	w0 = %0000000000001000 : put Row07, word w0
	w0 = %0000000000010000 : put Row08, word w0
	w0 = %0000000000000000 : put Row09, word w0
	w0 = %0000000000000000 : put Row10, word w0
	w0 = %0011100000000000 : put Row11, word w0
	w0 = %0010000000000000 : put Row12, word w0
	w0 = %0001000000000000 : put Row13, word w0
	w0 = %0000000000000000 : put Row14, word w0
	w0 = %0000000000000000 : put Row15, word w0
	w0 = %0000000000000000 : put Row16, word w0
	w0 = %0000000011000000 : put Row17, word w0
	w0 = %0000000011000000 : put Row18, word w0
	w0 = %0000000000000000 : put Row19, word w0
	w0 = %0000000000000000 : put Row20, word w0
	w0 = %0000000000000000 : put Row21, word w0
	w0 = %0000000000000000 : put Row22, word w0
	w0 = %0000000000000000 : put Row23, word w0
	w0 = %0000000000000000 : put Row24, word w0
	w0 = %0000000000000000 : put Row25, word w0
	w0 = %0000000000000000 : put Row26, word w0
	w0 = %0000000000000000 : put Row27, word w0
	w0 = %0000111000000000 : put Row28, word w0
	w0 = %0000001000000000 : put Row29, word w0
	w0 = %0000111000000000 : put Row30, word w0
	w0 = %0000000000000000 : put Row31, word w0
	return
	
generateMatrix:

	'set up N, NW, NE, W & E neighbour data
	get Row31, word w0
	NeighbourN = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourNW = w0
	w0 = NeighbourN
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourNE = w0
	
	get Row00, word w0
	put Row00copy, word w0 	'copy row 0 to location after row 31 to remove need for wrap-around code in the loop
	CurrCells = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourW = w0
	w0 = CurrCells
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourE = w0
	
	'Process each row
	for row = Row00 to Row31 step 2
		
		'Pick up new S, SW & SE neighbours
		rowS = row + 2
		get rowS, word NeighbourS
		w0 = NeighbourS
		bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourSW = w0
		w0 = NeighbourS
		bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourSE = w0
		
		'Any live cells at all in this region?
		w0 = CurrCells | NeighbourN | NeighbourS '  | NeighbourNW | NeighbourNE | NeighbourW | NeighbourE | NeighbourSW | NeighbourSE  (not needed: these are only rotations of the three terms used)

		if w0 > 0 then
		
			'Count the live neighbours (in parallel) for the 16 current cells
			'However, if total goes over 3, we don't care (see below), so counting stops at 4
			tot1 = NeighbourN
			tot2 = tot1 & NeighbourNW : tot1 = tot1 ^ NeighbourNW
			carry = tot1 & NeighbourNE : tot1 = tot1 ^ NeighbourNE : tot4 = tot2 & carry : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourW : tot1 = tot1 ^ NeighbourW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourE : tot1 = tot1 ^ NeighbourE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourS : tot1 = tot1 ^ NeighbourS : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSW : tot1 = tot1 ^ NeighbourSW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSE : tot1 = tot1 ^ NeighbourSE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
		
			'Calculate the updated cells:
			' <2 or >3 neighbours, cell dies
			' =2 neighbours, cell continues to live
			' =3 neighbours, new cell born
			w0 = CurrCells | tot1 & tot2 &/ tot4
			put row, word w0
			
		end if
		
		'Shift the window down one row: current cells (before update), W and E become the new N, NW and NE; S, SW and SE become the new current cells, W and E
		NeighbourN = CurrCells : NeighbourNW = NeighbourW : NeighbourNE = NeighbourE
		NeighbourE = NeighbourSE : NeighbourW = NeighbourSW : CurrCells = NeighbourS 
	Next
	
	return

end
thanks,

Paul
 

boriz

Senior Member
Lookin' good. Maybe add a random mutation once in a while so that the display is always busy. Otherwise it'll quickly become stable and uninteresting. (Mutation = flip one random cell at random interval)
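
A minimal Python sketch of that mutation idea, assuming the grid is held as one integer per row as in the listings above. The function names and the 5% chance per generation are arbitrary choices for illustration.

```python
import random


def mutate(rows, width=16, rng=random):
    """Flip one randomly chosen cell in place and return its (row, column)."""
    r = rng.randrange(len(rows))
    c = rng.randrange(width)
    rows[r] ^= 1 << c
    return r, c


def maybe_mutate(rows, chance=0.05, width=16, rng=random):
    """Call once per generation; mutates with the given probability."""
    if rng.random() < chance:
        return mutate(rows, width, rng)
    return None


grid = [0] * 32
print(mutate(grid))
```

Because the flip is an XOR, it can kill a live cell as easily as create one, so a single mutation in an empty region will usually die out on the next generation.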
 

Buzby

Senior Member
That's looking really good !.

You asked for improvements to the code.
I had already spotted you were using vertical counters.
Many years ago I used them in PLC code for debouncing, so I revisited them today, and found a way to make them faster !

Unfortunately, like the enhancements to my 'track the neighbours' algorithm, I think the overheads will exceed the returns in such a small grid.
( If you want to try, search for 'DeBruijn sequences'. )

Maybe your best line of attack is to improve the outputMatrix: routine.

I've given up trying to speed up my algorithm, all the techniques I found need oodles of RAM.
( One method packs 3 cells into 16 bits, including current and next generation states, neighbour counts, and an edge flag ! )

Anyway, we have answered my original question, and proved a PICAXE can do the job on a smallish grid, so what's next ?

Cheers,

Buzby
 

PaulRB

Senior Member
Lookin' good. Maybe add a random mutation once in a while so that the display is always busy. Otherwise it'll quickly become stable and uninteresting. (Mutation = flip one random cell at random interval)
Thanks Boriz, I was thinking about adding a pushbutton for that...
 

PaulRB

Senior Member
search for 'DeBruijn sequences'.
I'll do that.

Maybe your best line of attack is to improve the outputMatrix: routine.
Agreed, I threw that together this evening. Needs "finessing", but in all honesty I can't possibly expect to get more than a few 10's of % speed improvement.

so what's next ?
Well, your 16x16 led matrix is on its way. You need to think out your multi/charlieplexing strategy. What are your thoughts? It's going to look so much more "bling" than my lcd!

As for me, I would still very much like to get to 1 cell = 1 pixel. The bigger the grid the more interesting the patterns Conway's rules produce.

There have been many discussions about "horses for courses" on this forum. This horse has certainly exceeded my expectations, especially given that this is not its ideal course! And to continue in the over-used analogies vein, there's currently only one tool in my toolbox! This is one of those rare occasions where processing speed is more important than ease of use and ease of development. You probably know what I'm thinking of, and they both begin with "A".

So I need advice from the forum members who have experience of their use (but have certainly not turned their backs on Picaxe, for often discussed reasons). My thoughts so far have been:

Nano/Micro (I like breadboards much more than dev boards)
Teensy 3.0 (wow what a spec! Overkill?)
ATMega328 with A* bootloader (as cheap as a Picaxe. For convenience more than economy, would like to use axe027 to program but know I need to invert the tx & rx. Simple transistor inverters?)

Guidance appreciated!
 

JimPerry

Senior Member
It's peaceful in Yorkshire - at my new abode in Scotland I wake to the sound of gulls stomping on the roof :cool:
 

PaulRB

Senior Member
Tidied the outputMatrix subroutine a bit - some improvement.
Code:
#picaxe 20x2

' Conway's Game Of Life 16x32
' PaulRB
' Jun 2013

' Cell data locations in scratchpad
symbol Row00 = 0
symbol Row00b = 1
symbol Row01 = 2
symbol Row02 = 4
symbol Row03 = 6
symbol Row04 = 8
symbol Row05 = 10
symbol Row06 = 12
symbol Row07 = 14
symbol Row08 = 16
symbol Row09 = 18
symbol Row10 = 20
symbol Row11 = 22
symbol Row12 = 24
symbol Row13 = 26
symbol Row14 = 28
symbol Row15 = 30
symbol Row15b = 31
symbol Row16 = 32
symbol Row16b = 33
symbol Row17= 34
symbol Row18 = 36
symbol Row19 = 38
symbol Row20 = 40
symbol Row21 = 42
symbol Row22 = 44
symbol Row23 = 46
symbol Row24 = 48
symbol Row25 = 50
symbol Row26 = 52
symbol Row27 = 54
symbol Row28 = 56
symbol Row29 = 58
symbol Row30 = 60
symbol Row31 = 62
symbol Row31b = 63
symbol Row00copy = 64

'Variables holding data on neighbouring cells
symbol NeighbourN = w14
symbol NeighbourNW = w15
symbol NeighbourNE = w16
symbol CurrCells = w17
symbol NeighbourW = w18
symbol NeighbourE = w19
symbol NeighbourS = w20
symbol NeighbourSW = w21
symbol NeighbourSE = w22

'Variables used in calculating new cells
symbol tot1 = w8
symbol carry = w9
symbol tot2 = w10
symbol tot4 = w12

'General variables
symbol row = b4
symbol RowS = b5
symbol x = b6 
symbol x2 = b7

'Pins controlling KS0108 Graphic LCD
symbol GLCD_RS = C.0 ' H = data, L = instruction
symbol GLCD_E = C.1
symbol GLCD_DATA = pinsB
symbol GLCD_D0 = pinB.0
symbol GLCD_D1 = pinB.1
symbol GLCD_D2 = pinB.2
symbol GLCD_D3 = pinB.3
symbol GLCD_D4 = pinB.4
symbol GLCD_D5 = pinB.5
symbol GLCD_D6 = pinB.6
symbol GLCD_D7 = pinB.7
symbol GLCD_CS1 = C.2
symbol GLCD_CS2 = C.3


'Bit data for screen refresh
data 0, (%00000000, %00001111, %11110000, %11111111)


main:

	setfreq m64
	gosub initMatrix
	
	dirsB = %11111111
	high GLCD_CS1
	high GLCD_CS2

	low GLCD_RS
	GLCD_DATA = %00111111 : pulsout GLCD_E, 1 ' Enable display
	GLCD_DATA = %11000000 : pulsout GLCD_E, 1 ' Scroll Position = 0

	do
		gosub generateMatrix
		gosub outputMatrix
	loop

outputMatrix:

	'Send matrix data for display on  GLCD
	for x = 0 to 7

		high GLCD_CS1 : high GLCD_CS2
		low GLCD_RS
		GLCD_DATA = %01000000 : pulsout GLCD_E, 1 ' Y = 0
		GLCD_DATA = %10111000  | x : pulsout GLCD_E, 1 ' set X
		high GLCD_RS
		x2 = x + x

		high GLCD_CS1 : low GLCD_CS2
		for row = Row00 to Row31 step 2
			get row, word w0
			w0 = w0 >> x2 & %00000011
			read b0, GLCD_DATA
			pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1
			if row = Row15 then
				low GLCD_CS1 : high GLCD_CS2
			endif
		next
		
	next

	return
	
initMatrix:

	'Set up initial cells in matrix
	w0 = %0000000000000000 : put Row00, word w0
	w0 = %0000000000000000 : put Row01, word w0
	w0 = %0000000000000000 : put Row02, word w0
	w0 = %0000000000000000 : put Row03, word w0
	w0 = %0000000000000000 : put Row04, word w0
	w0 = %0000000000000000 : put Row05, word w0
	w0 = %0000000000111000 : put Row06, word w0
	w0 = %0000000000001000 : put Row07, word w0
	w0 = %0000000000010000 : put Row08, word w0
	w0 = %0000000000000000 : put Row09, word w0
	w0 = %0000000000000000 : put Row10, word w0
	w0 = %0011100000000000 : put Row11, word w0
	w0 = %0010000000000000 : put Row12, word w0
	w0 = %0001000000000000 : put Row13, word w0
	w0 = %0000000000000000 : put Row14, word w0
	w0 = %0000000000000000 : put Row15, word w0
	w0 = %0000000000000000 : put Row16, word w0
	w0 = %0000000111000000 : put Row17, word w0
	w0 = %0000000101000000 : put Row18, word w0
	w0 = %0000000101000000 : put Row19, word w0
	w0 = %0000000000000000 : put Row20, word w0
	w0 = %0000000000000000 : put Row21, word w0
	w0 = %0000000000000000 : put Row22, word w0
	w0 = %0000000000000000 : put Row23, word w0
	w0 = %0000000000000000 : put Row24, word w0
	w0 = %0000000000000000 : put Row25, word w0
	w0 = %0000000000000000 : put Row26, word w0
	w0 = %0000000000000000 : put Row27, word w0
	w0 = %0000111000000000 : put Row28, word w0
	w0 = %0000001000000000 : put Row29, word w0
	w0 = %0000111000000000 : put Row30, word w0
	w0 = %0000000000000000 : put Row31, word w0
	return
	
generateMatrix:

	'set up N, NW, NE, W & E neighbour data
	get Row31, word w0
	NeighbourN = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourNW = w0
	w0 = NeighbourN
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourNE = w0
	
	get Row00, word w0
	put Row00copy, word w0 	'copy row 0 to location after row 31 to remove need for wrap-around code in the loop
	CurrCells = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourW = w0
	w0 = CurrCells
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourE = w0
	
	'Process each row
	for row = Row00 to Row31 step 2
		
		'Pick up new S, SW & SE neighbours
		rowS = row + 2
		get rowS, word NeighbourS
		w0 = NeighbourS
		bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourSW = w0
		w0 = NeighbourS
		bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourSE = w0
		
		'Any live cells at all in this region?
		w0 = CurrCells | NeighbourN | NeighbourS '  | NeighbourNW | NeighbourNE | NeighbourW | NeighbourE | NeighbourSW | NeighbourSE  (not needed: these are only rotations of the three terms used)

		if w0 > 0 then
		
			'Count the live neighbours (in parallel) for the 16 current cells
			'However, if total goes over 3, we don't care (see below), so counting stops at 4
			tot1 = NeighbourN
			tot2 = tot1 & NeighbourNW : tot1 = tot1 ^ NeighbourNW
			carry = tot1 & NeighbourNE : tot1 = tot1 ^ NeighbourNE : tot4 = tot2 & carry : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourW : tot1 = tot1 ^ NeighbourW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourE : tot1 = tot1 ^ NeighbourE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourS : tot1 = tot1 ^ NeighbourS : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSW : tot1 = tot1 ^ NeighbourSW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSE : tot1 = tot1 ^ NeighbourSE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
		
			'Calculate the updated cells:
			' <2 or >3 neighbours, cell dies
			' =2 neighbours, cell continues to live
			' =3 neighbours, new cell born
			w0 = CurrCells | tot1 & tot2 &/ tot4
			put row, word w0
			
		end if
		
		'Shift the window down one row: current cells (before update), W and E become the new N, NW and NE; S, SW and SE become the new current cells, W and E
		NeighbourN = CurrCells : NeighbourNW = NeighbourW : NeighbourNE = NeighbourE
		NeighbourE = NeighbourSE : NeighbourW = NeighbourSW : CurrCells = NeighbourS 
	Next
	
	return

end
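
The table lookup in the tidied outputMatrix can be pictured with this small Python model. It is only a sketch: the function name is invented and the "display" is just a list of bytes, but the four table values and the shift by twice the page number are taken from the listing.

```python
# Same four values as the "data 0, (...)" line: two cells in, one display byte out.
EXPAND = (0b00000000, 0b00001111, 0b11110000, 0b11111111)


def page_bytes(rows, page):
    """Bytes sent to the display for one 8-pixel-high page (page 0-7)."""
    shift = 2 * page                      # each page shows two cells per row
    out = []
    for row in rows:
        byte = EXPAND[(row >> shift) & 0b11]
        out.extend([byte] * 4)            # four identical columns = four E pulses
    return out


rows = [0b0110] + [0] * 31
print([hex(b) for b in page_bytes(rows, 0)[:4]])
```

Each cell becomes a 4x4 block of pixels: the table stretches it to 4 pixels within the byte, and repeating the byte four times stretches it to 4 columns, so 32 rows fill all 128 columns.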
Now I need that advice from the Arduino/AVR people here, to help me make hardware purchase decisions. If I re-code the above in C/C++ and run it on an ATmega328 @ 16MHz, what relative performance gain am I likely to get? I need at least a 16-fold improvement.

Thanks,

Paul
 

PaulRB

Senior Member
If you are buying from new and want fastest speed use a Uno32 @ 80MHz

http://www.digilentinc.com/Products/Detail.cfm?Prod=CHIPKIT-UNO32
Thanks for the link, Technical. Won't fit on my breadboards though. How do you think that would perform relative to the Teensy 3.0 I mentioned earlier? The Teensy has a 48MHz clock vs the 80MHz of that uno32, but it has an ARM-based core, so you can't directly compare clock speeds for a reliable indication. The Teensy is also less than half the price.

http://www.pjrc.com/teensy/index.html

But both Teensy 3.0 and Uno32 may be overkill for the 128x64 grid @ 2~5 updates per second I would like to achieve.
 

Buzby

Senior Member
... the Teensy 3.0 I mentioned earlier? The Teensy has a 48MHz clock vs the 80MHz of that uno32, but it has an ARM-based core, so you can't directly compare clock speeds for a reliable indication.
Link from the Teensy projects page, a Teensy running Life, and generating the video for display on a monitor : http://www.pjrc.com/teensy/projects/game_of_life.html
There is another project with a re-purposed shop window scrolling LED display, and a few other Life implementations.

The 'A' Team will definitely cut the mustard !
 

PaulRB

Senior Member
The 'A' Team will definitely cut the mustard !
Great, but which one? Teensy 2.0 and ++ are 8 bit, 16MHz avr processors, but 3.0 is a 32 bit, 48MHz arm. If the 8 bit ones are fast enough, then an atmega328 costing £2 will do the job.
 

hippy

Ex-Staff (retired)
Code:
		high GLCD_CS1 : low GLCD_CS2
		for row = Row00 to Row31 step 2
			get row, word w0
			w0 = w0 >> x2 & %00000011
			read b0, GLCD_DATA
			pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1
			if row = Row15 then
				low GLCD_CS1 : high GLCD_CS2
			endif
		next
There might be a reasonable gain in splitting that FOR-NEXT into two loops and avoiding the IF every loop ...

Code:
		high GLCD_CS1 : low GLCD_CS2
		for row = Row00 to Row15 step 2
			get row, word w0
			w0 = w0 >> x2 & %00000011
			read b0, GLCD_DATA
			pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1
		next
		low GLCD_CS1 : high GLCD_CS2
		for row = Row16 to Row31 step 2
			get row, word w0
			w0 = w0 >> x2 & %00000011
			read b0, GLCD_DATA
			pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1
		next
I would probably unroll even further so you can avoid "get row, word w0" and wasted overhead there; just use an appropriate byte variable GET into b0 or b1.

I would also expect there are some good improvements to be had in counting the neighbouring cells. There seems to be a lot of work going on in what you currently have, though perhaps not optimisation critical.
 

PaulRB

Senior Member
Now with button to inject glider into a "stale" matrix...

[video=youtube_share;RJtykQ6a8-8]http://youtu.be/RJtykQ6a8-8[/video]

Code:
#picaxe 20x2

' Conway's Game Of Life 16x32
' PaulRB
' Jun 2013

' Cell data locations in scratchpad
symbol Row00 = 0
symbol Row00b = 1
symbol Row01 = 2
symbol Row02 = 4
symbol Row03 = 6
symbol Row04 = 8
symbol Row05 = 10
symbol Row06 = 12
symbol Row07 = 14
symbol Row08 = 16
symbol Row09 = 18
symbol Row10 = 20
symbol Row11 = 22
symbol Row12 = 24
symbol Row13 = 26
symbol Row14 = 28
symbol Row15 = 30
symbol Row15b = 31
symbol Row16 = 32
symbol Row16b = 33
symbol Row17= 34
symbol Row18 = 36
symbol Row19 = 38
symbol Row20 = 40
symbol Row21 = 42
symbol Row22 = 44
symbol Row23 = 46
symbol Row24 = 48
symbol Row25 = 50
symbol Row26 = 52
symbol Row27 = 54
symbol Row28 = 56
symbol Row29 = 58
symbol Row30 = 60
symbol Row31 = 62
symbol Row31b = 63
symbol Row00copy = 64

'Variables holding data on neighbouring cells
symbol NeighbourN = w14
symbol NeighbourNW = w15
symbol NeighbourNE = w16
symbol CurrCells = w17
symbol NeighbourW = w18
symbol NeighbourE = w19
symbol NeighbourS = w20
symbol NeighbourSW = w21
symbol NeighbourSE = w22

'Variables used in calculating new cells
symbol tot1 = w8
symbol carry = w9
symbol tot2 = w10
symbol tot4 = w12

'General variables
symbol row = b4
symbol RowS = b5
symbol x = b6 
symbol x2 = b7

'Pins controlling KS0108 Graphic LCD
symbol GLCD_RS = C.0 ' H = data, L = instruction
symbol GLCD_E = C.1
symbol GLCD_DATA = pinsB
symbol GLCD_CS1 = C.2
symbol GLCD_CS2 = C.3

'Other pins
symbol PushBtn = pinC.6

'Bit data for screen refresh
data 0, (%00000000, %00001111, %11110000, %11111111)


main:

	setfreq m64
	gosub initMatrix
	
	dirsB = %11111111
	pullup %00000010 'Pullup on C.6
	high GLCD_CS1 : high GLCD_CS2

	low GLCD_RS
	GLCD_DATA = %00111111 : pulsout GLCD_E, 1 ' Enable display
	GLCD_DATA = %11000000 : pulsout GLCD_E, 1 ' Scroll Position = 0

	do
		gosub generateMatrix
		gosub outputMatrix
		If PushBtn = 0 then
			do loop until PushBtn = 1
			gosub matrixIsBoring
		end if
	loop

outputMatrix:

	'Send matrix data for display on  GLCD
	for x = 0 to 7

		high GLCD_CS1 : high GLCD_CS2
		low GLCD_RS
		GLCD_DATA = %01000000 : pulsout GLCD_E, 1 ' Y = 0
		GLCD_DATA = %10111000  | x : pulsout GLCD_E, 1 ' set X
		high GLCD_RS
		x2 = x + x

		high GLCD_CS1 : low GLCD_CS2
		for row = Row00 to Row31 step 2
			get row, word w0
			w0 = w0 >> x2 & %00000011
			read b0, GLCD_DATA
			pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1 : pulsout GLCD_E, 1
			if row = Row15 then
				low GLCD_CS1 : high GLCD_CS2
			endif
		next
		
	next

	return
	
initMatrix:

	'Set up initial cells in matrix
	w0 = %0000000000000000 : put Row00, word w0
	w0 = %0000000000000000 : put Row01, word w0
	w0 = %0000000000000000 : put Row02, word w0
	w0 = %0000000000000000 : put Row03, word w0
	w0 = %0000000000000000 : put Row04, word w0
	w0 = %0000000000000000 : put Row05, word w0
	w0 = %0000000000111000 : put Row06, word w0
	w0 = %0000000000001000 : put Row07, word w0
	w0 = %0000000000010000 : put Row08, word w0
	w0 = %0000000000000000 : put Row09, word w0
	w0 = %0000000000000000 : put Row10, word w0
	w0 = %0011100000000000 : put Row11, word w0
	w0 = %0010000000000000 : put Row12, word w0
	w0 = %0001000000000000 : put Row13, word w0
	w0 = %0000000000000000 : put Row14, word w0
	w0 = %0000000000000000 : put Row15, word w0
	w0 = %0000000000000000 : put Row16, word w0
	w0 = %0000000000000000 : put Row17, word w0
	w0 = %0000000000000000 : put Row18, word w0
	w0 = %0000000000000000 : put Row19, word w0
	w0 = %0000000000000000 : put Row20, word w0
	w0 = %0000000000000000 : put Row21, word w0
	w0 = %0000000000000000 : put Row22, word w0
	w0 = %0000000000000000 : put Row23, word w0
	w0 = %0000000000000000 : put Row24, word w0
	w0 = %0000000000000000 : put Row25, word w0
	w0 = %0000000000000000 : put Row26, word w0
	w0 = %0000000000000000 : put Row27, word w0
	w0 = %0000000000000000 : put Row28, word w0
	w0 = %0000000000000000 : put Row29, word w0
	w0 = %0000000000000000 : put Row30, word w0
	w0 = %0000000000000000 : put Row31, word w0
	return
	
matrixIsBoring:

	get Row00, word w0 : w0 = w0 ^ %0000000000000010 : put Row00, word w0
	get Row01, word w0 : w0 = w0 ^ %0000000000000100 : put Row01, word w0
	get Row02, word w0 : w0 = w0 ^ %0000000000000111 : put Row02, word w0

	return
	
generateMatrix:

	'set up N, NW, NE, W & E neighbour data
	get Row31, word w0
	NeighbourN = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourNW = w0
	w0 = NeighbourN
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourNE = w0
	
	get Row00, word w0
	put Row00copy, word w0 	'copy row 0 to location after the last row (Row31) to remove need for wrap-around code in the loop	
	CurrCells = w0
	bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourW = w0
	w0 = CurrCells
	bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourE = w0
	
	'Process each row
	for row = Row00 to Row31 step 2
		
		'Pick up new S, SW & SE neighbours
		rowS = row + 2
		get rowS, word NeighbourS
		w0 = NeighbourS
		bit16 = bit0 : w0 = w0 >> 1 : bit15 = bit16 : NeighbourSW = w0
		w0 = NeighbourS
		bit16 = bit15 : w0 = w0 << 1 : bit0 = bit16 : NeighbourSE = w0
		
		'Any live cells at all in this region?
		w0 = CurrCells | NeighbourN | NeighbourS '  | NeighbourNW | NeighbourNE | NeighbourW | NeighbourE | NeighbourSW | NeighbourSE  (not needed for 16x16 grid)

		if w0 > 0 then
		
			'Count the live neighbours (in parallel) for the 16 current cells
			'However, if total goes over 3, we don't care (see below), so counting stops at 4
			tot1 = NeighbourN
			tot2 = tot1 & NeighbourNW : tot1 = tot1 ^ NeighbourNW
			carry = tot1 & NeighbourNE : tot1 = tot1 ^ NeighbourNE : tot4 = tot2 & carry : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourW : tot1 = tot1 ^ NeighbourW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourE : tot1 = tot1 ^ NeighbourE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourS : tot1 = tot1 ^ NeighbourS : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSW : tot1 = tot1 ^ NeighbourSW : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
			carry = tot1 & NeighbourSE : tot1 = tot1 ^ NeighbourSE : tot4 = tot2 & carry | tot4 : tot2 = tot2 ^ carry
		
			'Calculate the updated cells:
			' <2 or >3 neighbours, cell dies
			' =2 neighbours, cell continues to live
			' =3 neighbours, new cell born
			w0 = CurrCells | tot1 & tot2 &/ tot4
			put row, word w0
			
		end if
		
		'Current cells (before update), E , W, SE, SW and S neighbours become new N, NW, NE, E, W neighbours and current cells for next loop
		NeighbourN = CurrCells : NeighbourNW = NeighbourW : NeighbourNE = NeighbourE
		NeighbourE = NeighbourSE : NeighbourW = NeighbourSW : CurrCells = NeighbourS 
	Next
	
	return

end
 

PaulRB

Senior Member
There might be a reasonable gain in splitting that FOR-NEXT into two loops and avoiding the IF every loop ...

I would probably unroll even further so you can avoid "get row, word w0" and wasted overhead there; just use an appropriate byte variable GET into b0 or b1.
Thanks for those, hippy, but they can only make a couple of % difference, I think.

I would also expect there's some good improvements to be had in counting the neighbouring cells. There seems to be a lot of work going on in what you currently have, though perhaps not optimisation critical.
Well, I already spent time optimising that part and couldn't think of anything else. What did you have in mind, can you be more specific?

I have the thing running at about 4 generations per second now. No point going much faster, so the next thing to try is a 4 x larger matrix. But I don't think that's achievable for 2 reasons: processing speed and memory. Can't squeeze another 400% improvement out of minor code changes. Can't get any more RAM. I2C would be too slow. Trying to use the display's memory by reading back from it, to the PICAXE, would also be very slow and the data would then have to be turned back into cells packed into bits.

So here's my best plan at the moment. Buy a Teensy 3.0 and an ATMega328 (with crystal, doesn't need bootloader?). Initially, use the Teensy as the programmer for the AVR. If the ATMega turns out to be too slow, switch to the Teensy. If the ATMega is fast enough, use the Teensy for something else.
 

PaulRB

Senior Member
Hi Buzby & hippy, I thought I would let you know how I'm getting on.

I did not buy the Teensy 3.0. I just bought an Arduino Nano 3, half the price. This is based on ATMega328 @ 16MHz.

I have translated the 16x16 matrix Picaxe program from http://www.picaxeforum.co.uk/showthread.php?24019-Conway-s-Game-of-Life-has-anyone-done-it-with-a-PICAXE&p=239962&viewfull=1#post239962, which outputs to the terminal, into Arduino C sticking as closely as I can to the original. This should give the best possible speed comparison.

Remember, with a single glider on the matrix, I was getting around 30 generations per second from the Picaxe 20x2 @ 64MHz.

We all knew the Arduino was going to be a lot faster, despite the slower clock speed, because its C language is compiled vs. the Picaxe's interpreted BASIC. But how much faster?

Here's a video demonstration. In the first half of the video, the matrix is output one generation at a time, with a 1 second delay in the loop. You can see that the glider is moving correctly.

In the second half of the video, I removed the 1 second delay but increased the number of generations between each time the matrix is output to the terminal. I kept increasing the number of generations until the matrix was being output roughly once per second again.

[video=youtube_share;fcKjaKdSDqw]http://youtu.be/fcKjaKdSDqw[/video]

The Arduino is processing 10,000 generations per second. I can't believe it...

Code:
// Conway's Game Of Life 16x16
// PaulRB
// Jun 2013

word Matrix[17]; // Cell data in ram
long Generation = 0;

void setup() {
  
        Serial.begin(38400);
	initMatrix();
	outputMatrix();
	
}

void loop() {
  
        for (int gen = 0; gen < 10000; gen++) {
                generateMatrix();
        }
	outputMatrix();
        //delay(1000);
}

void outputMatrix() {
  
        Serial.print("Generation ");
        Serial.println(Generation);
	for (int row = 0; row <= 15; row++) {
              for (int col = 0; col <= 15; col++) {
                    Serial.print(bitRead(Matrix[row], col));
              }
              Serial.println();
        }
}
        
	
void initMatrix() {
  
	//Set up initial cells in matrix
	int row = 0;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
	Matrix[row++] = B00001110 << 8 | B00000000;
	Matrix[row++] = B00000010 << 8 | B00000000;
	Matrix[row++] = B00001000 << 8 | B00000000;
	Matrix[row++] = B00000000 << 8 | B00000000;
}
	
void generateMatrix() {

	//Variables holding data on neighbouring cells
	word NeighbourN, NeighbourNW, NeighbourNE, CurrCells, NeighbourW, NeighbourE, NeighbourS, NeighbourSW, NeighbourSE;
	
	//Variables used in calculating new cells
	word tot1, carry, tot2, tot4;

	//set up N, NW, NE, W & E neighbour data
	NeighbourN = Matrix[15];
	NeighbourNW = NeighbourN >> 1; bitWrite(NeighbourNW, 15, bitRead(NeighbourN, 0));
	NeighbourNE = NeighbourN << 1; bitWrite(NeighbourNE, 0, bitRead(NeighbourN, 15));
	
	CurrCells = Matrix[0];
	NeighbourW = CurrCells >> 1; bitWrite(NeighbourW, 15, bitRead(CurrCells, 0));
	NeighbourE = CurrCells << 1; bitWrite(NeighbourE, 0, bitRead(CurrCells, 15));

	Matrix[16] = CurrCells;  // 'copy row 0 to location after last row to remove need for wrap-around code in the loop
	
	//Process each row of the matrix
	for (int row = 0; row <= 15; row++) {
		
		//Pick up new S, SW & SE neighbours
		NeighbourS = Matrix[row + 1];
		NeighbourSW = NeighbourS >> 1; bitWrite(NeighbourSW, 15, bitRead(NeighbourS, 0));
		NeighbourSE = NeighbourS << 1; bitWrite(NeighbourSE, 0, bitRead(NeighbourS, 15));

		//Any live cells at all in this region?
		if ((CurrCells | NeighbourN | NeighbourS) != 0) { // | NeighbourNW | NeighbourNE | NeighbourW | NeighbourE | NeighbourSW | NeighbourSE  (not needed for 16x16 grid)
		
			//Count the live neighbours (in parallel) for the current row of cells
			//However, if total goes over 3, we don't care (see below), so counting stops at 4
			tot1 = NeighbourN;
			tot2 = tot1 & NeighbourNW; tot1 = tot1 ^ NeighbourNW;
			carry = tot1 & NeighbourNE; tot1 = tot1 ^ NeighbourNE; tot4 = tot2 & carry; tot2 = tot2 ^ carry;
			carry = tot1 & NeighbourW; tot1 = tot1 ^ NeighbourW; tot4 = tot2 & carry | tot4; tot2 = tot2 ^ carry;
			carry = tot1 & NeighbourE; tot1 = tot1 ^ NeighbourE; tot4 = tot2 & carry | tot4; tot2 = tot2 ^ carry;
			carry = tot1 & NeighbourS; tot1 = tot1 ^ NeighbourS; tot4 = tot2 & carry | tot4; tot2 = tot2 ^ carry;
			carry = tot1 & NeighbourSW; tot1 = tot1 ^ NeighbourSW; tot4 = tot2 & carry | tot4; tot2 = tot2 ^ carry;
			carry = tot1 & NeighbourSE; tot1 = tot1 ^ NeighbourSE; tot4 = tot2 & carry | tot4; tot2 = tot2 ^ carry;
		
			//Calculate the updated cells:
			// <2 or >3 neighbours, cell dies
			// =2 neighbours, cell continues to live
			// =3 neighbours, new cell born
			Matrix[row] = (CurrCells | tot1) & tot2 & ~ tot4;
			
		}
		
		//Current cells (before update), E , W, SE, SW and S neighbours become 
		//new N, NW, NE, E, W neighbours and current cells for next loop
		NeighbourN = CurrCells; NeighbourNW = NeighbourW; NeighbourNE = NeighbourE;
		NeighbourE = NeighbourSE; NeighbourW = NeighbourSW; CurrCells = NeighbourS;

	}
        Generation++;

}
Please believe me, I'm not trying to bash the Picaxe here. I think it has many advantages in its favour (ease of programming, price, helpful documentation and this forum). We all know that outright processing speed is not what Picaxe is about. But the contrast is startling all the same.
 

PaulRB

Senior Member
Finally I have achieved the full 128x64 matrix!

Runs at a good speed, too, even though this is my first Arduino program and it isn't optimised. But there are no delays in there, it's going flat out.

[video=youtube_share;ufEALy9TryQ]http://youtu.be/ufEALy9TryQ[/video]

Code:
// Conway's Game Of Life 64x128
// PaulRB
// Jun 2013

//Pins controlling KS0108 Graphic LCD
const byte GLCD_RS = 10; // H = data, L = instruction
const byte GLCD_E = 8;
const byte GLCD_DATA[8] = {7, 6, 5, 4, 3, 2, 9, 11};
const byte GLCD_CS1 = 13; // left half of screen
const byte GLCD_CS2 = 12; // right half of screen

unsigned long Matrix[129][2]; // Cell data in ram

void setup() {
  
  initMatrix();
	
  pinMode(GLCD_CS1, OUTPUT);
  pinMode(GLCD_CS2, OUTPUT);
  pinMode(GLCD_RS, OUTPUT);
  pinMode(GLCD_E, OUTPUT);
  for (byte p = 0; p <= 7; p++) {
    pinMode(GLCD_DATA[p], OUTPUT);
    }

  digitalWrite(GLCD_CS1, HIGH);
  digitalWrite(GLCD_CS2, HIGH);
  digitalWrite(GLCD_RS, LOW);
  digitalWrite(GLCD_E, LOW);

  sendGLCD(B00111111); // Enable display
  sendGLCD(B11000000); // Set Scroll Position = 0
        
}

void loop() {
  generateMatrix();
  outputMatrix();
}

void sendGLCD(byte b) { //Send a byte to the GLCD

  digitalWrite(GLCD_E, HIGH);
  for (byte p = 0; p <= 7; p++) {
    digitalWrite(GLCD_DATA[p], bitRead(b, p));
  }
  digitalWrite(GLCD_E, LOW);

}

void outputMatrix() {
  
  //Send matrix data for display on GLCD
  for (byte x = 0; x <= 7; x++) {

  digitalWrite(GLCD_CS1, HIGH);
  digitalWrite(GLCD_CS2, HIGH);
  digitalWrite(GLCD_RS, LOW);
  sendGLCD(B01000000); // Set Y = 0
  sendGLCD(B10111000  | x); // set X
  digitalWrite(GLCD_RS, HIGH);

  digitalWrite(GLCD_CS1, HIGH);
  digitalWrite(GLCD_CS2, LOW);
  for (byte row = 0; row <= 127; row++) {
    byte pattern = (Matrix[row][x>>2] >> ((x & 3)<<3));
    sendGLCD(pattern);
    if (row == 63) {
      digitalWrite(GLCD_CS1, LOW);
      digitalWrite(GLCD_CS2, HIGH);
      }
    }	
  }
}
	
void initMatrix() {

  //Set up initial cells in matrix
  byte row = 0;
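  // Cast before shifting: with 16-bit int on AVR, a result with bit 15 set would
  // otherwise sign-extend into the upper 16 cells of the unsigned long
  Matrix[row++][0] = (unsigned long)B11100000 << 8 | B00000000;
  Matrix[row++][0] = (unsigned long)B10000000 << 8 | B00000000;
  Matrix[row++][0] = (unsigned long)B11100000 << 8 | B00000000;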
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000111;
  Matrix[row++][0] = B00000000 << 8 | B00000001;
  Matrix[row++][0] = B00000000 << 8 | B00000111;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00100000 << 8 | B00000000;
  Matrix[row++][0] = B01000000 << 8 | B00000000;
  Matrix[row++][0] = B01110000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
  Matrix[row++][0] = B00000000 << 8 | B00000000;
}
	
void generateMatrix() {

  //Variables holding data on neighbouring cells
  unsigned long NeighbourN[2], NeighbourNW[2], NeighbourNE[2], CurrCells[2], NeighbourW[2], NeighbourE[2], NeighbourS[2], NeighbourSW[2], NeighbourSE[2];
	
  //Variables used in calculating new cells
  unsigned long tot1[2], carry[2], tot2[2], tot4[2];

  //set up N, NW, NE, W & E neighbour data
  for (byte b = 0; b <= 1; b++) {
    NeighbourN[b] = Matrix[127][b];
    CurrCells[b] = Matrix[0][b];
  }

  for (byte b = 0; b <= 1; b++) {
    NeighbourNW[b] = NeighbourN[b] >> 1; 
    bitWrite(NeighbourNW[b], 31, bitRead(NeighbourN[1-b], 0));
    NeighbourNE[b] = NeighbourN[b] << 1;
    bitWrite(NeighbourNE[b], 0, bitRead(NeighbourN[1-b], 31));
	
    NeighbourW[b] = CurrCells[b] >> 1;
    bitWrite(NeighbourW[b], 31, bitRead(CurrCells[1-b], 0));
    NeighbourE[b] = CurrCells[b] << 1;
    bitWrite(NeighbourE[b], 0, bitRead(CurrCells[1-b], 31));

    Matrix[128][b] = CurrCells[b];  // copy row 0 to location after last row to remove need for wrap-around code in the loop
  }
  
  //Process each row of the matrix
  for (byte row = 0; row <= 127; row++) {
		
    //Pick up new S, SW & SE neighbours
    for (byte b = 0; b <= 1; b++) {
      NeighbourS[b] = Matrix[row + 1][b];
    }
  
  for (byte b = 0; b <= 1; b++) {
    NeighbourSW[b] = NeighbourS[b] >> 1;
    bitWrite(NeighbourSW[b], 31, bitRead(NeighbourS[1-b], 0));
    NeighbourSE[b] = NeighbourS[b] << 1;
    bitWrite(NeighbourSE[b], 0, bitRead(NeighbourS[1-b], 31));
  }
  
  //Any live cells at all in this region?
  if ((CurrCells[0] | NeighbourN[0] | NeighbourS[0] | CurrCells[1] | NeighbourN[1] | NeighbourS[1]) != 0) {
    
    //Count the live neighbours (in parallel) for the current row of cells
    //However, if total goes over 3, we don't care (see below), so counting stops at 4
    for (byte b = 0; b <= 1; b++) {
      tot1[b] = NeighbourN[b];
      tot2[b] = tot1[b] & NeighbourNW[b]; tot1[b] = tot1[b] ^ NeighbourNW[b];
      carry[b] = tot1[b] & NeighbourNE[b]; tot1[b] = tot1[b] ^ NeighbourNE[b]; tot4[b] = tot2[b] & carry[b]; tot2[b] = tot2[b] ^ carry[b];
      carry[b] = tot1[b] & NeighbourW[b]; tot1[b] = tot1[b] ^ NeighbourW[b]; tot4[b] = tot2[b] & carry[b] | tot4[b]; tot2[b] = tot2[b] ^ carry[b];
      carry[b] = tot1[b] & NeighbourE[b]; tot1[b] = tot1[b] ^ NeighbourE[b]; tot4[b] = tot2[b] & carry[b] | tot4[b]; tot2[b] = tot2[b] ^ carry[b];
      carry[b] = tot1[b] & NeighbourS[b]; tot1[b] = tot1[b] ^ NeighbourS[b]; tot4[b] = tot2[b] & carry[b] | tot4[b]; tot2[b] = tot2[b] ^ carry[b];
      carry[b] = tot1[b] & NeighbourSW[b]; tot1[b] = tot1[b] ^ NeighbourSW[b]; tot4[b] = tot2[b] & carry[b] | tot4[b]; tot2[b] = tot2[b] ^ carry[b];
      carry[b] = tot1[b] & NeighbourSE[b]; tot1[b] = tot1[b] ^ NeighbourSE[b]; tot4[b] = tot2[b] & carry[b] | tot4[b]; tot2[b] = tot2[b] ^ carry[b];
		
    //Calculate the updated cells:
    // <2 or >3 neighbours, cell dies
    // =2 neighbours, cell continues to live
    // =3 neighbours, new cell born
    Matrix[row][b] = (CurrCells[b] | tot1[b]) & tot2[b] & ~ tot4[b];
    }
  }
  
  //Current cells (before update), E , W, SE, SW and S neighbours become
  //new N, NW, NE, E, W neighbours and current cells for next loop
  for (byte b = 0; b <= 1; b++) {
    NeighbourN[b] = CurrCells[b];
    NeighbourNW[b] = NeighbourW[b];
    NeighbourNE[b] = NeighbourE[b];
    NeighbourE[b] = NeighbourSE[b];
    NeighbourW[b] = NeighbourSW[b];
    CurrCells[b] = NeighbourS[b];
    }
  }	
}
 

hippy

Ex-Staff (retired)
Yes, apologies for posting Arduino stuff, but you'll have to read the full thread from the start to see how and why.
That's fair enough but let's primarily stick to PICAXE on this forum. There are many processors which can beat PICAXE and each other on performance in particular tasks ( and often don't offer what the PICAXE does ), and there's no problem in noting that, but we aren't forums for those particular devices.
 

westaust55

Moderator
At the end of the day, this forum is "owned"/controlled by Revolution Education.
Their policy (keep it PICAXE related) is given in paragraphs 2 and 3 in the sticky thread at the top of the active forum area.
http://www.picaxeforum.co.uk/showthread.php?7679-Read-Me-First!

Rev Ed do take a fairly lenient line on a bit of clean humour and the odd reference to other microcontrollers but there is a point where starting to promote other chips in lieu of PICAXE may not be tolerated.

Maybe (like some others here) I started decades ago with microprocessors and needed to carefully read datasheets (no forums, Internet or other interested folks for 2000 km; no general IO interface ports or serial/I2C/etc. inbuilt). If I needed a port it involved maybe 10 chips, many for the address decoding and data buffering. Also, like others here, I learnt a handful of languages from machine code up - probably forgotten how to use most correctly these days.
Maybe I'm at second childhood?? Now I am happy to use the PICAXE for my projects and experiments - Rev Ed has done all the hard work for me on the control side and I can concentrate on the external chip I am trying to control/monitor.
For me the price/performance ratio is pretty good when I consider how much of my time (a limited resource) might otherwise be used using other micro-controllers/processors.
 

boriz

Senior Member
Why people insist on 'competitive comparisons' between apples and oranges is beyond me. Lest we forget, a PIC Microcontroller isn't a computer. I think RevEd, and the Gurus around here, have done a remarkable job pushing the humble PIC as far as they have.
 

mrburnette

Senior Member
... My vote is for the 'second childhood...'

I'll not argue because you are entirely correct West... It is their forum, which was clearly stated in my previous post. Also, it is all well that in your well-versed past you gained significant experience in microprocessors/microcontrollers; I too had such an early career. But, not everyone has had such growth or exposure. Some folks will never need to go beyond this forum, others will want to grow their knowledge and skills.

I have read several times that the cash-flow for PICAXE is education and that the hobby market is a small part of the business. IF it is ALL about education, then there should be little fear of comparisons. A broader discussion of market capabilities should not scare RevEd. Such discussions do NOT need to be in the main, active forum, but such discussions do not deserve to be deleted, IMO. Move 'em to an off-topic area.... Competitive Landscape ... This would even give Hippy/Technical an opportunity to chime in on items that were misrepresented and set the record straight. Maybe RevEd would even find it interesting to understand why PICAXE users were finding limitations in the products. Open and objective thought sharing is preferable to a closed and biased view of the landscape.

- Ray
 

mrburnette

Senior Member
<...> I think RevEd, and the Gurus around here, have done a remarkable job pushing the humble PIC as far as they have.
Yes, they certainly have, within the context of having a full interpreter within each chip. With the fickle old complex compilers we once had to use from the DOS command line, the first PICAXE was surely a remarkable piece of assembly language. Times do change, however, and the GUI tool sets of today hide most of the ugly.
 

Buzby

Senior Member
It's really just a project to try pushing the PICAXE envelope. If I specifically *needed* a fast Life engine I would not use PICAXE !
This is what I wrote in post #4, and it still stands.

Paul's Arduino Life is cool, but once we start leaving PICAXE and using bigger 'envelopes', then I don't think you can beat Golly.

Remember Marty's DMX ?. It was a challenge to even drive it from a PICAXE, and the final solution used two, and still was a 'clunky' solution.
In all reality if the aim was to build something fast and powerful, would PICAXE really be first choice ?.

Anyway, back to the thread, I don't think there is a way to make a faster Life engine with PICAXE.
Me, Paul, and hippy have exhausted the possibilities ( I think ), so my next challenge is to make a different sort of Life display, LEDs and LCDs are so old hat !.

Cheers,

Buzby
 