Why did my old system change?

saunj

Senior Member
I have a home logging system which has been running continuously since 2006, and the last modification was in 2010.
It has made nearly 6000 files to date. For details see http://projects.WorsleyAssociates.com/Environmental_Logger/
Well at 10:57:06 on Jan 24, 2014 it stopped recording.
I did not notice that the "SD" flag on the display was gone until Mar 9, when I restarted it.
It has been functioning since, except that the exception code starting on line 464 of the attached code executes sporadically.
The test code at 190 always has an OK result.
No recording is lost.
I changed the Secure Digital memory to a freshly-formatted one and sprayed contact cleaner into its socket, but no change.
Can anyone tell me why?
 


Goeytex

Senior Member
Hi Saunj

The <BadResponse> routine is called for one of two reasons.

1. Hserin timeout (no response in 300ms) or
2. The byte received is not equal to 62

Suggest you insert a small routine that determines if it is a timeout or an incorrect response. Have this routine send the error data to the display. If it is a timeout, send "Timeout error". If bad data byte, then Send ("Data error",Byte) where "byte" is the value of the byte received by hserin. You could also have this routine sound the alarm.

This should be a starting point.
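
A minimal sketch of that idea in PICAXE BASIC, written as a fragment to merge into the existing program rather than a complete listing. The labels ResponseTimeout and ResponseDataError are hypothetical, the received byte is assumed to be in b0, and sertxd stands in for whatever display routine the logger already uses.

```basic
; Fragment only: merge into the existing program.
; Assumption: the hserin timeout address is changed from BadResponse to
; ResponseTimeout, and the "byte is not 62" check jumps to ResponseDataError
; with the received byte held in b0.

ResponseTimeout:
    sertxd ("Timeout error", cr, lf)      ; no byte arrived within the timeout
    goto BadResponse

ResponseDataError:
    sertxd ("Data error ", #b0, cr, lf)   ; #b0 prints the byte value in decimal
    goto BadResponse

BadResponse:
    ; existing recovery code (alarm, retry) continues here unchanged
```

Both paths still end in the original BadResponse handling, so the recovery behaviour should stay as it was.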

What troubleshooting have you done? Anything? I would check the power supply(s) for proper voltage and noise before doing anything else.

Do you have a scope? Logic analyzer?
 

saunj

Senior Member
Thank you goeytex. Your conclusions and suggestions are valid and useful. The routine does sound an alarm, otherwise I would not be aware of it.
The unregulated and 12 volt supplies are part of the recording, and show no problems. I have rechecked them with my Fluke and they agree with the display.
The 5V is distributed. The one local to the recording circuit, which is made by Rogue Robotics, is also OK.
My problem in making diagnostic mods to the software is threefold.
1. I am reluctant to take the project offline since it is designed to recover from this error, and it does - there is nothing in the recording to show it occurred.
2. I think I know that the file (Oct 17, 2010) is the one installed, but I am nervous.
3. It only happens at intervals which may be hours apart, so I need to record the event rather than display it. I would use SERTXD, which is not in use.
I could trigger my Tek DSO in the code and look back to see if the timeout was exceeded. Maybe you can tell me if it is 300 or 150 milliseconds.
I am using HSERIN at 9600 baud. Does this timeout value vary with baud rate?
 

srnet

Senior Member
You're setting the timeout in the HSERIN command; for instance, it's 1000 ms in this command:

HSERIN [1000,BadResponse] .........................

It does not vary with baud rate. It would not make much sense for the command to change a timeout given in ms depending on the baud rate.
 

Goeytex

Senior Member
OK, so record the event to a spare location in EEPROM. If it is a timeout, write a "t" and increment a byte to indicate how many times it has occurred. If it is a data error, write the value of the data byte, etc. You get the idea. Have the code display the error on the LCD at power up, or whenever you choose.
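
One way this could look in PICAXE BASIC, as a sketch under stated assumptions: the EEPROM addresses, the scratch variable b13, and the subroutine names are all hypothetical, the received byte is assumed to be in b0, the counters are assumed to start at zero, and sertxd stands in for the LCD routine.

```basic
; Hypothetical spare EEPROM locations; pick addresses the program does not use.
symbol EE_LAST_EVENT  = 0     ; "t" = timeout, "d" = data error
symbol EE_TIMEOUTS    = 1     ; number of timeouts seen
symbol EE_DATA_ERRORS = 2     ; number of bad bytes seen
symbol EE_LAST_BAD    = 3     ; value of the most recent bad byte
symbol tmp = b13              ; hypothetical scratch variable

LogTimeout:                   ; gosub here from the timeout path
    write EE_LAST_EVENT, "t"
    read EE_TIMEOUTS, tmp
    if tmp < 255 then         ; stop at 255 rather than wrap to 0
        inc tmp
        write EE_TIMEOUTS, tmp
    endif
    return

LogDataError:                 ; gosub here with the bad byte in b0 (assumed)
    write EE_LAST_EVENT, "d"
    write EE_LAST_BAD, b0
    read EE_DATA_ERRORS, tmp
    if tmp < 255 then
        inc tmp
        write EE_DATA_ERRORS, tmp
    endif
    return

ReportErrors:                 ; gosub here at power up
    read EE_TIMEOUTS, tmp
    sertxd ("Timeouts: ", #tmp, cr, lf)
    read EE_DATA_ERRORS, tmp
    sertxd ("Data errors: ", #tmp, cr, lf)
    read EE_LAST_BAD, tmp
    sertxd ("Last bad byte: ", #tmp, cr, lf)
    return
```

Because the counts live in EEPROM they survive a reset or power loss, so events that are hours apart can be read back later without watching the display.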
 

Skiwi

New Member
Is your whole chip resetting to the start?
Put a serial message at the start, before the main loop, to verify that.
I have a logger that re-boots occasionally. If the serial input is not cleanly terminated, as in short, close connections, sometimes electrical noise, maybe from mains spikes through the power supply or RFI etc, can fool the PICAXE into thinking there is a programming command coming, and the end result is a restart. I suggest some capacitive de-coupling across the serial input, maybe 1n, enough to suppress noise but not to stop valid re-programming.
Also, I can't tell from your photos and circuit whether you have the reset pin pulled high. Again, it needs a good clean termination to prevent false triggering. Some may question this as the inputs are lowish impedance, but I have had experience with these.
A particular source of noise I have, that upsets the chip, is a pure sine-wave inverter. It may be a pure 50Hz inverter, but it runs at a high frequency to achieve this.
DC/DC converters can also cause RFI noise into boards and cables. Your project does have a lot of wiring between units, which can act as an antenna for such RFI.
And, another issue that can make matters worse, is multiple earths (earth loops). These can be quite hard to figure out, but can cause unexpected events, such as re-boots or errors in readings.
All this may not be your problem, but unexplained events sometimes need longshot ideas to diagnose. As I started, put a serial message at the start to diagnose what is happening with the program.
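
A minimal sketch of the start-up message in PICAXE BASIC, assuming sertxd is free to use and that the existing loop starts at a label called main (a hypothetical name here):

```basic
; Runs once after every reset, before the main loop is entered.
    sertxd ("RESTART", cr, lf)

main:
    ; existing main loop body continues here unchanged
    goto main
```

If "RESTART" appears in the serial log at the times the fault occurs, the chip is rebooting rather than just taking the error path.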
 

saunj

Senior Member
Thank you all for your suggestions. I have done some investigations.
I modified the code to generate a trigger whenever the "bad response" is invoked. By means of a physical connection to an unused input, this triggers a recording highway which has a marker in it. The trigger also goes to my DSO external trigger, with its 2 inputs connected to the command and response of the SD interface.
The results show that the sequence runs without any recording activity at the time. In fact this afternoon the recording went 76 times in just over a minute, while the house was unoccupied and the system was running on its backup battery.
The rest of the afternoon there were no other occurrences. Since all recordings are intact, interference to the interface is ruled out.
I am not going to pursue this anymore, since the code is so complex, and it works.
The attachment is sorted from the whole files by the S on the end. There are 2 files: the "E" file on the left always records 6 seconds past the minute.
I have pasted on the right the most recent "V" file which posts at 31 second intervals.
 
