UARTs

Discussion to talk about hardware related topics only.
bbracken
Posts: 54
Joined: Mon Jun 23, 2008 11:21 am
Location: Southern California

UARTs

Post by bbracken »

I have an application where the NB is a go-between device with two serial ports. The first serial port is continuously polled. The second port periodically sends messages, which are either answered immediately or passed through to the first serial port, with the response then transmitted back to the second port. Every once in a while, roughly once every 10 million transmit/receive transactions, the NB stops servicing the first serial port. That's correct... every 10 million or so transactions. I know... that's a pretty low error rate. The problem is that when the system fails, it costs the client tens of thousands of dollars to restart the process the NB is involved in.

It appears unlikely that it is the firmware. I suspect some sort of hardware issue (SB70), possibly power or EMI. Of course, the problem cannot be reproduced at the factory; there it runs forever without any failures. I was wondering if anyone is aware of any 5270 errata that might explain this, or has had a similar issue with RS232 before?

Thanks in advance,
Bill Bracken
pbreed
Posts: 1087
Joined: Thu Apr 24, 2008 3:58 pm

Re: UARTs

Post by pbreed »

Any user written interrupt routines?

Does the system reboot when it fails?

Does the other serial port continue to work?

How were the serial ports opened?

Is Smarttrap turned on and monitored?
Chris Ruff
Posts: 222
Joined: Thu Apr 24, 2008 4:09 pm
Location: topsail island, nc

Re: UARTs

Post by Chris Ruff »

It definitely sounds like a glitch, but when you mention > 10M before failure I would wonder...

Does it ever fail when fewer than tens of millions of transactions have occurred?
If the software is truly air-tight, then the EMI-affected device should fail randomly, with no message count consideration.

When you test the design in the lab, do you provide a messaging environment that exactly matches the message traffic the failing unit sees?

From your description, it sounds like your code is message-sensitive: one message is ping-ponged, another is retransmitted, etc. So if just the right sequence of messages occurs, the NB might be doing two things at once in a race-prone, possibly multi-thread-sensitive, part of the code.

I have spent much time chasing difficult conditions like this before. First I check that the interrupt variables are properly decoupled/handled in the tasks. Then I ensure that the task variables are properly handled by other tasks. Then I check that any two task functions running at the same time can't create deadlock, perform improper operations (say, on a UART), etc.
It is painstaking work, but if your code fails more based on uptime and less based on random EMI events, the problem is probably in your code.
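
To make the "interrupt variables properly decoupled" point concrete, here is a minimal sketch of a single-producer/single-consumer byte queue between a receive ISR and a task. It is plain portable C11, not NetBurner code: `ring_t`, `ring_put`, `ring_get` and `RING_SIZE` are made-up names for illustration, and I am assuming a toolchain with `<stdatomic.h>`. The idea is that each index has exactly one writer, so there is no read-modify-write on shared state for an interrupt to land in the middle of.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u /* hypothetical size; must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0u,
              "RING_SIZE must be a power of two");

typedef struct {
    uint8_t data[RING_SIZE];
    atomic_uint head; /* written only by the producer (the ISR) */
    atomic_uint tail; /* written only by the consumer (the task) */
} ring_t;

static void ring_init(ring_t *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side: call only from the receive ISR.
   Returns false (and drops the byte) if the queue is full. */
static bool ring_put(ring_t *r, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE) {
        return false; /* full: caller should count this as an overrun */
    }
    r->data[head & (RING_SIZE - 1u)] = byte;
    /* Publish the byte only after it has been stored. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: call only from the one task that owns this port.
   Returns false if the queue is empty. */
static bool ring_get(ring_t *r, uint8_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return false; /* empty */
    }
    *out = r->data[tail & (RING_SIZE - 1u)];
    /* Release the slot only after the byte has been read. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

The thing to look for in your own code is any shared counter, flag or buffer index that both the ISR and a task modify. A pattern like the one above only holds up if exactly one context calls `ring_put` and exactly one calls `ring_get`; if two tasks can both touch the first serial port, that second step (task-to-task handling) needs its own protection.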

Chris
Real Programmers don't comment their code. If it was hard to write, it should be hard to understand