Release 2.8.5
Posted: Fri Sep 29, 2017 10:29 am
This post is mainly informational, and to see whether anyone else has experienced something similar with v2.8.5 of the NNDK. I know it's early, but surely others are using this version.
Around June 6 I upgraded the development suite from v2.8.3 to v2.8.4. v2.8.4 was found to have a semaphore issue, which was fixed in v2.8.5, and I installed that in early August. About 2 weeks ago, during system testing, we noticed that the MOD5441X modules would randomly crash and reboot about once a day. Sometimes a unit would run for 2 days or more, but most crashed at least once within a 24-hour period. The crash occurred on all units (we had 4 running in the same system), at random times. Most would reboot, as is typical of the modules when they crash, but at least one would always crash into the alternate boot monitor and stay there until it was power-cycled.
My initial reaction was that I had introduced a bug into my application. That has not been ruled out yet. I went through the changes I had made, several times, and was unable to find anything obvious. It's always "not obvious" until you discover the bug. I used the WinAddr2line utility to try to determine where in the code it was crashing. Every crash was at a different location but in the same thread (Main), and was one of three different errors: access error, invalid opcode, or unknown opcode. Only one program counter value resolved to a line of code, which was in file __strtod__; I'm guessing that is being called from a printf statement somewhere. The rest returned "??".
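For anyone wanting to repeat the address lookup over a batch of crash dumps, here is a rough sketch of how it could be scripted. It assumes a GNU-style addr2line is on the path (the tool name `m68k-elf-addr2line` and the ELF path passed to `resolve` are assumptions, not something confirmed for this install) and that it is run with `-f`, which prints a function line followed by a file:line line for each address. Addresses that come back as "??" are reported as unresolved; that usually means the program counter was not inside the application's code at all, which would be consistent with a jump through a corrupted pointer or return address.

```python
import subprocess


def parse_addr2line(output, addresses):
    """Pair each address with the two lines `addr2line -f` prints for it.

    Returns {address: (function, "file:line")} or None where the
    location is unknown ("??").
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    pairs = zip(lines[0::2], lines[1::2])  # (function, location) per address
    results = {}
    for addr, (func, loc) in zip(addresses, pairs):
        results[addr] = None if loc.startswith("??") else (func, loc)
    return results


def resolve(elf_path, addresses, tool="m68k-elf-addr2line"):
    """Run addr2line on the given ELF image (tool name is an assumption)."""
    out = subprocess.run(
        [tool, "-f", "-e", elf_path, *addresses],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_addr2line(out, addresses)
```

Calling `resolve` with the application's ELF file and the list of program counter values from the crash reports gives one table of which addresses land in real code and which do not.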
3 days ago I recompiled my application with v2.8.3 and installed it on 9 units. So far not a single unit has crashed or rebooted. I am not declaring victory yet. We are going to continue running these units into next week, and will install the firmware on additional units for more testing.
I'm willing to concede that the problem is my code, and that perhaps the different compiler version has just changed the program layout enough to avoid a crash.
I am open to suggestions as to how to troubleshoot this.