App Header Failed

PaulSteane
Posts: 7
Joined: Fri Apr 16, 2010 3:49 am

App Header Failed

Post by PaulSteane »

What exactly does "App Header Failed" (followed by "No Valid App") mean, when displayed by the Monitor?

Background (details obscured for confidentiality reasons):-

I have a project of some 600 individual units, each containing a MOD5282 module on a custom main processor card. There are two variants with different I/O cards added; I'll call these VAR1 (approx. 400) and VAR2 (approx. 200). They've been running happily for several years.

The application software is about 85% identical across the two variants; only about 15% differs between them.

Each unit can access an FTP server which can contain new application software. To do an application software update we upload new .s19 files to the server, then reboot the units; they look for new software at startup and re-program themselves if required. I use ReadS19ApplicationCodeFromStream to do this. This has been working very well; we've done 3 or 4 software updates over 18 months like this.

For the update before the one I'm currently having trouble with, we needed to save some user data, so we decided to use the UserParameters Flash area. A complication is that we are already using the SNMP library, which also uses this area to store some config data. So I wrote some new functions to wrap both the SNMP data and our own data together. All seemed OK on bench testing so we rolled this version out to all the units. This was over a year ago and used version 2.6.0 of the Netburner Tools and Libraries.

Now we're doing another update, to code unrelated to the UserParameters stuff, and we're at Tools version 2.6.3. Again bench testing seemed OK, so we remotely updated about 12 VAR1 units and 6 VAR2 units. The VAR1 units updated OK but all the VAR2 units dropped out to the Monitor after self-reprogramming, displaying "App Header Failed/No Valid App". Note that the units appear to programme OK, reboot, then fail. Obviously this requires a site visit to fix, uploading the new code using MTTTY; when we do this, all works OK.

So I'm trying to find out what's going on, and what's different between the code for VAR1 and VAR2 (the code lengths are different). Also, are there any relevant differences between 2.6.0 and 2.6.3? I suspect a problem around my UserParameters reading/writing code, so I'll be looking in detail here. We do a SaveUserParameters after re-programming is complete, just before a ForceReboot; if this is a problem, why does it work OK for some variants and not others?

Now it may be that we have to visit all 200 rogue units to manually program them to the latest version. However I would like to know what is happening so that we can make sure it doesn't happen again!

Meanwhile I wondered if anyone had any helpful thoughts.
dciliske
Posts: 624
Joined: Mon Feb 06, 2012 9:37 am
Location: San Diego, CA

Re: App Header Failed

Post by dciliske »

Yea, it sounds like you're corrupting the application image when you're performing that SaveUserParams write.

Looking at the linker file, UserParamBase is located at 0xFFC06000 and FlashAppBase is located at 0xFFC08000 on the Mod5282. Thus if your modified version of SaveUserParams writes past 0xFFC07FFF (i.e. if the total write size is greater than 8KB), it's going to overwrite part of the application image. If your different variants have different amounts of config to save, then it could just be that Variant 1 doesn't quite hit that limit.
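
As a rough illustration of that limit, a bounds check along these lines could be run before any write to the user parameter area. The two addresses are the ones quoted above; the function name `user_param_write_fits` is hypothetical and not part of the NetBurner library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Addresses as quoted from the Mod5282 linker file in this post. */
#define USER_PARAM_BASE 0xFFC06000u
#define FLASH_APP_BASE  0xFFC08000u

/* Hypothetical helper: returns true if writing `len` bytes starting
 * `offset` bytes into the user parameter area ends at or before
 * 0xFFC07FFF, i.e. never reaches the application image. */
bool user_param_write_fits(uint32_t offset, uint32_t len)
{
    const uint32_t region_size = FLASH_APP_BASE - USER_PARAM_BASE; /* 0x2000 = 8 KB */

    /* Written this way so that offset + len cannot overflow. */
    return offset <= region_size && len <= region_size - offset;
}
```

A wrapper around SaveUserParameters could refuse to write (and log an error) whenever this returns false.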

-Dan
Dan Ciliske
Project Engineer
Netburner, Inc
PaulSteane
Posts: 7
Joined: Fri Apr 16, 2010 3:49 am

Re: App Header Failed

Post by PaulSteane »

I'm only writing 0x36C bytes to the area starting at 0xFFC06000, and it's the same for all variants, so this shouldn't be a problem. I've checked this with debug iprintf statements.

I've been digging down into the library functions to try to see where my problem is coming from, in case something here is getting the erase or write addresses or lengths wrong. I've tried creating my own versions of FlashErase and FlashProgram for tweaking things and adding debug statements (keeping them outside the critical code areas). One problem I'm getting is that when I do this I change the code length and this makes the fault go away!

The only anomaly I've been able to come up with so far is that the algorithms in FlashErase and FlashProgram are not exactly the same as shown in the Flash Module section of the MCF5282 Manual (Figure 6.13).

Possible problems are

1. there is no check for the CBEIF flag being set before erasure/programming is started - this would be fixed by adding a line

while ( (sim.cfm.cfmusat & 0x80) == 0);
near the start; and

2. there is no check for the PVIOL and ACCERR flags ever getting set, and clearing them if required (though they do get cleared near the start of FlashProgram).

If I make changes to add these, the overall code length changes and the problem goes away anyway. So I don't know if they are significant or not. However experience on a previous project (different processor, different hardware) showed that failing to comply with the manufacturer's algorithm had unexpected consequences.
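
To make the two checks above concrete, here is a minimal sketch of the decision I have in mind, written as a pure function so it can be tried off-target. The bit masks are my reading of the CFMUSTAT description in the MCF5282 manual and should be verified against it; the names `cfm_action` and `cfm_next_action` are hypothetical and not from the NetBurner library.

```c
#include <stdint.h>

/* CFMUSTAT bit masks: assumed from the MCF5282 manual, verify before use. */
#define CFM_CBEIF  0x80u  /* command buffer empty, a new command may be written */
#define CFM_PVIOL  0x20u  /* protection violation flag */
#define CFM_ACCERR 0x10u  /* access error flag */

typedef enum {
    CFM_WAIT,          /* command buffer still full: keep polling */
    CFM_CLEAR_ERRORS,  /* PVIOL/ACCERR set: clear them before issuing a command */
    CFM_READY          /* safe to start an erase or program command */
} cfm_action;

/* Hypothetical helper: decide what to do from one read of the status register. */
cfm_action cfm_next_action(uint8_t ustat)
{
    if (ustat & (CFM_PVIOL | CFM_ACCERR)) {
        return CFM_CLEAR_ERRORS;
    }
    if ((ustat & CFM_CBEIF) == 0) {
        return CFM_WAIT;
    }
    return CFM_READY;
}
```

On the target this would be called with the value read from the status register (sim.cfm.cfmusat in the line above), looping until it returns CFM_READY.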

I've found a workaround that means we don't have to manually visit all 200 units, so now I'm trying to convince myself that it won't happen again - so I'm still trying to get to the bottom of the problem.

Any more thoughts would be welcomed.
pbreed
Posts: 1088
Joined: Thu Apr 24, 2008 3:58 pm

Re: App Header Failed

Post by pbreed »

Got to ask a very specific question...
WHEN do you write to userparams?


I.e. we have seen a lot of issues where people write to userparams on power up and then corrupt the system.

The problem is that if power is at all flaky (e.g. when plugging in a cord, power may flash on and off a few times before it's stable),
then the power may fail in the middle of writing the userparam space... if power goes away while programming flash it often corrupts other flash...

I would wait at least 30 seconds before writing to user params on boot... (It's fine to read, just not OK to write until you're 100% sure power is stable.)
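
A minimal sketch of that hold-off, assuming the application can get a tick count since boot and the tick rate (on NetBurner these would presumably come from TimeTick and TICKS_PER_SECOND, but check your tools version). The function name `user_param_write_allowed` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hold-off suggested above: no user param writes in the first 30 s. */
#define WRITE_HOLDOFF_SECONDS 30u

/* Hypothetical guard: true once at least 30 seconds have elapsed since boot.
 * Both arguments are supplied by the caller so the check can be tested
 * off-target. */
bool user_param_write_allowed(uint32_t ticks_since_boot, uint32_t ticks_per_second)
{
    if (ticks_per_second == 0) {
        return false; /* unknown tick rate: err on the side of not writing */
    }
    return ticks_since_boot / ticks_per_second >= WRITE_HOLDOFF_SECONDS;
}
```

Reads can happen at any time; only the write path needs to go through a guard like this.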

Paul
PaulSteane
Posts: 7
Joined: Fri Apr 16, 2010 3:49 am

Re: App Header Failed

Post by PaulSteane »

I write to the UserParams area as follows:-

1. After a code update has occurred, immediately after ReadS19ApplicationCodeFromStream has returned STREAM_UP_OK (so this is about a minute after power up, and only if a code update has occurred).

2. Soon after power up, if the UserParams content is not what is expected but other conditions are OK, then re-write the UserParams (maybe 15 seconds after power up). However this would normally never occur unless there is a system fault, or the UserParams write after a code update (item 1 above) has failed for some reason.

3. If the customer changes certain configuration settings; this is extremely rare and is generally after the system has been powered up for some time.

I'm confident that the power supply is OK, it comes off a big battery via a DC-DC convertor then via a Power over Ethernet switch then via a PoE supply on our hardware. This arrangement is the same for all units, problematic or not.

My suspicion is that the problem occurs due to item 1 above.