Release 2.8.5

Discussion to talk about software related topics only.
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter »

I continue to struggle to find the reason for the crashing. After weeks of running, a unit crashed today with an unimplemented op-code error. I had it connected to MTTY with SmartTraps enabled, but still got no useful information. The call stack was "00033102, 0", and addr2line returns nothing. That address seems awfully low. I looked in the map file to see what I could find near that address. I found a number of addresses in the 00033xxx range, and they were all connected to ...\m54455\libm.a. Is there a clue there?
TomNB
Posts: 538
Joined: Tue May 10, 2016 8:22 am

Re: Release 2.8.5

Post by TomNB »

From past experience I would agree with ecasey's previous comment. A memory error of this kind does not mean the problem will occur near the address of the fault. One path forward would be to add code to check array and string bounds at every location where you write to them. Same with pointers. I asked you back in September whether you had enabled stack checking, but you did not reply. Have you tried that?
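
As a rough illustration of the bounds checking suggested above, here is a minimal sketch. The helper names checked_copy and checked_store are hypothetical (they are not NetBurner or standard library functions); the idea is simply that every write goes through a function that knows the size of the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a NetBurner API): copy a C string into a
// fixed-size buffer without ever writing past its end.
// Returns false if src did not fit and had to be truncated.
bool checked_copy(char *dst, std::size_t dst_size, const char *src)
{
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    std::size_t len = std::strlen(src);
    bool fits = len < dst_size;              // need room for the terminating '\0'
    std::size_t n = fits ? len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}

// Hypothetical helper: store into an int array only if the index is in range.
bool checked_store(int *arr, std::size_t count, std::size_t index, int value)
{
    if (arr == nullptr || index >= count) return false;  // reject out-of-range writes
    arr[index] = value;
    return true;
}
```

Logging every false return (with the call site) is one way to find the write that is going out of bounds, since the fault address itself may be far from the real bug.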
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter »

I had not enabled stack checking. I just did so, and received several warnings: "Frame size too large for reliable stack checking. Try reducing the number of local variables." I find that odd, since in one case there are only 2 local variables in the function: char buf[20] and int amp.
Is that an issue?
TomNB
Posts: 538
Joined: Tue May 10, 2016 8:22 am

Re: Release 2.8.5

Post by TomNB »

Depends on how nested things are. Remember, all function calls for a particular task get put on the same task stack. One way to handle buffers is to use the static keyword so the buffer is in static storage, like a global, and not on the stack.
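
A minimal sketch of the static-buffer approach, assuming a function that formats an int into a 20-byte buffer like the one mentioned earlier in the thread. The function name format_amp and the "amp=%d" format are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical formatting function. Because buf is declared static, it has
// static storage duration and does not occupy space in this function's
// stack frame.
// Caveat: the buffer is shared by every caller, so this function is not
// reentrant and is not safe to call from several tasks without a lock.
const char *format_amp(int amp)
{
    static char buf[20];
    int n = std::snprintf(buf, sizeof buf, "amp=%d", amp);
    // "amp=" plus any 32-bit int needs at most 16 bytes including the '\0'.
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return buf;
}
```

Every call returns a pointer to the same storage, so a caller that needs to keep the text must copy it before the next call.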
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter »

I changed the local buffer declaration to put it on the heap, and I still get the "frame too large" warning. This particular function is only called at power-up to perform system initialization. There are quite a few function calls, but only the two local variables.

Code: Select all

char *buf = new char[20];
...

delete [] buf;
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter »

According to the GNU manual, the stack checking option does not add any code to check the stack; it only puts hooks in the code so that the OS can check the stack. Does the OS use these hooks? Will I get additional information in the crash dump if the stack is corrupted?

In Eclipse one of the stack checking options that can be enabled is "stack-usage". But there is nothing in the GNU gcc manual about it. Is that information that gets put in a map file or some other file?

Would increasing the stack frame size be useful?
ecasey
Posts: 164
Joined: Sat Mar 26, 2011 9:34 pm

Re: Release 2.8.5

Post by ecasey »

For stack-usage look for files in the Release directory with a .su extension after you compile.

If the problem is a stack overflow of a small enough amount, a larger stack frame size would work. If not, the problem will persist, although it might cause different issues. Check out this article on stack checking: https://www.netburner.com/learn/detecti ... ded-system

I see from one of your code examples that you use new and delete for allocating arrays rather than making them global. You probably know the dangers of that and how to deal with them; however, you might want to try GetFreeCount() to check for memory leaks. See https://forum.embeddedethernet.com/view ... eak#p10019 for more details.

If you use sprintf() to populate arrays, you might consider changing to snprintf() and specifying the maximum characters to write to the array.
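
A minimal sketch of the snprintf() pattern, including a truncation check. The helper name format_reading and the "%s=%d" format are hypothetical; only snprintf() itself is standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: format "name=value" into a fixed-size buffer.
// Returns true if the whole string fit, false if it was truncated.
bool format_reading(char *buf, std::size_t size, const char *name, int value)
{
    assert(buf != nullptr && name != nullptr && size > 0);
    int n = std::snprintf(buf, size, "%s=%d", name, value);
    // snprintf never writes more than size bytes and always terminates the
    // string when size > 0. Its return value is the length it wanted to
    // write (excluding '\0'), so n >= size means the output was cut short.
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

With sprintf() the same over-long input would silently write past the end of the array; here it is truncated and reported instead.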
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter »

I added GetFreeCount() to the main loop, and I can see that the count is slowly dropping. We're talking hours. But some units drop faster than others, even though they all have the same firmware, on the same network, communicating with the same equipment. That seems odd to me.

In any case, in another thread pbreed made this comment:

"UDP is about the only way you can run out of buffers, i.e., you receive a bunch of UDP packets and you don't process them,
they just sit in the queue waiting to be processed... "

How can packets just sit there without being processed? Unless the program crashes, why wouldn't each call return the next packet?

Code: Select all

UDPPacket upkt( &udp_rcv_fifo, 0 );

Our equipment sends a UDP broadcast packet of approximately 160 bytes every 500 ms. There are anywhere from 20 to 30 pieces of equipment on the network at the same time. That's up to 60 packets a second that each module has to process. That doesn't seem overly burdensome.
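
For reference, the load arithmetic for the traffic described above can be written out as a small sketch. The constant and function names are made up for illustration; the byte figure counts the approximately 160-byte payload only and ignores UDP/IP/Ethernet headers.

```cpp
// One broadcast every 500 ms means 2 packets per second per device.
constexpr int kPacketsPerSecondPerDevice = 1000 / 500;
// Approximate payload size quoted in the post.
constexpr int kPayloadBytes = 160;

constexpr int packets_per_second(int devices)
{
    return devices * kPacketsPerSecondPerDevice;
}

constexpr int payload_bytes_per_second(int devices)
{
    return packets_per_second(devices) * kPayloadBytes;
}

// 20 to 30 devices gives 40 to 60 packets per second.
static_assert(packets_per_second(20) == 40, "20 devices");
static_assert(packets_per_second(30) == 60, "30 devices");
// Worst case payload rate: 60 packets/s * 160 bytes.
static_assert(payload_bytes_per_second(30) == 9600, "30 devices, payload only");
```

So the sustained rate is 40 to 60 packets per second, or under 10 kB of payload per second in the worst case.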
ecasey
Posts: 164
Joined: Sat Mar 26, 2011 9:34 pm

Re: Release 2.8.5

Post by ecasey »

Does the GetFreeCount() get close to zero at the point of a crash?

You could try releasing and re-initializing the buffer periodically, perhaps once per hour. If the GetFreeCount() is restored to about the same value each time, then you know that not all of the UDP packets are getting processed. If not, then the memory leak is elsewhere.
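
One way to record the numbers for that experiment is sketched below. FreeCountTracker is a hypothetical class, not part of the NetBurner libraries; the assumption is that the main loop passes it the value returned by GetFreeCount() on each pass and calls reset() right after the buffer is released and re-initialized.

```cpp
#include <cassert>

// Hypothetical tracker for free-buffer counts within one observation window.
class FreeCountTracker {
public:
    // Record one reading, e.g. the value returned by GetFreeCount().
    void sample(int free_count)
    {
        assert(free_count >= 0);
        if (!seen_ || free_count < lowest_)  lowest_ = free_count;
        if (!seen_ || free_count > highest_) highest_ = free_count;
        last_ = free_count;
        seen_ = true;
    }

    // Start a new window, e.g. after releasing and re-initializing the FIFO.
    void reset() { seen_ = false; }

    bool has_samples() const { return seen_; }
    int lowest() const { return lowest_; }    // smallest count seen in this window
    int highest() const { return highest_; }  // largest count seen in this window
    int last() const { return last_; }        // most recent reading

private:
    bool seen_ = false;
    int lowest_ = 0;
    int highest_ = 0;
    int last_ = 0;
};
```

Comparing highest() from one window to the next shows whether the count is restored after each re-initialization, and lowest() shows how far it fell before a crash.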
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter »

I'm running tests on several units now to see how low the buffer count goes before a crash. After about 24 hours, 2 units seem to have lost one buffer. When I started, the buffer count would vary from 262 to 261. Now it's 261 to 260. Yesterday, after running for several hours with a buffer count of 262-261, the count suddenly dropped to 171 and the unit then crashed. Since then it hasn't done it again. Very random.

These buffers appear to be used only with UDP. If I turn off UDP, the buffer count never moves from 262. And since they are only used by the NetBurner libraries, where else would I look for a leak?