Release 2.8.5

TomNB
Posts: 538
Joined: Tue May 10, 2016 8:22 am

Re: Release 2.8.5

Post by TomNB

Hello,

They are not only used by the NetBurner libraries; they are primarily used by your application. For example, if your app is set up to receive UDP data and, in the worst case, never reads the data, or has a logic condition in which it doesn't read all of it, the number of free buffers will decrease.
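A minimal model of the point above, written in standard C++ rather than against the NetBurner API: a fixed pool of buffers where each received packet holds one buffer until the application reads it and releases it. The `PacketPool` class, its method names, and the pool size are hypothetical. Packets that are never read keep their buffers, so the free count only goes down. The drop-when-empty behavior is an assumption about the stack made for illustration, not something this thread confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of a fixed network buffer pool; not the NetBurner API.
class PacketPool {
public:
    explicit PacketPool(std::size_t total) : free_(total) {}

    // Called by the "stack" when a packet arrives: takes one buffer and
    // queues the packet for the application. Returns false (packet dropped)
    // when no buffers are left.
    bool receive(int payload) {
        if (free_ == 0) {
            ++dropped_;
            return false;
        }
        --free_;
        pending_.push_back(payload);
        return true;
    }

    // Called by the application: removes the oldest queued packet, hands
    // back its payload, and returns its buffer to the pool.
    bool readAndRelease(int &payload) {
        if (pending_.empty()) return false;
        payload = pending_.front();
        pending_.pop_front();
        ++free_;
        return true;
    }

    std::size_t freeCount() const { return free_; }
    std::size_t pendingCount() const { return pending_.size(); }
    std::size_t droppedCount() const { return dropped_; }

private:
    std::size_t free_;
    std::size_t dropped_ = 0;
    std::deque<int> pending_;  // packets received but not yet read
};
```

With a pool of 4 buffers, receiving 5 packets without reading any leaves 0 free buffers, 4 pending packets and 1 dropped packet; reading one packet returns one buffer to the pool.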
SeeCwriter
Posts: 605
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter

If I understand correctly, the buffers are acquired and freed by the NetBurner libraries, and the only interaction between the buffers and my application is when my app retrieves data from one.
I have a task that does nothing but read UDP packets and stuff the data into a FIFO. My only contact with these buffers is to retrieve the UDP packet in them and copy the packet data to a software FIFO for the main task to process. Then I'm done, and the buffer should be freed. Am I correct?

Code:

void ReadUDPTask(void *unused_parameter)
{
  WORD port;

  RegisterUDPFifo( XICOM_UDP_PORT, &udp_rcv_fifo );	// Listen for UDP system broadcasts.
  EnableMulticast(true);

  while (1)
  {
    UDPPacket upkt( &udp_rcv_fifo, 0 );		// constructor waits for incoming packet.

    if (!upkt.Validate()     ) continue;  // Valid packet, not timeout?
    if ( upkt.GetDataSize()<1) continue;  // Got at least a command byte?

    port = upkt.GetDestinationPort();

    if ( port == XICOM_UDP_PORT || port == MULTICAST_PORT ) 
    {
      StoreUdpPacket(&upkt);  // buf, check_stp and len presumably take default values from a prototype not shown here.
    }
  }
}

s_pkt udp_inbound_packet;  // instead of putting this on the stack.
void StoreUdpPacket(UDPPacket *upkt, char *buf, BOOL check_stp, int len)
{
  OSCriticalSectionObj protect_udp_fifo(udp_fifo_access);		// protected execution until end of scope.

  s_pkt *packet = &udp_inbound_packet;
  char *p; 

  if ( upkt )  // a udp packet received normally over ethernet.
  {
    p           = (char *)upkt->GetDataBuffer();
    len         = upkt->GetDataSize();
    packet->ip  = upkt->GetSourceAddress();
    packet->port= upkt->GetSourcePort();
  }
  else if (!len)
  {
    p	  = buf;
    len = strlen(buf);                          // a udp-style string received from web page.
    packet->ip.SetNull();
    packet->port= 0;
  }
  else p = buf;

  if ( !p || (len<1) )  return; // OOPS!

  len = Limit(len, 0, UDP_STRING_LENGTH-2);    	// Ensure data fits.
  p[len-1]=0;                                   // null terminate, typically overwriting the '\n' at the end.

  if ( check_stp && (p[0] != '0') && FromHex( p[0] ) != SS->system_id ) return;    // not part of our system.

  //iprintf("Pkt To Store: %s\r\n", p );

  // store packet.
  int end = strcspn( p, "\r\n" );  // Clip off trailing \r\n in case we forward to web page which dislikes nonprintable.
  if ( len > end ) len = end;
  p[len] = NUL;
  strcpy( packet->data, p );

  udp_packet_fifo.Kew( packet );	
}
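On the question of whether the buffer is freed: in ReadUDPTask, `upkt` is declared inside the `while` loop body, so its destructor runs at the end of every iteration, including the two `continue` paths. Assuming `UDPPacket`'s destructor returns its pool buffer (the behavior the question relies on; the NetBurner headers are the place to confirm it), each packet's buffer is released once per loop pass, and because StoreUdpPacket copies the data out before returning, nothing refers to the buffer afterwards. The standard C++ sketch below models only that scope rule; `ScopedBuffer`, `processPackets` and the `g_free_buffers` counter are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-in for the system buffer pool's free count.
static int g_free_buffers = 8;

// Models a packet object that owns one pool buffer for its lifetime (RAII):
// the constructor takes a buffer, the destructor gives it back.
class ScopedBuffer {
public:
    explicit ScopedBuffer(int size) : size_(size) { --g_free_buffers; }
    ~ScopedBuffer() { ++g_free_buffers; }
    ScopedBuffer(const ScopedBuffer &) = delete;
    ScopedBuffer &operator=(const ScopedBuffer &) = delete;
    int size() const { return size_; }

private:
    int size_;
};

// Mirrors the loop shape of ReadUDPTask: the object lives inside the loop
// body, so it is destroyed at the end of each iteration, whether the
// iteration finishes normally or leaves early via `continue`.
int processPackets(const int *sizes, int count) {
    int stored = 0;
    for (int i = 0; i < count; ++i) {
        ScopedBuffer pkt(sizes[i]);
        if (pkt.size() < 1) continue;  // like the GetDataSize() < 1 check
        ++stored;                      // the data would be copied out here
    }
    return stored;
}
```

Processing five packets, two of them empty, stores three and leaves the free count exactly where it started.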

ecasey
Posts: 164
Joined: Sat Mar 26, 2011 9:34 pm

Re: Release 2.8.5

Post by ecasey

Some observations on the code you provided:

It looks like you have declared and initialized a FIFO called udp_rcv_fifo at global scope. In the code, you register that FIFO to receive packets from the UDP port. Once registered, it will continue to receive packets until it is unregistered or the task ends. If packets are received faster than they are processed, the FIFO will grow. Each UDPPacket constructed from the FIFO (UDPPacket upkt(&udp_rcv_fifo, 0)) removes only one packet from it; it does not clear the FIFO. How often that happens depends on the priority of the task and on everything else going on in the program.

That fifo appears to be global, is there anything else posting to it?

You also appear to have a second FIFO, udp_packet_fifo, that would suffer from the same issue if it is not processed fast enough. If it is processed at a lower priority, it could be the one that is growing.

Hope this helps.
SeeCwriter
Posts: 605
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter

All good questions.

udp_rcv_fifo is declared as an OS_FIFO at global scope, and nothing else posts to it. That raises the question of why it is global. The short answer is that I don't know; I'll have to look into it.

When you say "Once registered, the buffer will continue to receive packets..." and "If packets are received faster than they are processed, the buffer will grow", do you mean the OS_FIFO structure grows because new buffers are being added to it faster than they can be removed?

udp_packet_fifo is an instance of a class that implements a simple global static ring buffer. The class consists of an array of 64 buffers of 4 KB each, along with head and tail indices stored as ints. The buffers are static, with no new or malloc, so there is nothing to free. Three tasks access this FIFO: the UDP read task and the web page fill it, and UserMain pulls data out of it and processes it. The FIFO is protected with an OSCriticalSectionObj so only one task at a time can access it. There is a small risk that UserMain cannot pull data out of the FIFO as fast as it comes in, but in that case the worst that would happen is that the oldest entry is lost. It wouldn't cause a crash.
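For contrast with the growing OS_FIFO, a ring buffer like the one described above has a fixed capacity and overwrites the oldest entry when full. This is a minimal sketch in standard C++, not the poster's actual class: `RingFifo`, `push` and `pop` are hypothetical names, and `std::mutex` stands in for `OSCriticalSectionObj`. The 64 × 4 KB layout from the post would be `RingFifo<64, 4096>`, i.e. 256 KB of storage, so it belongs at global scope as in the post rather than on a task stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

// Fixed-capacity ring buffer of fixed-size strings. When full, push()
// overwrites the oldest entry instead of growing. Hypothetical sketch.
template <std::size_t Capacity, std::size_t EntrySize>
class RingFifo {
public:
    // Copies `text` (truncated to EntrySize - 1 chars) into the next slot.
    // Returns true if the oldest entry had to be overwritten.
    bool push(const char *text) {
        std::lock_guard<std::mutex> lock(mutex_);  // stands in for OSCriticalSectionObj
        bool overwrote = false;
        if (count_ == Capacity) {
            tail_ = (tail_ + 1) % Capacity;  // discard the oldest entry
            --count_;
            overwrote = true;
        }
        std::strncpy(slots_[head_], text, EntrySize - 1);
        slots_[head_][EntrySize - 1] = '\0';
        head_ = (head_ + 1) % Capacity;
        ++count_;
        return overwrote;
    }

    // Copies the oldest entry into `out` (at least EntrySize bytes) and removes it.
    bool pop(char *out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        std::memcpy(out, slots_[tail_], EntrySize);
        tail_ = (tail_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    char slots_[Capacity][EntrySize] = {};
    std::size_t head_ = 0;   // next slot to write
    std::size_t tail_ = 0;   // oldest entry
    std::size_t count_ = 0;  // number of stored entries
    mutable std::mutex mutex_;
};
```

Tracking an explicit count avoids the usual ambiguity where head == tail could mean either empty or full. Memory use is fixed at compile time, so a slow consumer costs old entries, not buffers.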

What happens when the system buffers used by UDPPacket/OS_FIFO run out? I would expect UDP packets to be lost, but the program not to crash.

So it occurs to me that the sudden loss of buffers could be a symptom and not a cause of the crash/reboot. I'm trying to construct a test that would clarify that, if possible.
ecasey
Posts: 164
Joined: Sat Mar 26, 2011 9:34 pm

Re: Release 2.8.5

Post by ecasey

"do you mean the OS_FIFO structure is growing because new buffers are being added to it faster than can be removed?"
Yes. I think the implication of the quote from pbreed that you posted earlier is that the FIFO just keeps growing if you don't process it faster than packets come in. I don't think it is a ring buffer.


Your udp_packet_fifo sounds like it is not the problem.
SeeCwriter
Posts: 605
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter

While I was on vacation last week, I left 10 units running for my test. Three units rebooted twice after running for 18 days, and one crashed to the alternate boot monitor and did not reboot; that unit had rebooted once before the final crash. The debug output from the reboot is:

-------------------Trap information-----------------------------
Stack is corrupt A7=80ff0000
Trap Vector =
Stack Frame seems corrupt unable to do RTOS dump

-------------------End of Trap Diagnostics----------------------

The output from the crash is:

-------------------Trap information-----------------------------
Exception Frame/A7 =80002bdc
Trap Vector =Debug interupt (12)
Format =04
Status register SR =2010
Fault Status =00
Faulted PC =00000000

-------------------Register information-------------------------
A0=4035d67c A1=401176f8 A2=000000a2 A3=000000a3
A4=000000a4 A5=000000a5 A6=00000000 A7=80002bdc
D0=0000002e D1=00000078 D2=80002bf0 D3=00000006
D4=0000000b D5=000000d5 D6=000000d6 D7=000000d7
SR=ffff PC=00000000
-------------------RTOS information-----------------------------
SR indicates trap from within ISR or CRITICAL RTOS section
The OSTCBCur current task control block = 80000a24
This looks like a valid TCB
The current running task is: Main#32
-------------------Task information-----------------------------
Task | State |Wait| Call Stack
Idle#3f|Ready | |400692de,40068f98,0
Main#32|Running | |00000000,0
TCPD#28|Semaphore |0343|40069c86,400765aa,40068f98,0
IP#27|Fifo |0005|4006a07c,4006c902,40068f98,0
Enet#26|Fifo |0028|4006a07c,4006052c,40068f98,0
HTTP#2d|Semaphore |000e|40069c86,40079f0a,400783cc,40068f98,0
User,#2f|Fifo |FRVR|4006a07c,4006f13e,4008836e,40068f98,0
User,#30|Fifo |FRVR|4006a07c,4006f13e,40039872,40068f98,0
FTPD#2e|Semaphore |0003|40069c86,40079f0a,40072338,40068f98,0
User,#31|Timer |115b|400696a6,4004e59e,40068f98,0

-------------------End of Trap Diagnostics----------------------

I tried increasing the stack size to 4096, but I get linker errors about overflowing an SRAM segment.

I don't see anything in the dump data that would help me locate where the error occurs. Does anyone else?

Note, the firmware has optimization disabled and stack checking enabled.

One of the units has firmware with the number of buffers increased to 1024. That unit has been running for 10 days without any issues.
TomNB
Posts: 538
Joined: Tue May 10, 2016 8:22 am

Re: Release 2.8.5

Post by TomNB

Which stack exactly?
SeeCwriter
Posts: 605
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter

In constants.h there are defines for the stack sizes. We aren't using SSH or SSL, so the various task stacks are set to 2048. I changed them to 4096 and rebuilt the system. Everything compiled, but the linker threw an error about exceeding a memory segment. I have restored everything to the original settings.
TomNB
Posts: 538
Joined: Tue May 10, 2016 8:22 am

Re: Release 2.8.5

Post by TomNB

You need to change the corresponding FAST stack defines, which control which stacks are placed in SRAM:

// Uncommented system tasks will be stored in SRAM, otherwise SDRAM will be used.
//#define FAST_IDLE_STACK
#define FAST_MAIN_STACK
#define FAST_ETHERNET_VARIABLES
#define FAST_ETHERNET_STACK
#define FAST_BUFFERS_VARIABLES
#define FAST_BUFFERS
#define FAST_IP_VARIABLES
#define FAST_IP_STACK
#define FAST_TCP_VARIABLES
#define FAST_TCP_STACK

Moving from SRAM to SDRAM will impact performance, but should be fine for a test. The trap information looks like your code is corrupting the stack, so this is a good test.
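Alongside the bigger-stack test, a common way to see how close a task comes to overflowing its stack, independent of compiler stack checking, is to "paint" the stack with a known pattern before the task starts and later count how many words at the far end are still untouched. The sketch below models this with an ordinary array standing in for a stack that grows downward; the pattern value and the function names are arbitrary choices, and on a real target the array would be the task's own stack buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Arbitrary fill pattern; any value unlikely to appear as real stack data works.
constexpr std::uint32_t kStackPaint = 0xDEADBEEFu;

// Fill every word of a (model) task stack with the paint pattern before use.
void paintStack(std::uint32_t *stack, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) stack[i] = kStackPaint;
}

// The stack grows downward from stack[words - 1] toward stack[0], so unused
// words remain at the low end. Returns how many low-end words still hold the
// pattern, i.e. headroom that was never touched. A result of 0 means the whole
// stack was used and it may have overflowed.
std::size_t stackHeadroomWords(const std::uint32_t *stack, std::size_t words) {
    std::size_t untouched = 0;
    while (untouched < words && stack[untouched] == kStackPaint) ++untouched;
    return untouched;
}
```

The result is an estimate: a word that happens to be written with the pattern value would be counted as unused. Logging the headroom of each task periodically shows whether one of them is trending toward zero before a trap occurs.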
SeeCwriter
Posts: 605
Joined: Mon May 12, 2008 10:55 am

Re: Release 2.8.5

Post by SeeCwriter

While troubleshooting the reboots, I recompiled with v2.7.6 on the recommendation of a co-worker. The program is running; now I need to wait and see whether it reboots.

In the meantime, I noticed some strange values. I have a structure that is 1192 bytes in size. Integer values that are read via the web page come out backwards when compiled with 2.7.6 and correct when compiled with 2.8.5.

For example, an integer with a value of 300 (0x0000012C) reads as 19660800 (0x012C0000) with 2.7.6-compiled code, whereas with 2.8.5 it reads as 300. In another case, an integer set to 4095 (0x00000FFF) comes out as 0x0FFF0000 with 2.7.6.

This looks like a Big-Endian/Little-Endian conflict (Motorola vs Intel). Not sure how to correct this.
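One detail worth checking before treating this as a byte-order problem: 0x0000012C becoming 0x012C0000 (and 0x00000FFF becoming 0x0FFF0000) is an exchange of the two 16-bit halves, not a full byte reversal, which would give 0x2C010000. The register dump earlier in the thread (A0-A7, D0-D7) is from a 68k/ColdFire-family CPU, which is big-endian. On such a target, reading a 32-bit field two bytes later than where it was written produces exactly this pattern when the next two bytes are zero, so a 2-byte difference in structure layout between the two builds (for example, different packing or alignment) is one possible cause. That is a hypothesis, not something the thread confirms. The sketch below reproduces the three cases with explicit big-endian byte handling so it behaves the same on any host; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Store a 32-bit value into memory in big-endian byte order.
void storeBE32(std::uint8_t *mem, std::uint32_t v) {
    mem[0] = static_cast<std::uint8_t>(v >> 24);
    mem[1] = static_cast<std::uint8_t>(v >> 16);
    mem[2] = static_cast<std::uint8_t>(v >> 8);
    mem[3] = static_cast<std::uint8_t>(v);
}

// Load a 32-bit value stored in big-endian byte order.
std::uint32_t loadBE32(const std::uint8_t *mem) {
    return (std::uint32_t(mem[0]) << 24) | (std::uint32_t(mem[1]) << 16) |
           (std::uint32_t(mem[2]) << 8) | std::uint32_t(mem[3]);
}

// Full byte reversal: what a genuine big/little-endian mix-up produces.
std::uint32_t byteSwap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Exchange of the two 16-bit halves: the pattern seen in this post.
std::uint32_t halfwordSwap32(std::uint32_t v) {
    return (v >> 16) | (v << 16);
}
```

Storing 300 at offset 0 of a zeroed buffer and loading it back from offset 2 yields 19660800, the value reported above. If the observed values match `halfwordSwap32` rather than `byteSwap32`, comparing `sizeof` of the 1192-byte structure and `offsetof` of the affected fields under both toolchains would be a reasonable next step.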