NANO54415 locks up after a lot of TCP connect()s

mx270a
Posts: 80
Joined: Tue Jan 19, 2010 6:55 pm


Post by mx270a »

I've been working on an app for a NANO54415 and started experiencing lockups. I'm using NNDK v2.6.4. All tasks will stop reporting data to the debug serial port, I lose the ability to ping it, and cannot search for it on the network any more. The link light remains lit, but I need to press the reset button to bring it back online.

I've narrowed it down to something with the network connect() statement. I have checked the available network buffers and they are not being depleted.

For example, the code below causes a lockup anywhere from a few minutes to a few hours after bootup; the time between occurrences seems to be random. This app has a task that tries to connect as an HTTP client to the first three IPs on the network, looping continually. On my network, x.x.x.1, .2, and .3 are all devices with HTTP servers, so the connect() call doesn't block for very long. I also have the main task reporting the free buffer count, which stays at 262-263. Any idea what may be causing the device to stop?

Code:

#include "predef.h"
#include <stdio.h>
#include <ctype.h>
#include <startnet.h>
#include <autoupdate.h>
#include <dhcpclient.h>
#include <NetworkDebug.h>
#include <tcp.h>
#include <buffers.h>

extern "C" {
	void UserMain(void * pd);
}

const char * AppName="Test";
DWORD ConnectionsTaskStack[USER_TASK_STK_SIZE];
typedef union {
	unsigned long l;
	unsigned char c[4];
} my_ip_union_t;
my_ip_union_t ip_addr;

void ConnectionsTask(void * pd) {
	int pass = 0;
	ip_addr.l = EthernetIP;
	ip_addr.c[3] = 0; //Start at zero, will increment to 1

	while(1) {
		iprintf("a");
		ip_addr.c[3] += 1; //Increment IP
		if (ip_addr.c[3] == 4) {
			ip_addr.c[3] = 1;
			pass++;
			iprintf("\r\nPass %i\r\n", pass);
		}
		iprintf("\r\nTrying: %i.%i.%i.%i ", (int)ip_addr.c[0], (int)ip_addr.c[1], (int)ip_addr.c[2], (int)ip_addr.c[3]);
		int fd = 0;
		fd = connect(ip_addr.l, 0, 80, TICKS_PER_SECOND * 5); //Block for 5 seconds max
		//Would normally read data here
		close(fd);
		fd = 0;
		iprintf("f");

		OSTimeDly(20);
	}
}

void UserMain(void * pd) {
    InitializeStack();
    OSChangePrio(MAIN_PRIO);
    EnableAutoUpdate();

    #ifdef _DEBUG
    InitializeNetworkGDB();
    OSTimeDly(TICKS_PER_SECOND * 5);
    #endif

    OSTaskCreatewName(ConnectionsTask, (void *)0, &ConnectionsTaskStack[USER_TASK_STK_SIZE], ConnectionsTaskStack, 42, "ConnectionsTask");

    while (1) {
        OSTimeDly(20);
        iprintf("[%i]",  GetFreeCount()); //Check the free buffer count.
    }
}
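The loop above changes the last octet through the `c[3]` member of `my_ip_union_t`, which selects the last octet only because the ColdFire CPU on the NANO54415 is big-endian. The same cycling can be done independently of byte order with masks and shifts. This is a minimal sketch, assuming the address is held as a 32-bit value with the first octet in the most significant byte (which is what the union code implies); `withLastOctet`, `nextHost`, and `dotted` are hypothetical helper names, not NNDK functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Replace the last octet of an IPv4 address stored with the first octet in
// the high byte (e.g. 192.168.1.100 == 0xC0A80164). Masks and shifts give
// the same result on any CPU, unlike a union whose c[3] is the last octet
// only on a big-endian target.
std::uint32_t withLastOctet(std::uint32_t ip, std::uint8_t octet) {
    return (ip & 0xFFFFFF00u) | octet;
}

// Cycle the host part through 1, 2, ..., lastHost, 1, ... as the test loop does
// (starting from 0 yields 1 first).
std::uint8_t nextHost(std::uint8_t current, std::uint8_t lastHost = 3) {
    assert(lastHost >= 1);
    return (current >= lastHost) ? 1 : static_cast<std::uint8_t>(current + 1);
}

// Format the address as dotted-quad text for logging.
std::string dotted(std::uint32_t ip) {
    char buf[48];
    std::snprintf(buf, sizeof buf, "%u.%u.%u.%u",
                  static_cast<unsigned>((ip >> 24) & 0xFFu),
                  static_cast<unsigned>((ip >> 16) & 0xFFu),
                  static_cast<unsigned>((ip >> 8) & 0xFFu),
                  static_cast<unsigned>(ip & 0xFFu));
    return buf;
}
```

For example, `dotted(withLastOctet(0xC0A80164u, 1))` yields "192.168.1.1", and repeated calls to `nextHost` starting from 0 produce 1, 2, 3, 1, matching the union-based loop.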
dciliske
Posts: 624
Joined: Mon Feb 06, 2012 9:37 am
Location: San Diego, CA

Re: NANO54415 locks up after a lot of TCP connect()s

Post by dciliske »

I'm fairly sure you're running out of sockets. You're right up against the limit of the system's standard implementation. The maximum capability of the system is 132 connects per 120 seconds (assuming that you don't try to have more than 32 sockets open at a time). You could try increasing this limit by raising the maximum number of closed but monitored sockets, which is the size of the short socket list in 'tcp.cpp'. You'll need to change the definition of the SHORT_SOCKET_SIZE macro on line 4534. I'd suggest upping it to 300 and seeing if the problem goes away.

-Dan
Dan Ciliske
Project Engineer
Netburner, Inc
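The 132-connects-per-120-seconds figure is a simple turnover bound: 32 sockets plus 100 short sockets, each held for the 120-second closing period. The sketch below only reproduces that arithmetic. The function names are hypothetical, and the loop estimate assumes `TICKS_PER_SECOND` is 20 (taken here to be the NNDK default), so that `OSTimeDly(20)` is about one second.

```cpp
#include <cassert>

// Rough ceiling on sustained connect()/close() cycles: every closed
// connection holds a slot for holdSeconds, so at most (open + short)
// slots can turn over per hold period. With 32 + 100 slots and a
// 120 s hold this mirrors the "132 connects per 120 seconds" figure.
double maxConnectsPerSecond(int openSockets, int shortSockets, double holdSeconds) {
    assert(holdSeconds > 0.0);
    return (openSockets + shortSockets) / holdSeconds;
}

// Upper bound on the rate of a loop that connects once per iteration and
// then sleeps delayTicks, ignoring the time connect() itself takes.
double loopConnectsPerSecond(int delayTicks, int ticksPerSecond) {
    assert(delayTicks > 0 && ticksPerSecond > 0);
    return static_cast<double>(ticksPerSecond) / delayTicks;
}
```

Under that tick-rate assumption, the original loop makes at most about one connect per second, close to the bound of roughly 1.1 per second, which is consistent with the "right up against the limit" remark.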
mx270a
Posts: 80
Joined: Tue Jan 19, 2010 6:55 pm

Re: NANO54415 locks up after a lot of TCP connect()s

Post by mx270a »

Ok, that would make sense. So the system can have 32 open sockets, plus (by default) 100 closed sockets that are finishing up and closing down, thus a total of 132. Is there a function to get the number of free closed sockets? Is there a way to decrease the timeout from 120 seconds to 30 seconds?

Reading through other threads about network connection limits I came across the buffers and the GetFreeCount() function to make sure there are some available. Could you explain what the buffers are and how they relate to sockets?
dciliske
Posts: 624
Joined: Mon Feb 06, 2012 9:37 am
Location: San Diego, CA

Re: NANO54415 locks up after a lot of TCP connect()s

Post by dciliske »

  1. Is there a function? Not that I'm aware of.
  2. Can the timeout be decreased? No. It's part of the TCP spec.
The best solution, if you really want to be doing that many connections, is increasing the size of the ShortSocketList.

As for how buffers relate to sockets... They're separate entities. Buffers are the structures that the system uses to shuffle data from place to place. Sockets are the pipes TCP uses to refer to a connection and send data between the endpoints. There are also ShortSockets, which our system uses to keep track of previously closed sockets waiting for the 2MSL period (twice the Maximum Segment Lifetime) to expire, so we know that we need to reject all packets on that connection. ShortSockets are simply smaller and take less memory, but hold all the data relevant to that task.

When you call 'connect', the system has to obtain a socket structure and initialize it for the TCP connection being established. When you call 'close' on that same socket, the system needs to get a ShortSocket to use for rejecting packets on that connection for the next 120 seconds. So, basically, your system is running through these resources faster than they can be released. Therefore, if you increase the number of ShortSockets available to the system, you should be able to alleviate the problem.

-Dan
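The resource accounting described above can be modeled as a fixed-size table in which each close() claims a slot that is released only after the hold period. This is a minimal sketch, not NetBurner's tcp.cpp: `ClosedSocketTable` is a hypothetical class, times are plain seconds, and in this simple model a full table just refuses the claim.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a table of closed-but-monitored connections
// (what the post calls ShortSockets). close() claims a slot, and the slot
// is released once the hold period (2MSL) has passed.
class ClosedSocketTable {
public:
    ClosedSocketTable(std::size_t slots, double holdSeconds)
        : expiry_(slots, 0.0), inUse_(slots, false), hold_(holdSeconds) {
        assert(holdSeconds > 0.0);
    }

    // Called when a connection is closed at time `now` (seconds).
    // Returns false when every slot is still inside its hold period.
    bool onClose(double now) {
        releaseExpired(now);
        for (std::size_t i = 0; i < inUse_.size(); ++i) {
            if (!inUse_[i]) {
                inUse_[i] = true;
                expiry_[i] = now + hold_;
                return true;
            }
        }
        return false;
    }

    // Number of slots available at time `now`.
    std::size_t freeCount(double now) {
        releaseExpired(now);
        std::size_t n = 0;
        for (bool used : inUse_) {
            if (!used) ++n;
        }
        return n;
    }

private:
    // Free every slot whose hold period has ended.
    void releaseExpired(double now) {
        for (std::size_t i = 0; i < inUse_.size(); ++i) {
            if (inUse_[i] && expiry_[i] <= now) inUse_[i] = false;
        }
    }

    std::vector<double> expiry_;
    std::vector<bool> inUse_;
    double hold_;
};
```

For example, a three-slot table with a 120-second hold accepts closes at t = 0, 1, and 2 s, refuses one at t = 3 s, and has one slot free again at t = 120 s. In this model the sustainable close rate is the slot count divided by the hold time, so enlarging the table raises it proportionally.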
mx270a
Posts: 80
Joined: Tue Jan 19, 2010 6:55 pm

Re: NANO54415 locks up after a lot of TCP connect()s

Post by mx270a »

I tried taking SHORT_SOCKET_SIZE to 300 and then 1000, but still saw the lockups. I then wrote a function to check the number of free sockets. This lives in my tcp.cpp:

Code:

int NumberOfShortSocketsRemaining() {
	int counter = 0;
	for (int i = 0; i < SHORT_SOCKET_SIZE; i++) {
		if (ShortSockets[i].m_flags == 0) {
			counter++;
		}
	}
	return counter;
}
With SHORT_SOCKET_SIZE back at 100, I would see the number of available short sockets drop to 61, where it would level out. If I started with 500, it would drop to 461 and stay there. However, if I start with just 10, it drops to 1 and keeps running for a while before locking up. Reading through the other code in tcp.cpp, it appears that if no short sockets are free, it just kills the oldest one to make room, ensuring you never run out. Any idea what else could be causing these lockups?

On a somewhat unrelated note, I thought it was odd that it would drop to 1 free short socket, not zero. It seems that ShortSockets[0] is never used. Thus if you allocate 100, you really only get use of 99.
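The tcp.cpp behavior described above, where the oldest short socket appears to be recycled when none is free, can be sketched as follows. This is a hypothetical model (`Slot`, `claimSlot`, and `freeSlots` are illustrative names, not NNDK code), and it does not reproduce the unused `ShortSockets[0]` entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry in a hypothetical closed-connection table.
struct Slot {
    bool inUse = false;
    double expiry = 0.0;  // time (s) at which the hold period ends
};

// Claim a slot for a connection closed at `now`. Expired entries are
// released first; if none is free, the entry whose hold period ends
// soonest (the oldest one) is recycled, so the claim never fails.
// Precondition: the table is non-empty.
std::size_t claimSlot(std::vector<Slot>& table, double now, double hold) {
    assert(!table.empty());
    for (Slot& s : table) {
        if (s.inUse && s.expiry <= now) s.inUse = false;
    }
    std::size_t pick = 0;
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (!table[i].inUse) { pick = i; break; }           // first free slot wins
        if (table[i].expiry < table[pick].expiry) pick = i; // else track oldest
    }
    table[pick].inUse = true;
    table[pick].expiry = now + hold;
    return pick;
}

// Count unused entries, in the spirit of NumberOfShortSocketsRemaining().
std::size_t freeSlots(const std::vector<Slot>& table) {
    std::size_t n = 0;
    for (const Slot& s : table) {
        if (!s.inUse) ++n;
    }
    return n;
}
```

Because `claimSlot()` always succeeds, the free count can sit at zero indefinitely without a close ever failing, which fits the observation that the table size made no difference to when the lockups occurred.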
rnixon
Posts: 833
Joined: Thu Apr 24, 2008 3:59 pm

Re: NANO54415 locks up after a lot of TCP connect()s

Post by rnixon »

Good testing and thanks for sharing the function. I don't quite understand the description though. In the beginning you say it locks up with a setting between 300 and 500, but later on it sounds like it does not lock up when set to 100 or 500. Can you please clarify?
mx270a
Posts: 80
Joined: Tue Jan 19, 2010 6:55 pm

Re: NANO54415 locks up after a lot of TCP connect()s

Post by mx270a »

I'm getting lockups after approximately the same amount of time regardless of what the ShortSockets size is set at. I've tried 10 to 1000.

When I had the size set at 10, it filled up the ShortSockets and kept going for a while before locking up, leading me to believe that you cannot actually run out of ShortSockets regardless of size. When starting with a large number like 500, it wouldn't get anywhere close to running out before it locked up. I suspect my issue stems from something else.
dciliske
Posts: 624
Joined: Mon Feb 06, 2012 9:37 am
Location: San Diego, CA

Re: NANO54415 locks up after a lot of TCP connect()s

Post by dciliske »

hmm... Interesting. I did see that issue with ShortSocket[0] on Thursday, but wanted to discuss things with Paul before proposing the trivial fix (no change in TCP is ever trivial...).

I'm in agreement that it doesn't appear to be an issue with depleting the ShortSocketList. Can you run TaskScan on the device during a lockup? Is the code you gave in the original example all that is needed to trigger the failure?

-Dan
mx270a
Posts: 80
Joined: Tue Jan 19, 2010 6:55 pm

Re: NANO54415 locks up after a lot of TCP connect()s

Post by mx270a »

I set my ShortSocket size back to 100 and rebuilt the system files. I then copied the code above into a program, and reduced the OSTimeDly in my ConnectionsTask loop to (1) to reduce my wait time. I also added the lines for task monitor. The result is this:

Code:

#include "predef.h"
#include <stdio.h>
#include <ctype.h>
#include <startnet.h>
#include <autoupdate.h>
#include <dhcpclient.h>
#include <NetworkDebug.h>
#include <tcp.h>
#include <buffers.h>

#include <taskmon.h>

extern "C" {
   void UserMain(void * pd);
}

const char * AppName="Test";
DWORD ConnectionsTaskStack[USER_TASK_STK_SIZE];
typedef union {
   unsigned long l;
   unsigned char c[4];
} my_ip_union_t;
my_ip_union_t ip_addr;

void ConnectionsTask(void * pd) {
   int pass = 0;
   ip_addr.l = EthernetIP;
   ip_addr.c[3] = 0; //Start at zero, will increment to 1

   while(1) {
      iprintf("a");
      ip_addr.c[3] += 1; //Increment IP
      if (ip_addr.c[3] == 4) {
         ip_addr.c[3] = 1;
         pass++;
         iprintf("\r\nPass %i\r\n", pass);
      }
      iprintf("\r\nTrying: %i.%i.%i.%i ", (int)ip_addr.c[0], (int)ip_addr.c[1], (int)ip_addr.c[2], (int)ip_addr.c[3]);
      int fd = 0;
      fd = connect(ip_addr.l, 0, 80, TICKS_PER_SECOND * 5); //Block for 5 seconds max
      //Would normally read data here
      close(fd);
      fd = 0;
      iprintf("f");

      OSTimeDly(1);
   }
}

void UserMain(void * pd) {
    InitializeStack();
    OSChangePrio(MAIN_PRIO);
    EnableAutoUpdate();
    EnableTaskMonitor();

    #ifdef _DEBUG
    InitializeNetworkGDB();
    OSTimeDly(TICKS_PER_SECOND * 5);
    #endif

    OSTaskCreatewName(ConnectionsTask, (void *)0, &ConnectionsTaskStack[USER_TASK_STK_SIZE], ConnectionsTaskStack, 42, "ConnectionsTask");

    while (1) {
        OSTimeDly(20);
        iprintf("[%i]",  GetFreeCount()); //Check the free buffer count.
    }
}
This code locks up for me. Over three runs, it locked up after 10949, 1631, and 10280 passes. Keep in mind that the three IPs it connects to all listen and respond on port 80, so it flies through this loop, making 6-7 complete passes per second. I can run TaskScan while it is running, but not after it locks up.

I agree with you on the ShortSocket[0] situation - that is a very important section of code, must be very careful not to break anything when working in there trying to recover a few bytes of wasted memory.
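The pass counts above translate into connection counts and approximate run times with simple arithmetic. This is a minimal sketch with hypothetical function names; the 3 hosts per pass and the 6-7 passes per second come from the post.

```cpp
#include <cassert>

// Total connect() calls made before a lockup, given the pass count printed
// by the test loop and the number of hosts tried per pass (3 here).
long connectsBeforeLockup(long passes, int hostsPerPass) {
    assert(passes >= 0 && hostsPerPass > 0);
    return passes * hostsPerPass;
}

// Approximate run time in seconds, given the observed pass rate.
double secondsBeforeLockup(long passes, double passesPerSecond) {
    assert(passesPerSecond > 0.0);
    return passes / passesPerSecond;
}
```

At 6-7 passes per second, 10949, 1631, and 10280 passes correspond to roughly 26-30, 4-4.5, and 24-29 minutes, respectively. The loop makes about 18-21 connects per second, or roughly 2,200-2,500 per 120-second window, far above the 132-per-120-seconds figure quoted earlier in the thread.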
pbreed
Posts: 1088
Joined: Thu Apr 24, 2008 3:58 pm

Re: NANO54415 locks up after a lot of TCP connect()s

Post by pbreed »

Can you make a very high-priority task (say, priority 1) that just prints a message once every 10 seconds or so, and see whether that output stops when it hangs?

Paul