TCP v2.4.2 and 2.7.7 differences (MOD5272)

DBrunermer · Post by **DBrunermer** » Sat Nov 10, 2018 11:14 am

Hi,

I'm having a problem with some code that I run on a MOD5272. It has to do with receiving data on the ethernet port. When I compile the software in v2.4.2, it executes properly. When I compile the same code in v2.7.7, it doesn't handle incoming data properly.

After extensive testing with Wireshark, I think I see what's going on, but I don't know why or how to change it. It has to do with how the port is opened. This data transmission uses port 26, and the NB IP is fixed at 192.168.1.101.

In the code compiled with the older version (2.4.2), when the port is opened by the PC I see four messages:
PC->NB.26 [SYN] Seq=0 WIN=64240 ...
NB.26->PC [SYN, ACK] Seq=0 Ack=1 WIN=0 ...
PC->NB.26 [ACK] Seq=1 Ack=1 WIN=64240 ...
NB.26->PC [TCP Window Update] [ACK] Seq=1 Ack=1 WIN=4644 ...

In the code compiled with the newer version, I only see three messages when the port is opened:
PC->NB.26 [SYN] Seq=0 WIN=64240 ...
NB.26->PC [SYN, ACK] Seq=0 Ack=1 WIN=16422 ...
PC->NB.26 [ACK] Seq=1 Ack=1 WIN=64240 ...

From there, the Wireshark traces diverge so much that they're impossible to compare. I'm not much of an expert in this, but from what I'm seeing, I gather that the WIN parameter is what TCP flow control uses to tell the sender when to throttle back. When the window starts out really small, the NB always seems to keep up and the PC keeps itself in check. When it's initialized with the higher value, it seems to be overrun almost immediately, and I see a bunch of [TCP DUP ACK] and [TCP FAST RETRANSMISSION] messages.
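A minimal sketch of the flow-control rule described above, assuming congestion control is ignored: a sender may have at most WIN unacknowledged bytes in flight. `SendableBytes` and its parameter names are hypothetical and are not part of the NetBurner stack or any PC stack.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only.
// Returns how many more bytes a sender may transmit right now, given the
// receiver's last advertised window and the sender's sequence state.
// Congestion control is deliberately left out.
std::uint32_t SendableBytes(std::uint32_t advertisedWindow,
                            std::uint32_t lastAckedSeq,
                            std::uint32_t nextSeqToSend)
{
    // Unsigned subtraction keeps working across sequence-number wraparound.
    std::uint32_t inFlight = nextSeqToSend - lastAckedSeq;
    return (inFlight >= advertisedWindow) ? 0 : advertisedWindow - inFlight;
}
```

With the values from the traces, a SYN-ACK window of 0 allows nothing to be sent until the window update arrives, while a window of 16422 lets the PC send 16422 bytes before it has to wait for an ACK.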

So the data isn't being recorded properly. Is there any way to change the way a port is opened so it starts with a smaller WIN parameter?

Thanks, Dan

DBrunermer · Post by **DBrunermer** » Mon Nov 12, 2018 7:35 pm

I've been trying to dig a little deeper into the code, specifically tcp.cpp for both versions. OK, they're so different as to be nearly incomparable. I have no idea what all the SACK stuff is doing, or what it's about. I would expect that when everything's running well none of that code ever executes.

Anyway, down at the end of the function, I saw one difference that leapt out at me. From the old version:

Code:

   if ( state < TCP_STATE_ESTABLISHED )
   {
      pTcp->window = 0;
   }
   else
   {
      pTcp->window = MyAdvertisedWindow();
   }

From the new version (2.7.7):

Code:

      pTcp->window = MyAdvertisedWindow();

There is no check for the state. There is a define in the tcp.h file:

Code:

#define TCP_STATE_WAIT_FOR_ACCEPT   (4)   /* TCP_STATE_ESTABLISHED window = 0 */

But I'm wondering if that state isn't actually happening, and it's just skipping from 3 to 5. That still wouldn't explain why the MyAdvertisedWindow() function is returning such a large number. Can that be limited somehow?

Thanks, Dan

Post by **TomNB** » Tue Nov 13, 2018 12:50 pm

Hello,

It looks like you are doing some pretty deep comparisons, but I might suggest a different approach. This same stack is running on all our platforms, and there are no other reports of losing data. It should not matter what the window size is; it is there for flow control, and you should not be losing data with TCP. Your statement "So the data isn't being recorded properly" isn't quite clear; it probably has something to do with your app storing data. One place to look would be in your app code for anywhere a timeout could occur on reading data.

Your prior release is from 9 years ago, so there are certainly changes. One way to verify proper operation is to run something like one of our TCP server examples, send it lots of data from a client, and verify you see all the data. If you are sending faster than the data can be processed, you will see the window size go to 0 until it catches up.

DBrunermer · Post by **DBrunermer** » Tue Nov 13, 2018 6:20 pm

Losing data isn't exactly what's happening, and that's not how I would characterize it. There's almost certainly a better way to do what I'm doing that probably involves a semaphore or something else I don't really know how to use. Maybe you can give me a better idea.

Basically, the box processes commands on port 23, and receives data directly into memory on port 26. There's a big huge memory block I allocate, 6MB of the 8 available. The PC can issue a command on port 23 that tells the box where to record the data that comes in on 26. Each of these ports listens on its own task. And let's say for the sake of argument that when everything starts, the *writeAddr = &b_Memory[0].

OK, so here's where the problem comes in. The port 26 task calls listen(INADDR_ANY, ListenPort, 5), falls into the while(1) loop, accepts, and does a read using the aforementioned pointer. The pointer itself, as you know, can't be changed once it's in the blocking read, so port 26 has to be closed to change the value. So it's the PC's job to make sure it closes the port before issuing the command to change the pointer.

Like I said in my post, I don't think anything gets dropped. But what's happening is there are some retries going on that I don't see in the older version. And so the PC issues the command to close the port, but it can't seem to tell that the close message hasn't been delivered yet. So then it issues the command on 23 to move the pointer to a new destination, but the command processor knows 26 is open so refuses the request, and the PC errors out. We've tried inserting delays here and there, and that isn't clearing up the problem.

I've been limiting the reads to 1024-byte chunks, and maybe I should rethink that. I could try opening up the read function with a much larger size. But is there no way to change that parameter?

Thanks, Dan

DBrunermer · Post by **DBrunermer** » Tue Nov 13, 2018 7:21 pm

OK, say I did a quick calculation like int max_size = 6291456 - ((int)writeAddr - (int)&b_Memory[0]) and passed that in as the value for nbytes? Would it just never come back? Or would it still dump when the PC closes the port? And if it came back that way, would I still get a positive value for the # of bytes returned or would I just get the error and it would act like nothing was transferred? Maybe I should use the timeout if I try that.
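A sketch of that calculation using pointer subtraction instead of casting both addresses to int. `RemainingBytes` is a hypothetical helper, and it assumes both pointers refer to the same memory block.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t IMAGE_SIZE = 6291456;  // 6 MB block, as in the post

// Hypothetical helper: bytes left between writeAddr and the end of the block.
// Assumes base and writeAddr both point into (or one past) the same block.
// Returns 0 if writeAddr is before the block or at/after its end.
std::size_t RemainingBytes(const unsigned char *base, const unsigned char *writeAddr)
{
    if (writeAddr < base)
    {
        return 0;
    }
    std::size_t used = static_cast<std::size_t>(writeAddr - base);
    return (used >= IMAGE_SIZE) ? 0 : IMAGE_SIZE - used;
}
```

The result could be passed as the maximum byte count for a read. Clamping to 0 at the end of the block means a full buffer shows up as a zero-length request rather than as a negative or wrapped size.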

sulliwk06 · Post by **sulliwk06** » Wed Nov 14, 2018 5:54 am

If I understand correctly, it sounds like you're just having difficulties prioritizing events on these connections. If I were writing an app like the one you described, I think I would use just one task instead of two. The task would run a while loop around a select call. The select call would check both ports' connections, and inside the loop you can prioritize looking for errors on port 26 so it gets closed before moving on to look for new incoming commands. That way your pointer should be free by the time your command tries to move it. That may be more of a rewrite than you want to do, but it might give you some ideas.
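A minimal sketch of one pass of such a loop, written against POSIX select() so it can be tried on a desktop. This is an assumption-laden illustration: the NetBurner select() call may differ in signature and timeout units, and `ServiceOnce` and `Serviced` are hypothetical names.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <cstddef>

// Which descriptor was serviced on this pass.
enum class Serviced { None, Data, Command };

// One pass of a single-task loop over two descriptors.
// dataFd stands in for the port 26 connection, cmdFd for the port 23 one.
// The data connection is always checked first, so pending data (or its
// close) is handled before any new command is looked at.
Serviced ServiceOnce(int dataFd, int cmdFd, char *buf, std::size_t bufLen, ssize_t *nRead)
{
    assert(buf != nullptr && nRead != nullptr);
    *nRead = 0;

    fd_set readFds;
    FD_ZERO(&readFds);
    FD_SET(dataFd, &readFds);
    FD_SET(cmdFd, &readFds);

    timeval tv{1, 0};  // give up after one second if nothing is readable
    int ready = select(std::max(dataFd, cmdFd) + 1, &readFds, nullptr, nullptr, &tv);
    if (ready <= 0)
    {
        return Serviced::None;  // timeout or error
    }

    if (FD_ISSET(dataFd, &readFds))
    {
        *nRead = read(dataFd, buf, bufLen);  // 0 means the peer closed
        return Serviced::Data;
    }
    if (FD_ISSET(cmdFd, &readFds))
    {
        *nRead = read(cmdFd, buf, bufLen);
        return Serviced::Command;
    }
    return Serviced::None;
}
```

Called repeatedly from a while loop, a return of `Serviced::Data` with `*nRead == 0` is the point where the data connection can be closed and the write pointer released. A command that arrived at the same time is only seen on the next pass.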

Post by **TomNB** » Wed Nov 14, 2018 1:50 pm

For both reads and writes you always need to check the return value to see how many bytes were read or written. You don't say exactly which functions you use, but the number-of-bytes parameter is usually a maximum; the call does not block until that many bytes have been read.
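A short sketch of that pattern using the POSIX read() call, where the byte count is an upper bound and the return value says how much actually arrived. `ReadUpTo` is a hypothetical helper, and the NetBurner read() may differ in details such as its error codes.

```cpp
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: keeps reading until `want` bytes have arrived,
// the peer closes the connection, or an error occurs.
// Returns the number of bytes actually stored, which may be less than `want`.
std::size_t ReadUpTo(int fd, unsigned char *dst, std::size_t want)
{
    std::size_t total = 0;
    while (total < want)
    {
        ssize_t n = read(fd, dst + total, want - total);
        if (n > 0)
        {
            total += static_cast<std::size_t>(n);  // partial reads are normal
        }
        else if (n < 0 && errno == EINTR)
        {
            continue;  // interrupted by a signal, try again
        }
        else
        {
            break;  // n == 0: peer closed; n < 0: real error
        }
    }
    return total;
}
```

Because each call may return fewer bytes than requested, the caller advances its write position by the returned count rather than by the requested size.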

DBrunermer · Post by **DBrunermer** » Wed Nov 14, 2018 6:54 pm

Sulli, I agree that prioritization is part of the problem. Really, I feel like the PC simply isn't listening to the flow controls. Let me just share some of the code, and you'll see exactly what I mean, Tom.

Code:

// On Port 23, commands are received as ASCII text, in this case something like "RM,0<cr>" and g_Params[0] = 0
#define IMAGE_SIZE 6291456
unsigned char g_Memory[IMAGE_SIZE];
volatile unsigned char * g_pCurWriteIdx = &g_Memory[0];

	case RCV_AT_MEM: // Start writing to an absolute position in the memory block
		if( g_Port26Open )
		{
			siprintf( &TXBuffer[0], "ERROR_PORT26_OPEN\r\n");
			break;
		}
		if( PARAM1_RCD )
		{
			g_pCurWriteIdx = &g_Memory[0];
			g_pCurWriteIdx += g_Params[0];
		}
		<Format a return string echoing the command was received and processed>
		break;

// This is the task, which I pretty much copied from an example. I've removed the standard comments for brevity
int g_ifdnet;
IPADDR g_clientaddr;
WORD g_clientport;
asm( " .align 4 " );
void TcpImageRcvTask( void * pd)
{
	int ListenPort = (int) pd;
	int fdListen = listen(INADDR_ANY, ListenPort, 5);
	g_ifdnet = 0;
	g_clientaddr = 0;
	g_clientport = 0;
	if (fdListen > 0)
	{
		while(1)
		{
			g_ifdnet = accept(fdListen, &g_clientaddr, &g_clientport, 0);
			g_Port26Open = true;
			int i1 = IMAGE_SIZE - ((int)g_pCurWriteIdx - (int)&g_Memory[0]); // Amount of heap left
			g_BankLineCount = 0; // Use as byte counter until the end
			while (g_ifdnet > 0)
			{
				int n = 0;
				do
				{
					n = read( g_ifdnet, (char *)g_pCurWriteIdx, i1 );	// Read Abs Max
					if( n>0 )
					{
						g_pCurWriteIdx += n;
						g_BankLineCount += n;
						i1 -= n;
					}
					else // The port is closing
					{
						g_BankLineCount = g_BankLineCount / 32;
					}
				} while ( n>0 );
				if (g_ifdnet > 0)
					close(g_ifdnet);
				g_Port26Open = false;
				g_ifdnet = 0;
			} // g_ifdnet > 0
		} // while(1)
	} // if listen
}

I'm well aware of the evils of global variables, but it is what it is. The port 23 handler is structurally similar, but it of course translates the ASCII commands into functions and parameters. It calls a function, ProcessCommand(), that has a big case statement, and one of those commands is "RM,#<cr>" (for example).

The PC does the following:
1) Close Port 26
2) Issue RM command
3) Open Port 26
4) Transmit data
<Repeat for all data blocks, generally 50-250kB in length each>

When I compile this code in 2.4.2, everything works fine. For whatever reason, the port opens like I described before, and all of the messages happen in the correct order and the timing is such that nothing overruns.

When I compile in 2.7.7, the behavior is completely different. Basically, the PC blasts a little less than the 16422 bytes of window, and then I start to see some ACKs from the NB, followed by a parade of [TCP DUP ACK 21#1], [...21#2], and so on, 7 or 8 of them. It does that for more than one ACK, and there are retransmits and other bad things. All this noise goes on throughout the transmission, and an RM command will be buried in there, but the close for port 26 doesn't arrive until several packets later. So it errors.

I'm trying to work with the UI developers to see if they can do something on their end. It seems like there should be some way to determine whether or not the PC has successfully finished transmitting the data.

Anyway, does this give you any more insight? Thanks, Dan

NetBurner Community Forum