Mod5282, Ethernet hung in data packet wait mode.

Posted: Wed Jun 09, 2010 1:24 pm
by CalvinF
Help, help. I have a rookie Ethernet problem here and I need help.

Due to a TCP/IP timeout error on the PC, the PC cannot access the data on any of the NetBurners (32 in all) in the system. A processor reset is not allowed.

The netburners are stuck in a loop using this section of code:
----------------------------------------
while (1)
{
    // other stuff
    fdnet = accept(fdListen, &client_addr, &port, 1);
    while (fdnet > 0)
    {   // <-- stuck in this loop
        // other stuff
        Timeout = 15;
        TCP_IP_size = ReadWithTimeout(fdnet, RXBuffer, RX_BUFSIZE, Timeout);
        if (TCP_IP_size < 0)
        {
            iprintf("Read FDNET exiting \r\n");
            break;
        }
        // other stuff
    }
}
----------------------------------------

They are stuck in this loop because they never see "TCP_IP_size" go negative. I think the reason is that the PC application our boys developed just exited on a timeout error, and thus never closed the TCP/IP connection on any of the other NetBurners. We have 32 NetBurners running like this, all on the far side of one of 2 hubs; one hub connects to two 16-port hubs.

Essentially we have delivered to our customers a whole bunch of systems, using the Mod5282 module.

All the NetBurner systems are in the data wait state on the Ethernet because the PC software exited the application on a timeout while polling one of the systems. Our customer will not allow us to reset these NetBurners, since they are all running tests and have been for days. They are designed to run independently of the PC in case the PC goes offline.

1. Anyone know how to get this FEC to reset on a hardware ethernet reconnect, without resetting the MOD5282 processor?
a. Cannot access the RS232 port. (The thing resets when the chassis lid is loosened).
b. Can only access the RJ45 ethernet cable.
2. Anyone know of any low-level software routines that will allow me to send TCP/IP packets without doing the initial protocol authentication (the handshake) that takes place at the beginning of a connect?

Here is a solution I am thinking about. Take another NetBurner, put it in the data packet receive mode (like the stuck ones above), disconnect its Ethernet cable, and connect it to one of the stuck NetBurner modules with a crossover cable. I now have two NetBurners in the data receive mode, and I can use the spare NetBurner to generate a disconnect message.

Anyone think this might work?

Thanks.

Re: Mod5282, Ethernet hung in data packet wait mode.

Posted: Wed Jun 09, 2010 5:22 pm
by lgitlitz
Now this is a really tough one. So it looks like you are using ReadWithTimeout, you check for an error but not for a timeout. What happens to your code when TCP_IP_size = 0 because of a timeout? Does it just loop back around and try to read again? If so the TCP socket will never close on its own. I think you already understand this issue, but in case other people on the group are confused: the TCP protocol specification does not require an idle connection to detect that the other side has gone away. A socket that is open and only receiving data cannot detect a broken connection. Once you get out of this trouble, I highly recommend adding keep-alive functionality to these sockets on the NetBurner side.
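
To make the return-value distinction concrete, here is a minimal sketch of a loop policy that treats a timeout separately from an error. It assumes, as described above, that the read call returns a positive byte count when data arrives, 0 on a timeout, and a negative value on an error. The names ReadOutcome, classifyRead, and TimeoutPolicy, and any limit you pass in, are hypothetical and not part of the NetBurner API.

```cpp
// Hypothetical helper: interprets the value returned by a
// ReadWithTimeout-style call (>0 bytes read, 0 timeout, <0 error).
enum class ReadOutcome { Data, Timeout, Error };

inline ReadOutcome classifyRead(int rv) {
    if (rv > 0) return ReadOutcome::Data;
    if (rv == 0) return ReadOutcome::Timeout;
    return ReadOutcome::Error;
}

// Counts consecutive timeouts so that a silent peer is eventually dropped.
struct TimeoutPolicy {
    int maxConsecutiveTimeouts;  // assumed limit, tune per application
    int consecutiveTimeouts;

    explicit TimeoutPolicy(int limit)
        : maxConsecutiveTimeouts(limit), consecutiveTimeouts(0) {}

    // Returns true when the caller should close the socket and leave the loop.
    bool shouldClose(int rv) {
        switch (classifyRead(rv)) {
            case ReadOutcome::Data:
                consecutiveTimeouts = 0;  // peer is alive, start counting again
                return false;
            case ReadOutcome::Timeout:
                return ++consecutiveTimeouts >= maxConsecutiveTimeouts;
            case ReadOutcome::Error:
                return true;              // same exit as the original "< 0" check
        }
        return true;  // not reached; keeps compilers quiet
    }
};
```

In the loop from the first post, the idea would be to call shouldClose(TCP_IP_size) after each read and, when it returns true, close fdnet and break, instead of breaking only on a negative value.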

Trying to jump back into this socket to close it sounds like your only possible solution. The first thing I would do here is to send a few TCP reset packets to the socket. If the NetBurner receives a TCP reset packet it will properly close the socket. Your ReadWithTimeout will then return < 0 and you will exit that loop. Now I have not tried to manually send reset packets but it looks like you can just use the "void SendTcpRst( PIPPKT pIp )" function from TCP.cpp. The PIPPKT input parameter is a tricky one so here is a code walk-through to help you out. None of this code is tested, definitely use at your own risk and it may contain some typos.

You will have to make a bogus IP packet to hold your socket parameters:
PoolPtr BogusPacketPP = GetBuffer(); // get a buffer
if ( BogusPacketPP == NULL ) return; // check for error: no free buffer, so give up (assumes this code sits inside a void function)

Then define the IP packet and set the known/required parameters at the IP level:
PIPPKT pBogusPacketIP = GetIpPkt( BogusPacketPP ); // pointer to the IP struct location
pBogusPacketIP->bVerHdrLen = 0x45; // standard header size
pBogusPacketIP->ipSrc = AsciiToIp("0.0.0.0"); // !!!----> THIS ADDRESS SHOULD BE THE IP OF THE PC <---!!!!!
pBogusPacketIP->ipSrc = AsciiToIp("0.0.0.0"); // !!!----> THIS ADDRESS SHOULD BE THE IP OF THE PC <---!!!!!

Now point to the TCP struct and add the known parameters to that level:
PTCPPKT pBogusPacketTCP = GetTcpPkt( p ); // pointer to the TCP struct location
pBogusPacketTCP->flags &= ~TCP_ACK; // clear the TCP ack bit
pBogusPacketTCP->SeqNumber = 0; // set the sequence # to 0
pBogusPacketTCP->dstPort = 0x1234; // !!!----> THIS SHOULD BE THE PORT # OF THE NETBURNER <---!!!!!
pBogusPacketTCP->srcPort = 0x5678; // !!!----> THIS SHOULD BE THE PORT # OF THE PC, likely unknown.... :( <---!!!!!

So I think you may have another big hurdle here. You have no idea what the socket is. A socket is defined by 4 things: local/remote IP address and local/remote port #. You should know the IP of the NetBurner and the PC. Hopefully you know the port of the NetBurner. Now you probably have no idea what port the PC used to make the connection. Port numbers are 16-bit, so however you go about jumping into the socket and closing it, you will have to loop about 65000 times to cover all possible ports for the PC side of the connection.

So your end code should be something like the following:
for( int PC_Socket = 0; PC_Socket < 0x10000; PC_Socket++ )
{
    pBogusPacketTCP->srcPort = (WORD)PC_Socket; // insert a new socket number
    SendTcpRst( pBogusPacketIP ); // send reset packet
    if( PC_Socket % 10 == 0 )
    {
        // After every 10 packets, do some quick polling delay of 2 or 3 ms or so
        // (OSTimeDly might make this last too long) so we know the receiver can
        // process them all; otherwise there is a chance packets will be dropped.
    }
}

Make sure that the sending NetBurner is configured with the same IP address that the PC had when it initiated the socket, or this will not work. Definitely test this on another NetBurner first, as there is always a chance this could crash the receiving NetBurner. Let us know how this works out.
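
For a rough idea of how long a full sweep spends waiting, here is a small arithmetic sketch. It assumes the loop above pauses whenever the port number is a multiple of 10, starting at port 0, and that each pause lasts a fixed number of milliseconds. The names kPortCount, pauseCount, and totalPauseMs are hypothetical, and the estimate ignores the time spent actually sending packets.

```cpp
#include <cstdint>

// Number of candidate PC source ports when every 16-bit value is tried.
constexpr std::uint32_t kPortCount = 0x10000;  // 65536

// Pauses taken by a loop that pauses whenever (port % batch == 0),
// counting from port 0 up to ports - 1.
constexpr std::uint32_t pauseCount(std::uint32_t ports, std::uint32_t batch) {
    return (ports + batch - 1) / batch;
}

// Total time spent pausing, in milliseconds.
constexpr std::uint32_t totalPauseMs(std::uint32_t ports,
                                     std::uint32_t batch,
                                     std::uint32_t pauseMs) {
    return pauseCount(ports, batch) * pauseMs;
}
```

With a batch of 10 and a 3 ms pause this gives 6554 pauses and 19662 ms, so one sweep against a single stuck unit should cost on the order of 20 seconds of delay.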

-Larry

Re: Mod5282, Ethernet hung in data packet wait mode.

Posted: Thu Jun 10, 2010 7:58 am
by CalvinF
lgitlitz wrote:Now this is a really tough one. So it looks like you are using ReadWithTimeout, you check for an error but not for a timeout. What happens to your code when TCP_IP_size = 0 because of a timeout? Does it just loop back around and try to read again? If so the TCP socket will never close on its own.
Larry, thanks for your response. First I tried my feeble-minded approach above, and it did not work. So I reckon I will give your methods a try.

About your question above: yes, when TCP_IP_size = 0 I just loop back around and try to read again. Happens a lot.
lgitlitz wrote: You have no idea what the socket is.
I have no idea about a lot of things going on here.
lgitlitz wrote: A socket is defined by 4 things: local/remote IP address and local/remote port #. You should know the IP of the NetBurner and the PC. Hopefully you know the port of the NetBurner. Now you probably have no idea what port the PC used to make the connection. Port numbers are 16-bit, so however you go about jumping into the socket and closing it, you will have to loop about 65000 times to cover all possible ports for the PC side of the connection.
Yes I know:
1. IP ADDRESS is known.
2. Port is known.
3. I will increment through various values of PC PORT numbers.

Once again thanks for your help, I will give your suggestions a try.

Re: Mod5282, Ethernet hung in data packet wait mode.

Posted: Thu Jun 10, 2010 10:24 am
by pbreed
When a TCP connection is not sending data, and the other side just disappears, there
is no way for TCP to know if the other side died or is just quiet.

One ALWAYS needs a way to clean up zombie sockets.
I.e., remember the last time you saw data, and if you have not seen data for XX seconds,
close the connection.
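
A minimal sketch of that "remember the last time you saw data" idea follows. It assumes the platform offers some monotonic seconds counter to pass in as nowSec; the class name IdleWatchdog and the limit you choose are hypothetical and not taken from any NetBurner example.

```cpp
#include <cstdint>

// Hypothetical idle watchdog: remembers when data was last seen and reports
// when the connection has been quiet for too long. Times are in seconds
// from any monotonic counter (an assumption of this sketch).
class IdleWatchdog {
public:
    IdleWatchdog(std::uint32_t nowSec, std::uint32_t limitSec)
        : lastDataSec_(nowSec), limitSec_(limitSec) {}

    // Call whenever a read returns data.
    void sawData(std::uint32_t nowSec) { lastDataSec_ = nowSec; }

    // True once no data has been seen for at least limitSec seconds.
    // Unsigned subtraction keeps this correct across counter wraparound.
    bool expired(std::uint32_t nowSec) const {
        return (nowSec - lastDataSec_) >= limitSec_;
    }

private:
    std::uint32_t lastDataSec_;
    std::uint32_t limitSec_;
};
```

The read loop would call sawData() after every successful read and check expired() after every timeout, closing the socket when it returns true.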

Or, if you expect a long idle time, use the keep-alive features shown in the new keep-alive example.

If your socket is transmitting then it will eventually time out on retransmit attempts and clean itself up that way.



Paul

Re: Mod5282, Ethernet hung in data packet wait mode.

Posted: Thu Jun 10, 2010 10:43 am
by lgitlitz
In the code I posted I put the following line twice:
pBogusPacketIP->ipSrc = AsciiToIp("0.0.0.0"); // !!!----> THIS ADDRESS SHOULD BE THE IP OF THE PC <---!!!!!
One of those should actually be the NetBurner IP and should look like this:
pBogusPacketIP->ipDst = AsciiToIp("0.0.0.0"); // !!!----> THIS ADDRESS SHOULD BE THE IP OF THE NetBurner<---!!!!!

Also, in the future there are a few very simple ways to avoid this. The first would be to simply close the socket when ReadWithTimeout returns 0, although this may interfere with your transmission of the data. Another very simple way to avoid this problem would be to put the accept from the listener socket inside the "while (fdnet > 0)" loop. Then, if another connection comes into the listener, you can close the currently open socket and connect to the new client. Keep-alive is probably the best method since it will not alter how your connections currently work. A bit more code is needed for keep-alive; follow the example (...\nburn\examples\TCP\TCP_simple_keepalive)
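
The second option, letting a new connection replace the old one, can be sketched as a small piece of bookkeeping. This is only an illustration: ConnectionSlot and adopt are hypothetical names, and the sketch assumes the caller polls the listener inside the read loop with an accept call that returns a value <= 0 when nothing is pending.

```cpp
// Hypothetical single-connection slot: a newly accepted connection replaces
// the current one, and the caller closes whatever fd is handed back.
class ConnectionSlot {
public:
    ConnectionSlot() : current_(-1) {}

    // newFd is the result of polling the listener. Returns the fd that
    // should now be closed, or -1 if there is nothing to close.
    int adopt(int newFd) {
        if (newFd <= 0) return -1;   // no pending connection on this pass
        int stale = current_;
        current_ = newFd;            // the new client takes over
        return stale > 0 ? stale : -1;
    }

    int current() const { return current_; }

private:
    int current_;
};
```

On each pass of the inner loop the code would poll the listener, hand the result to adopt(), close any fd it returns, and then read from current(). A PC that restarts and reconnects would then free the stale socket by itself.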

Re: Mod5282, Ethernet hung in data packet wait mode.

Posted: Thu Jun 10, 2010 1:56 pm
by CalvinF
lgitlitz wrote:In the code I posted I put the following line twice
Yup thanks for that.

So far this method isn't working. I get it to increment all the way through the 65536 possible socket #'s, but I never see the stuck destination Mod5282 come out of it.

I do have a question about your example:

You listed this:
"Now point to the TCP struct and add the known parameters to that level:
PTCPPKT pBogusPacketTCP = GetTcpPkt( p ); // pointer to the TCP struct location"


So I assumed this:
PoolPtr BogusPacketPP = GetBuffer(); // get a buffer
PoolPtr p = BogusPacketPP;//--------I added this line!!!!!!!---


My question has to do with your "(p)" pointer. I assumed it should point to the same BogusPacketPP, yes?

Thanks again for your help.

Re: Mod5282, Ethernet hung in data packet wait mode.

Posted: Fri Jun 11, 2010 2:36 pm
by lgitlitz
That was a typo on my part. I think your code should work, or you can just use the BogusPacketPP pointer instead of p. You should sniff the packets using Wireshark on a PC. You may need a hub instead of a switch so your PC can see the packets. This will show you if the reset packets are being sent and if the src port is incrementing. It will also show you if any response at all is coming from the locked-up NetBurner. If I get some spare time later I will try to test this myself.
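
A capture can also help check one possible explanation, offered here as an assumption rather than a diagnosis. RFC 793 says that in every state except SYN-SENT a receiver validates a reset by its sequence number, and accepts it only if that number falls inside the current receive window. The walk-through earlier in the thread sets SeqNumber to 0, which would usually fall outside the window. Whether the NetBurner stack enforces this check is not confirmed anywhere in this thread. The sketch below shows the RFC 793 test for a zero-length segment; the function name rstSeqAcceptable is hypothetical.

```cpp
#include <cstdint>

// RFC 793 acceptability test for a zero-length segment such as a bare RST.
// segSeq: sequence number carried by the incoming segment
// rcvNxt: next sequence number the receiver expects
// rcvWnd: receiver's current window size in bytes
bool rstSeqAcceptable(std::uint32_t segSeq,
                      std::uint32_t rcvNxt,
                      std::uint32_t rcvWnd) {
    if (rcvWnd == 0) {
        return segSeq == rcvNxt;        // zero window: only an exact match
    }
    // Modulo-2^32 arithmetic: accept when rcvNxt <= segSeq < rcvNxt + rcvWnd.
    return (segSeq - rcvNxt) < rcvWnd;
}
```

If the stuck unit does behave this way, a sweep over source ports alone would not be enough, because the sequence number would also have to land inside the window.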

Re: Mod5282, Ethernet hung in data packet wait mode.

Posted: Mon Jun 14, 2010 7:05 am
by CalvinF
lgitlitz wrote: That was a typo on my part. I think your code should work, or you can just use the BogusPacketPP pointer instead of p. You should sniff the packets using Wireshark on a PC. You may need a hub instead of a switch so your PC can see the packets. This will show you if the reset packets are being sent and if the src port is incrementing. It will also show you if any response at all is coming from the locked-up NetBurner. If I get some spare time later I will try to test this myself.

Thanks for your help.

I still don't have the distant Mod5282 coming out of it.

I can't get to it today. My company needs me on something else.

I will try again in a couple of days.