Crash implementing dual-stack

sulliwk06
Posts: 118
Joined: Tue Sep 17, 2013 7:14 am

Re: Crash implementing dual-stack

Post by sulliwk06 »

Are you certain your idle timer pointer is being initialized?

Code:

idle = new c_interval(300000); // 300,000 msec = 5 minutes
There was one example you posted where you said it broke and there weren't even any references to the IPADDR object there. If the idle pointer wasn't valid, I could see it behaving the way you described.
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Crash implementing dual-stack

Post by SeeCwriter »

That's a reasonable question. Since idle is initialized in the constructor, how could it not be initialized? Wouldn't that also suggest a compiler issue? And why would moving the declaration before the IPADDR declaration affect whether it's initialized or not?
But, it's worth verifying.
I added a printf statement to the constructor, as follows:

Code:

  protocol_link():port(-1),count(0),address('A'),printable(false),check_address(false),efd_protocol(false),
                  udp_redirect(false),efd_addr(1),step(PACKET_DEFAULT_STATE),running(false)
    { 
    idle = new c_interval(300000); // 300,000 msec = 5 minutes
    rbuf[0] = 0;
    tbuf[0] = 0;
    printf( "IP: 0x%lx, idle: 0x%lx\r\n", (DWORD)(&ip_addr), ( DWORD ) idle );
    }
This is the output with idle declared after IPADDR, where the program crashes:

Code:

IP: 0x403614c4, idle: 0x403c95e8
IP: 0x40363430, idle: 0x403c9600
IP: 0x4036539c, idle: 0x403c9610
IP: 0x40367308, idle: 0x403c9628
IP: 0x40369274, idle: 0x403c9638
IP: 0x4036b1e0, idle: 0x403c9650
IP: 0x4036d14c, idle: 0x403c9660
IP: 0x4036f0b8, idle: 0x403c9678
IP: 0x40371024, idle: 0x403c9688
IP: 0x40372f90, idle: 0x403c96a0
IP: 0x40374efc, idle: 0x403c96b0
IP: 0x40376e68, idle: 0x403c96c8
IP: 0x40378dd4, idle: 0x403c96d8
IP: 0x4037ad40, idle: 0x403c96f0
IP: 0x4037ccac, idle: 0x403c9700
IP: 0x4037ec18, idle: 0x403c9718
IP: 0x40380b84, idle: 0x403c9728
IP: 0x40382af0, idle: 0x403c9740
IP: 0x40384a5c, idle: 0x403c9750
IP: 0x403869c8, idle: 0x403c9768
IP: 0x4031a1f2, idle: 0x403c9988
And this is the output with idle declared before IPADDR, and the program runs:

Code:

IP: 0x403614c9, idle: 0x403c95e8
IP: 0x40363435, idle: 0x403c9600
IP: 0x403653a1, idle: 0x403c9610
IP: 0x4036730d, idle: 0x403c9628
IP: 0x40369279, idle: 0x403c9638
IP: 0x4036b1e5, idle: 0x403c9650
IP: 0x4036d151, idle: 0x403c9660
IP: 0x4036f0bd, idle: 0x403c9678
IP: 0x40371029, idle: 0x403c9688
IP: 0x40372f95, idle: 0x403c96a0
IP: 0x40374f01, idle: 0x403c96b0
IP: 0x40376e6d, idle: 0x403c96c8
IP: 0x40378dd9, idle: 0x403c96d8
IP: 0x4037ad45, idle: 0x403c96f0
IP: 0x4037ccb1, idle: 0x403c9700
IP: 0x4037ec1d, idle: 0x403c9718
IP: 0x40380b89, idle: 0x403c9728
IP: 0x40382af5, idle: 0x403c9740
IP: 0x40384a61, idle: 0x403c9750
IP: 0x403869cd, idle: 0x403c9768
IP: 0x4031a1f7, idle: 0x403c9988
sulliwk06
Posts: 118
Joined: Tue Sep 17, 2013 7:14 am

Re: Crash implementing dual-stack

Post by sulliwk06 »

The only other thing I can think of is to verify what type each of your IPADDR objects actually is and check its sizeof(). I would also do a clean before the build to make sure everything is up to date.
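A minimal sketch of that sizeof() check, using stand-in types because the real IPADDR and protocol_link definitions are not shown in the thread. The names addr4_standin, addr6_standin, link_standin, idle_offset, and report_layout are hypothetical. The point is only that if two source files ever saw address types of different sizes, they would also disagree about where every later member, such as idle, sits inside the object:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins; these are not the NetBurner types.
struct addr4_standin { unsigned char bytes[4];  };  // IPv4-sized address
struct addr6_standin { unsigned char bytes[16]; };  // IPv6-sized address

// Stand-in for a class that holds an address followed by a pointer.
template <typename Addr>
struct link_standin {
    int   port;
    Addr  ip_addr;
    void *idle;
};

// Byte offset of the idle member for a given address type.
template <typename Addr>
std::size_t idle_offset() {
    return offsetof(link_standin<Addr>, idle);
}

// Print the layout as this source file sees it.
template <typename Addr>
void report_layout(const char *label) {
    std::printf("%s: sizeof(addr)=%zu sizeof(link)=%zu offsetof(idle)=%zu\r\n",
                label, sizeof(Addr), sizeof(link_standin<Addr>),
                idle_offset<Addr>());
}

void demo_layout() {
    report_layout<addr4_standin>("4-byte address");
    report_layout<addr6_standin>("16-byte address");
    // The pointer lands at a different offset in the two layouts.
    assert(idle_offset<addr4_standin>() != idle_offset<addr6_standin>());
}
```

In the real project, the equivalent would be to print sizeof(IPADDR) and sizeof(protocol_link) from each source file that touches the class and compare the numbers.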
TomNB
Posts: 538
Joined: Tue May 10, 2016 8:22 am

Re: Crash implementing dual-stack

Post by TomNB »

I agree with sulliwk06.

I realize you are suggesting the compiler, but looking at this as an outsider and not an author of the code:
- This problem isn't happening with any other code or apps I am aware of. We use IPADDR in classes all the time.
- Accessing an IPADDR as a 32-bit number can certainly cause this type of problem. That is why it is essential to look everywhere else in the code that might access this class as an IPADDR4.
- Since moving things around in the declaration changes the behavior, there might always have been a memory issue (outside chance of this).

Memory corruption can be hard to find. From your previous data it looks like there was a null pointer exception, so somehow a pointer was pointing to 0. Is there any possibility that one of your tasks is using more than the default 8k of task stack space?

The only compiler-related thing that comes to mind is an alignment issue. Are you using any libraries from a third party, or any that you built yourself, other than netburner.a and your platform library? I was going to name the platform library, but I don't see anything here that tells me which platform you are using. For a MOD5441x, for example, it would be MOD5441X.a. This is a long shot as well, but it is the only thing that came to mind. A bad compiler would be my last choice, but I suppose nothing is impossible.
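The alignment idea can be checked against the addresses already printed in this thread. A minimal sketch with hypothetical helpers is_aligned and is_aligned_ptr: the ip_addr address from the crashing build (0x403614c4) is a multiple of 4, while the one from the working build (0x403614c9) is odd. Whether a misaligned member matters on this platform is not established here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when address is a multiple of alignment.
constexpr bool is_aligned(std::uintptr_t address, std::size_t alignment) {
    return address % alignment == 0;
}

// Pointer form, for use on a member of a live object.
inline bool is_aligned_ptr(const void *p, std::size_t alignment) {
    return is_aligned(reinterpret_cast<std::uintptr_t>(p), alignment);
}

// Addresses copied from the printf output earlier in the thread.
static_assert(is_aligned(0x403614c4u, 4),
              "idle declared after IPADDR: ip_addr is 4-byte aligned");
static_assert(!is_aligned(0x403614c9u, 4),
              "idle declared before IPADDR: ip_addr is not 4-byte aligned");

void demo_alignment() {
    int local = 0;
    // An ordinary int variable is always aligned for its own type.
    assert(is_aligned_ptr(&local, alignof(int)));
    (void)local;
}
```

An odd address for a multi-byte member usually means the class, or something it contains, is packed, which is one more thing to compare between source files.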
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Crash implementing dual-stack

Post by SeeCwriter »

You might be correct that this is not a compiler issue. The odds favor it being my
fault. I just can't find it yet.

In my latest test I moved the initialization of the 'idle' member from the constructor to the
Init() function of the class. Then I added a single call to initialize one of the
protocol_link objects as the first thing I do in UserMain().

Code:

protocol_link mandc[TOTAL_PROTOCOL_LINKS];

void UserMain(void * pd) 
  {
  mandc[0].Init( -1, 'A', false, false );
  ...
  }
It didn't crash in UserMain, but it did crash later in the initialization, when calling
the same Init() function. So one would think that somewhere in the initialization,
the idle timer is being corrupted.

So I added print statements to try and find where the corruption is occurring.

Code:

void UserMain(void * pd) 
  {
  mandc[0].Init( -1, 'A', false, false );

  puts( "Before" );
  printf( "Init Serial: IP: 0x%lx, idle: 0x%lx\r\n", (DWORD) (&mandc[0].ip_addr), (DWORD) mandc[0].idle );

  puts( "Calling Init" );
  OSTimeDly( TICKS_PER_SECOND / 4 ); // wait for string to clock out.

  InitMain();
  ...
  }

void protocol_link::Init(int fd, char rs485_address, bool enable_address, bool as_udp)
  {
  static bool init_done = false;
  if ( !init_done )
    idle = new c_interval( 300000 ); // 300,000 msec = 5 minutes
  printf( "1 In Init: idle = %lx, done = %d\r\n", (DWORD) idle, (int)init_done );
  init_done     = true;
  port          = fd;
  address       = rs485_address;
  check_address = enable_address;
  tbuf[0]       = 0;  // v97: null terminate response buffer in case no response to web-initiated serial cmd.
  udp_redirect  = as_udp;   // v210: means to convert input stream into udp-style messages aka terminated by \r\n.
  printf( "2 In Init: idle = %lx, done = %d\r\n", (DWORD) idle, (int) init_done );
  Reset();
  printf( "3 In Init: idle = %lx, done = %d\r\n", (DWORD) idle, (int) init_done );
  }
Here is the output:

Code:

Waiting 2sec to start 'A' to abort
1 In Init: idle = 403c9818, done = 0
2 In Init: idle = 403c9818, done = 1
3 In Init: idle = 403c9818, done = 1
Before
Init Serial: IP: 0x403614c5, idle: 0x0
Calling Init
Notice that the last print statement before returning from Init() shows that the idle
timer has been successfully initialized.

Also notice that the print statement immediately following Init() shows that idle has been corrupted and is now zero.
So the next time Init() is called later on the program crashes.

I have no explanation for this. Any ideas?

FYI, I did a clean build for all tests.
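One detail in the Init() listing above may be worth ruling out: a static local inside a member function is a single variable shared by every object of the class, not one per object. With init_done declared that way, only the first Init() call across all protocol_link objects allocates idle. This does not explain mandc[0].idle reading zero right after its own Init() call, but it could affect later Init() calls on other objects. A minimal sketch with hypothetical stand-in types (timer_standin, link_shared_flag, link_member_flag):

```cpp
#include <cassert>

struct timer_standin { unsigned long msec; };   // stand-in for c_interval

// Same pattern as the listing: the flag is a function-local static.
struct link_shared_flag {
    timer_standin *idle = nullptr;
    ~link_shared_flag() { delete idle; }
    void Init() {
        static bool init_done = false;           // ONE flag for all objects
        if (!init_done)
            idle = new timer_standin{300000};
        init_done = true;
    }
};

// Alternative: the flag is a data member, so each object has its own.
struct link_member_flag {
    timer_standin *idle = nullptr;
    bool init_done = false;
    ~link_member_flag() { delete idle; }
    void Init() {
        if (!init_done)
            idle = new timer_standin{300000};
        init_done = true;
    }
};

void demo_init_flag() {
    link_shared_flag a, b;
    a.Init();
    b.Init();
    // By the time b.Init() runs the shared flag is already set,
    // so the second object never gets a timer.
    assert(b.idle == nullptr);

    link_member_flag c, d;
    c.Init();
    d.Init();
    // With a per-object flag, both objects are initialized.
    assert(c.idle != nullptr && d.idle != nullptr);
}
```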
sulliwk06
Posts: 118
Joined: Tue Sep 17, 2013 7:14 am

Re: Crash implementing dual-stack

Post by sulliwk06 »

Well, if the values are correct at the end of your init function on the inside, but corrupt once you leave the init function, that suggests a stack corruption issue to me. I would slowly start commenting out things within your init and reset functions until the corruption stops.
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Crash implementing dual-stack

Post by SeeCwriter »

There is nothing on the stack in Init() except the return address. And idle is on the heap. But, just to be sure, I commented out everything but initializing the idle parameter. It still gets corrupted.
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Crash implementing dual-stack

Post by SeeCwriter »

Changing the "idle" member from a pointer to a c_interval to a c_interval object itself also resolves the issue, and it doesn't matter where in the class the member is declared. So that's what I'm going with. Having to depend on the order of declarations is a landmine waiting to be stepped on.
From: c_interval *idle;
To: c_interval idle;

I still suspect the issue is with either the compiler or the NB IPv6 library. My co-worker, who uses the same class, has the same issue.
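For reference, a minimal sketch of the two forms, using a hypothetical interval_standin in place of c_interval (its real constructor and interface are not shown in the thread). Holding the timer by value removes the separate heap allocation and leaves no pointer that can end up null. It does not by itself show what was overwriting the pointer:

```cpp
#include <cassert>

// Hypothetical stand-in for c_interval.
struct interval_standin {
    unsigned long period_msec;
    explicit interval_standin(unsigned long msec) : period_msec(msec) {}
};

// Before: the class owns a pointer to a heap-allocated timer.
struct link_by_pointer {
    interval_standin *idle;
    link_by_pointer() : idle(new interval_standin(300000)) {}
    ~link_by_pointer() { delete idle; }
    // Copying is disabled so two objects can never delete the same timer.
    link_by_pointer(const link_by_pointer &) = delete;
    link_by_pointer &operator=(const link_by_pointer &) = delete;
};

// After: the timer is stored inside the object itself.
struct link_by_value {
    interval_standin idle;
    link_by_value() : idle(300000) {}
};

void demo_member_forms() {
    link_by_pointer p;
    link_by_value   v;
    assert(p.idle->period_msec == 300000UL);   // depends on a valid pointer
    assert(v.idle.period_msec == 300000UL);    // no pointer involved
}
```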
TomNB
Posts: 538
Joined: Tue May 10, 2016 8:22 am

Re: Crash implementing dual-stack

Post by TomNB »

I didn't see an answer to the library question. Are you using any 3rd party libraries, or have you made any of your own?
SeeCwriter
Posts: 606
Joined: Mon May 12, 2008 10:55 am

Re: Crash implementing dual-stack

Post by SeeCwriter »

I am using an SNMP library from DMH Software. But the program crashes long before it gets used/loaded. And I have made no libraries of my own.