Detecting Stack and Buffer overflows

Post by **Forrest** » Tue Jul 31, 2012 4:40 pm

--REVISION 1--
--Rough draft of future appnote--

Introduction
I see this question come up a lot, both on our support system and occasionally in this forum. In this post, I'm going to cover some of the ways to detect stack and buffer overflows, the pros and cons of each, and offer example code showing how to use the functions involved. (A) For a stack overflow, we offer a way to enable stack checking in the code using the UCOS_STACKCHECK define. (B) For either a stack overflow or a buffer overflow, you can use a pair of undocumented but very useful functions that generate a trap when either case is detected.

This is a bit of an advanced tutorial (especially the second part). But if you are really stuck, you can probably BS your way through most of this.

UCOS_STACKCHECK DEFINE
There is a very convenient way to check for stack overflows by using the UCOS_STACKCHECK define found in \nburn\include\predef.h.

To enable this functionality, open predef.h, find the following line and uncomment it.

Code: Select all

/* #define UCOS_STACKCHECK    (1) */

After you uncomment this line, do a system rebuild (in NBEclipse: NBEclipse -> Rebuild modified system files), then do a Project -> Clean on your application so that it fully rebuilds.

Once that is enabled, you can use the following functions in your code.
OSDumpTCBStacks() - This function dumps information about the UCOS stacks and tasks to Stdout. This function is useful for debugging. Note: This function is only valid when UCOS_STACKCHECK is defined.

The output of this function will look like:

Code: Select all

  Prio  StackPtr  Stack Bottom     Free Now  Minimum Free
   63 | 0x209a38c | 0x2098400     |     8076 |     7992
   50 | 0x20002818 | 0x200008c0     |     8024 |     5160
   40 | 0x20008cc8 | 0x20006d80     |     8008 |     8008
   39 | 0x20006ca8 | 0x20004d60     |     8008 |     6456
   38 | 0x20004938 | 0x200029f0     |     8008 |     7848
   45 | 0x2094258 | 0x20924f0     |     7528 |     7528

In this output, you can tell whether you have had a stack overflow by examining the Minimum Free column. If it is very low you may have a problem, and if it is 0, you have a problem. The Prio number tells you which task has the problem. So if your task runs at MAIN_PRIO-1, you would see 49 here.
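
If you capture this table from the serial port, checking it by eye gets tedious as the task count grows. Below is a minimal host-side sketch (it runs on your PC, not on the module) that parses the captured text and lists the priorities whose Minimum Free value is below a threshold you pick. The names StackRow, ParseStackDump and LowStackTasks are made up for this example and are not part of the NetBurner libraries, and the parser assumes the column layout shown above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One row of the OSDumpTCBStacks() table (hypothetical helper type).
struct StackRow {
    int prio;               // Prio column
    unsigned long freeNow;  // Free Now column, as printed
    unsigned long minFree;  // Minimum Free column, as printed
};

// Parse a captured dump. Lines that do not match the row layout
// (such as the header line) are skipped.
std::vector<StackRow> ParseStackDump(const std::string &dump) {
    std::vector<StackRow> rows;
    std::istringstream lines(dump);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream in(line);
        StackRow r = {0, 0, 0};
        std::string stackPtr, stackBottom;  // read but not used further
        char b1 = 0, b2 = 0, b3 = 0, b4 = 0;
        in >> r.prio >> b1 >> stackPtr >> b2 >> stackBottom >> b3
           >> r.freeNow >> b4 >> r.minFree;
        if (in && b1 == '|' && b2 == '|' && b3 == '|' && b4 == '|') {
            rows.push_back(r);
        }
    }
    return rows;
}

// Return the priorities of all tasks whose Minimum Free is below threshold.
std::vector<int> LowStackTasks(const std::vector<StackRow> &rows,
                               unsigned long threshold) {
    std::vector<int> prios;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].minFree < threshold) {
            prios.push_back(rows[i].prio);
        }
    }
    return prios;
}
```

What counts as "too low" is your call. The threshold is a parameter precisely because a sensible margin depends on what each task does.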

OSDumpTasks() - This function dumps the state and call stack for every task to stdout. This function is useful for debugging. Note: This function is only valid when UCOS_STACKCHECK is defined.
The output of this function will look like:

Code: Select all

uc/OS Tasks
Prio  State    Ticks    Call Stack
Idle|Ready    |forever |at: 02002914
Main|Running
TCP |Semaphore|   890  |02002556->0200d0ca->02014560-><END>
IP  |Fifo     |    20  |02001a34->0200441a->02014560-><END>
Esnd|Fifo     |    34  |02001a34->02018390->02014560-><END>
HTTP|Semaphore|    20  |02002556->0200ff28->0200e41c->02014560-><END>

This is the call stack for each of your tasks. With these numbers, you can work out where each task was when the dump was taken. The first hex address in the list is the top of that task's stack. To convert a hex address to a file/line number and function, see the final section of this post.
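
To feed a whole call stack to addr2line, you first need the individual addresses. Here is a small host-side sketch that splits one Call Stack entry from the output above into 0x-prefixed addresses. SplitCallStack is a hypothetical helper name, and the code assumes the "->" separator and the "<END>" marker exactly as they appear in the sample output.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split "02002556->0200d0ca->02014560-><END>" into
// {"0x02002556", "0x0200d0ca", "0x02014560"}.
std::vector<std::string> SplitCallStack(const std::string &chain) {
    std::vector<std::string> addrs;
    const std::string sep = "->";
    std::size_t pos = 0;
    while (pos <= chain.size()) {
        std::size_t next = chain.find(sep, pos);
        std::string tok = (next == std::string::npos)
                              ? chain.substr(pos)
                              : chain.substr(pos, next - pos);
        // Skip empty pieces and the end-of-stack marker.
        if (!tok.empty() && tok != "<END>") {
            addrs.push_back("0x" + tok);
        }
        if (next == std::string::npos) {
            break;
        }
        pos = next + sep.size();
    }
    return addrs;
}
```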

Using SetAddressWrittenTrap and SetAddressWriteRangeTrap
These functions are available by including debugtrap.h in your source code. With these functions you can cause your NetBurner to trap when an address or range is accessed by the processor. With this ability, you can do things like monitor the end of your stack, monitor the end of a buffer, and more.

Ideally, you want to enable smarttraps when using these functions. So make sure your application includes smarttrap.h and that you enable smarttraps somewhere near the start of UserMain():

Code: Select all

    #ifndef _DEBUG
    EnableSmartTraps();
    #endif

For example, say you have a buffer (char buffer[100]).

You suspect that one of your loops may be writing past the end of this buffer. What do you do? Call SetAddressWrittenTrap and plug in the address just past the end of this buffer. This will cause your module to trap when that memory location is accessed. So you would write:

Code: Select all

char buffer[100];
SetAddressWrittenTrap((DWORD) &buffer+(100*sizeof(char))); // Watching for any write to buffer[100]
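
In the line above, the cast is applied before the addition, so the offset is counted in bytes. That works for a char buffer, but for an array of wider elements the element count has to be scaled by the element size. The following host-side sketch checks that arithmetic on a PC. OnePastEnd is a hypothetical helper name, and std::uintptr_t stands in for DWORD here only because pointers on a PC may be wider than 32 bits.

```cpp
#include <cstddef>
#include <cstdint>

// Address of the first byte past the end of a fixed-size array, i.e. the
// first location an overrunning write would hit.
template <typename T, std::size_t N>
std::uintptr_t OnePastEnd(T (&arr)[N]) {
    return reinterpret_cast<std::uintptr_t>(arr) + N * sizeof(T);
}
```

For char buffer[100] this gives the same address as buffer + 100. For an array of 16 four-byte values it lands 64 bytes past the start, not 16.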

So now, when you accidentally write past the end of your buffer, a smarttrap is going to occur. The smarttrap will print diagnostic information out the debug UART port, and then reboot the module. On the serial port, you will see something like this:

Code: Select all

-------------------Trap information-----------------------------
Exception Frame/A7 =20002874
Trap Vector        =Debug interupt (12)
Format             =04
Status register SR =2000
Fault Status       =00
Faulted PC         =020001EC

-------------------Register information-------------------------
A0=4000020C A1=40000204 A2=02002888 A3=02019C22
A4=20002892 A5=02019C22 A6=2000289C A7=20002874
D0=0000000C D1=00000000 D2=0000000B D3=000000D3
D4=000000D4 D5=000000D5 D6=000000D6 D7=000000D7
SR=2000 PC=020001EC
-------------------RTOS information-----------------------------
The OSTCBCur current task control block = 2000066C
This looks like a valid TCB
The current running task is: Main#32
-------------------Task information-----------------------------
Task    | State    |Wait| Call Stack
Idle#3F|Ready     |    |0200293A,02014590,0
Main#32|Running   |    |020001EC,02014590,0
TCPD#28|Semaphore |037A|02002586,0200D0FA,02014590,0
IP#27|Fifo      |0014|02001A64,0200444A,02014590,0
Enet#26|Fifo      |0002|02001A64,020183C0,02014590,0
HTTP#2D|Semaphore |0014|02002586,0200FF58,0200E44C,02014590,0

-------------------End of Trap Diagnostics----------------------

I'm not going to explain the entire smarttrap output; that's covered in the docs. But let's look at what's important.

"Trap Vector =Debug interupt (12)". This is telling us we got a processor interrupt, which is what we are looking for. This is how SetAddressWrittenTrap() informs you that an access occurred at the monitored memory location.

"Faulted PC =020001EC" tells us the address of the instruction that was running when the interrupt occurred. It is very important to note that this is a fuzzy value. From the Freescale manual:

...the breakpoint trigger becomes a debug interrupt to the processor, which is
treated higher than the nonmaskable level-7 interrupt request. As with all interrupts, it is made
pending until the processor reaches a sample point, which occurs once per instruction. Again, the
hardware forces the PC breakpoint to occur before the targeted instruction executes. This is
possible because the PC breakpoint is enabled when interrupt sampling occurs. For address and
data breakpoints, reporting is considered imprecise because several instructions may execute after
the triggering address or data is detected...

So, the location will not be precise, but it should give you enough to go on. Plug the PC value 0x020001EC into addr2line, and you will see about where the problem occurred. From there, you should be able to work out what the problem code is (just look for whatever is accessing the monitored location). One note: what if the program counter is pointing into the wrong task? This is possible, as this is an OS with multiple tasks ready to run. In that case, you will need to examine the call stacks in the Task information section. Work your way down the list until you find the line that corresponds to the memory access.
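
If you log the serial output to a file, pulling the PC value out by hand gets old fast. This is a small host-side sketch that extracts the Faulted PC value from a captured trap dump and returns it in the 0x form used above. ExtractFaultedPC is a hypothetical helper name, and the code assumes the "Faulted PC ... =" line format shown in the sample dump.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return the Faulted PC from a smarttrap dump as "0x........",
// or an empty string if the dump has no such line.
std::string ExtractFaultedPC(const std::string &dump) {
    std::size_t key = dump.find("Faulted PC");
    if (key == std::string::npos) {
        return "";
    }
    std::size_t eq = dump.find('=', key);
    if (eq == std::string::npos) {
        return "";
    }
    // Collect the run of hex digits that follows the '='.
    std::size_t start = eq + 1;
    std::size_t end = start;
    while (end < dump.size() &&
           std::isxdigit(static_cast<unsigned char>(dump[end]))) {
        ++end;
    }
    if (end == start) {
        return "";
    }
    return "0x" + dump.substr(start, end - start);
}
```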

It's also possible to use this same function to monitor your user task's stack. Say you create a task:

Code: Select all

DWORD myTaskStack[USER_TASK_STK_SIZE];

void myTask (void *pd) {
      // task code
}

Now, to monitor that stack, you would just call the same function I used before:

Code: Select all

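// ColdFire stacks grow downward, so a task that overruns its stack reaches the
// lowest address of the stack array first. Watch the first element.
SetAddressWrittenTrap((DWORD)&myTaskStack[0]);

Now you are monitoring the end of your user task's stack for any access. When one happens, a smarttrap will occur, and you can use the techniques discussed above to track it down.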

One special note on using SetAddressWrittenTrap and SetAddressWriteRangeTrap: you can only monitor one address or range at a time. If you call the function again, the first address or range is overwritten. Let's say you are monitoring a single int foo. If you want to temporarily disable the trap, for instance when you are intentionally modifying foo, you could call SetAddressWriteRangeTrap(0), modify the value, and then re-enable the trap with a successive call.
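
The disable, modify, re-enable sequence is easy to get wrong if the function has more than one exit path. One way to keep it tidy is a small scope guard, sketched below so that it runs on a PC. Everything here is hypothetical: ScopedTrapPause, DisarmTrap, RearmTrap and UpdateFooUnwatched are names invented for the example, and the two stand-in functions only flip a flag. On the module, they would wrap the SetAddressWriteRangeTrap(0) call and whatever call you use to set the trap again.

```cpp
// Function type for the disarm and re-arm actions.
typedef void (*TrapFn)();

// Runs disarm() on construction and rearm() on destruction, so the trap is
// restored on every path out of the enclosing scope.
class ScopedTrapPause {
public:
    ScopedTrapPause(TrapFn disarm, TrapFn rearm) : rearm_(rearm) { disarm(); }
    ~ScopedTrapPause() { rearm_(); }

private:
    ScopedTrapPause(const ScopedTrapPause &);             // not copyable
    ScopedTrapPause &operator=(const ScopedTrapPause &);  // not assignable
    TrapFn rearm_;
};

// Host stand-ins for the real trap calls (assumption: they only set a flag).
int g_trapArmed = 1;
void DisarmTrap() { g_trapArmed = 0; }
void RearmTrap() { g_trapArmed = 1; }

int foo = 0;  // the variable being watched in this example

// Intentionally modify foo with the trap paused. Returns the armed state
// seen while the write happened, so the caller can check it was 0.
int UpdateFooUnwatched(int value) {
    ScopedTrapPause pause(DisarmTrap, RearmTrap);
    int armedDuringWrite = g_trapArmed;
    foo = value;
    return armedDuringWrite;
}
```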

Finally, these functions are also useful if you have a variable that is getting overwritten and you want to know why. Just monitor that variable address!

Using m68k-elf-addr2line to figure out line numbers from hex addresses
Okay, so you know "Faulted PC =020001EC". Great, but what does that mean? Well, it's easy to convert that to a line number. Open up the command line, go to your project's release directory, and run:

Code: Select all

c:\nburn\workspace\example\release> m68k-elf-addr2line -f -e projectelffile.elf 0x020001EC
UserMain
C:\nburn\workspace\RangeTrapExample\Release/..\main.cpp:73

Of course, change projectelffile.elf to your project's own ELF file. By default, it will be projectname.elf.
The -f option gives you a function name. This is the function that the current line is in.
The output above shows I was in UserMain when the trap occurred. Even better, it shows that I was on main.cpp line 73. That sure is easy!
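
addr2line accepts several addresses in one call, which is handy when you want to resolve a whole call stack rather than a single PC. Here is a minimal host-side sketch that builds such a command line. BuildAddr2LineCommand is a hypothetical helper name, and projectelffile.elf in the usage note is the same placeholder as above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build one addr2line invocation covering every address in addrs.
// With -f, addr2line prints a function name and a file:line pair per address.
std::string BuildAddr2LineCommand(const std::string &elfFile,
                                  const std::vector<std::string> &addrs) {
    std::string cmd = "m68k-elf-addr2line -f -e " + elfFile;
    for (std::size_t i = 0; i < addrs.size(); ++i) {
        cmd += " " + addrs[i];
    }
    return cmd;
}
```

Passing projectelffile.elf with 0x020001EC and 0x02014590 produces the command from above with the second address appended.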

But wait! Sometimes you will see this output:

Code: Select all

BFD: Dwarf Error: found dwarf version '0', this reader only handles version 2 information.
UserMain
??:0

This means either that the program counter is in a prebuilt library that is not supported by addr2line, or that you didn't use the ELF file of the application running on the module. In the case of the prebuilt library problem, this is usually the gcc libraries. But since you used -f, at least you got the function name. In this case, you know that the problem is in UserMain. If that function is large, you may have some debugging in front of you.