A basic understanding of fuzzing methods -- https://www.soldierx.com/tutorials/Fuzzing-Basics
Linux installation with a BASH Shell
GCC
GNU Debugger
Debugging is the process of finding software errors in a computer program. Software errors can cause problems during use, ranging from memory corruption or allocation errors to references to invalid data sources. Either way you look at it, software bugs are undesirable and need to be remediated. From a security standpoint, software errors can affect the flow of a program or process, possibly opening it up to exploitation.
Depending on the type of program we are debugging, we can select the best tools to peel the program apart layer by layer. If we are running a Linux binary, the most common debugger available is the "GNU Debugger", or GDB. GDB is a powerful debugger, but its power is hidden behind a classic command line interface (CLI). Through the gdb interface, you can disassemble the CPU instructions that make up the program and set breakpoints, which let you examine variables, registers, flags, memory addresses and anything else going on at a specific point of the program's execution.
This ability allows us to build on the fuzzing methods described in the article linked above. By tracking down the parts of a program our fuzzing techniques have broken, we can gain a better understanding of how to take over a process or function in a program, effectively allowing you to execute code of your choosing.
There are many things that could be described inside of a debugger. For now we are going to focus on just what we need to know. If you want more information on topics we briefly touch on, feel free to ask in the comments, and I will help you get some resources to better understand the related principle.
Registers:
Registers are small, fast storage locations inside the processor that it reads and writes while executing instructions. Each processor architecture carries its own set of registers unique to its own capabilities. This is one of the fundamental differences between system architectures. Intel's x86 instruction set is the most commonly used, as most PCs run x86 or x86_64 processors. These processors differ from other architectures, such as Alpha and other RISC-based processors.
For this tutorial, we are going to focus on the x86 architecture.
The following is a list of the general registers. We will not need to know ALL of these in detail for this session, but a solid understanding of them will help in being able to follow raw machine code cleanly.
--------------------------------------------------------------------------
Temporary variables stored for the CPU:
EAX - Accumulator Register
ECX - Counter Register
EDX - Data Register
EBX - Base Register
--------------------------------------------------------------------------
32-bit addresses that point to specific locations in memory:
ESP - Stack Pointer
EBP - Base Pointer
--------------------------------------------------------------------------
Pointers that direct to the source or destination when data needs to be read or written:
ESI - Source Index
EDI - Destination Index
--------------------------------------------------------------------------
EIP - Instruction Pointer - Points to the current instruction the processor is reading
EFLAGS - Several bit flags used for bit-wise logic and operator comparisons
Pointers:
A pointer holds a memory address that refers to some form of data. Registers such as ESP, EBP and EIP are pointers in this sense. On 32-bit x86, a memory address is 32 bits, or 4 bytes, in size. In C source code, a pointer is declared with a leading asterisk (*). The asterisk is also the dereference operator: applied to a pointer, it follows the stored memory address and reads back the data stored at that location. Its counterpart is the Address-Of operator, which looks like an ampersand (&). When applied to a variable in C, it returns the memory address of that variable instead of the data stored inside of it.
Well, a long time ago, in a land far, far away, there were ancient machines that ran at less than 64 bits, and even less than 32 bits. Back then, you had a whole 16-bit processor (or less) to work with! Registers and pointers existed then as well, but in those times it was literally just an Accumulator Register (AX), a Stack Pointer (SP) and so on... As computing advanced, the 16-bit architecture was Extended to 32-bit addressing.
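The two operators can be tried out without compiling anything. The following is a minimal sketch using Python's ctypes module to mimic the C behavior; the variable names are made up for illustration, and on a 64-bit host the addresses printed will be wider than the 4 bytes discussed here.

```python
import ctypes

# Analogous to `int value = 42;` in C.
value = ctypes.c_int32(42)

# Analogous to `int *ptr = &value;`: ptr holds the address of value.
ptr = ctypes.pointer(value)

# Analogous to `&value`: the memory address where value lives.
address_of_value = ctypes.addressof(value)

# Analogous to `*ptr`: follow the address and read the data stored there.
dereferenced = ptr.contents.value

# Analogous to `*ptr = 1337;`: writing through the pointer changes value.
ptr.contents.value = 1337

print(hex(address_of_value))  # the address varies from run to run
print(dereferenced)           # 42
print(value.value)            # 1337
```

The point to take away is that the pointer and the variable are two views of the same memory: changing the data through one is visible through the other.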
Did you figure out the "E" now? Correct, they are all Extended XX's, providing 16 more bits.
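The older, smaller registers are still there inside the extended ones: AX is the low 16 bits of EAX, and AH and AL are the high and low bytes of AX. A small Python sketch of that relationship, using a made-up register value, looks like this:

```python
# Hypothetical contents of the 32-bit EAX register.
eax = 0xDEADBEEF

ax = eax & 0xFFFF        # low 16 bits: the original 16-bit accumulator
ah = (eax >> 8) & 0xFF   # high byte of AX
al = eax & 0xFF          # low byte of AX

print(hex(ax))  # 0xbeef
print(hex(ah))  # 0xbe
print(hex(al))  # 0xef
```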
As mentioned earlier, these all have their different levels of importance, but today we are going to focus on the few that apply to our example.
EIP is going to be the REALLY important pointer, as it tells the processor the memory address of the instruction it should be processing. If you can redirect EIP to a different memory address of your choosing, one that contains the instructions YOU want to run, you have essentially hit pay-dirt. Controlling the EIP register is going to be the whole focus of our exercise...
The Stack:
There are numerous segments that make up the memory of a running program. There are Text, Data, BSS, Heap and Stack segments that work together to pass around the required data, environment variables, etc. Although this is rather expansive, I am going to focus on one segment, called the stack segment.
To be able to understand how such a critically important pointer can be modified, we need to examine the layout of the memory stack created to run your code. This memory stack lays out information relative to the function being run by the CPU. This information includes local variables (initialized or not), flags, the saved Base Pointer (also called the Saved Frame Pointer), the saved return address, and other values being stored for the running function.
Storing certain variables in the stack segment allows those variables to be unique within each function's context. For example, FunctionA can subtract VarA from VarB while FunctionB adds its own VarA to its own VarB, and neither function's variables interfere with the other's.
This segment of memory starts at the high memory addresses, and as more data gets put on, it grows towards the lower memory addresses.
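The direction of growth is easy to get backwards, so here is a toy model of it in Python. This is only a sketch: the class name and the starting address are made up for illustration, and a real stack holds far more than 4-byte values.

```python
class ToyStack:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0xbffff6a0):  # hypothetical starting address
        self.esp = top                   # stack pointer
        self.memory = {}                 # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                    # pushing moves ESP DOWN by 4 bytes
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)
        self.esp += 4                    # popping moves ESP back UP
        return value


stack = ToyStack()
stack.push(0x11111111)
print(hex(stack.esp))  # 0xbffff69c
stack.push(0x22222222)
print(hex(stack.esp))  # 0xbffff698
```

Each push lands at a lower address than the one before it, which is why data written forward through a buffer on the stack runs toward values that were pushed earlier.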
Using the example provided during the last session, abo1.c would have a stack laid out similar to what is described below.
If you look at the stack example, you can see how memory addresses are formatted along the left-hand side. Understanding how memory is laid out is extremely important during the exploitation process, but it can also prove extremely difficult to gain a solid grasp on.
Beyond understanding the stack, and how data is organized inside of it, we really need to know how to debug a running binary, and how to use the debugger to reveal the aspects of the binary we are trying to exploit.
As mentioned earlier, we are going to be working with the GNU Debugger, otherwise referred to as 'gdb'. From the BASH prompt on a Linux terminal, you should be able to start up gdb with our test binary loaded.
We already know that 300 bytes pushed to abo1 will cause a segmentation fault, as explained earlier, so let us give that a try from gdb.
Remember how we had discussed 0x41 being the letter "A" in Hexadecimal ASCII? It appears the letter "A" has ended up going somewhere it wasn't intended to go!
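As a quick sanity check on why a string of A's shows up as a run of 0x41 bytes, here is a short Python sketch. It only builds the 300-byte fuzz string and inspects it; it does not run abo1.

```python
import struct

fuzz = b"A" * 300                 # the 300-byte fuzz string

first_byte = fuzz[0]              # a single "A"
as_register = struct.unpack("<I", fuzz[:4])[0]  # four A's read as 32 bits

print(len(fuzz))         # 300
print(hex(first_byte))   # 0x41
print(hex(as_register))  # 0x41414141
```

Any 32-bit register or saved value that ends up filled from this string will read as 0x41414141, which makes it easy to spot in debugger output.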
Let us take a closer look at what is going on by setting a breakpoint before our argv[1] gets strcpy'd into buf.
Following along above, we can list the source code associated with abo1.c because we compiled it earlier with gcc using the -g flag, which keeps the GNU debugging symbols associated with the binary. The list command, followed by the line number you wish to start listing from, will display the selected source code. Since it's a small program, and we want to see the entire code, we can just use 1 as the argument to list.
The vulnerable section of code is on line 9, so we can set a breakpoint in gdb using the break command, followed by the line number we wish to set the breakpoint on.
Entering run into GDB will restart the program from the beginning, stopping at the first breakpoint specified. This is the state of the program before our fuzzed data gets pushed onto the stack and into the location of buf.
Using gdb, we can then examine aspects of memory being called by the binary. We know one variable named 'buf' exists, and it can be checked by the following:
buf is stored at 0xbffff590 holding the value 0xb7e9da28. This value can be retrieved using the examine string command as noted below.
You should notice that the string data stored at this location is all junk. Uninitialized data. This is because 'buf' is empty until the strcpy function copies argv[1] into it.
What other values did we discuss earlier? EBP and ESP, our base and stack pointers. We can view that data in much the same way:
Following along with the addresses noted on the left hand side of the output, we can see the memory addresses for each of these values.
Now that we have our boundaries defined and understood before our fuzzed code gets copied into 'buf', we can continue the debugger and see what happens.
If you remember, 0xbffff590 was the reported location of buf. We can use the examine command once again to peek at that location in memory:
Using the methods shown a little earlier, we can fine tune our fuzzed string to validate the different portions of the code in use.
We know buf is allocated 264 bytes on the stack, the 4 bytes past that hold the saved EBP, and the 4 bytes past that hold the saved return address (labeled SFP in this tutorial). Splitting up our fuzzing string, we can use different letters to highlight our boundaries and the areas within the process we can modify/overwrite.
Looking at the string passed into abo1, if we were to examine the binary's Base Pointer, we can expect to see 0x42424242 stored in there.
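To see why the Base Pointer ends up as 0x42424242, it can help to model the frame in Python. This is a sketch under assumptions: the split of 264 A's, 4 B's and 4 C's is inferred from the byte counts given here, and the overflow function is a made-up stand-in for what the unchecked strcpy does to the frame, not a trace of the real binary.

```python
import struct

BUF_SIZE = 264  # bytes reserved for buf on the stack, per the text

# Assumed split: fill buf with A's, then mark the next two 4-byte slots.
payload = b"A" * BUF_SIZE + b"B" * 4 + b"C" * 4


def overflow(data, buf_size=BUF_SIZE):
    """Model an unchecked copy into: buf | saved EBP | return address."""
    frame = bytearray(buf_size + 8)   # buf + saved EBP + return address
    frame[:len(data)] = data          # no bounds check, like strcpy
    saved_ebp, return_address = struct.unpack_from("<II", frame, buf_size)
    return saved_ebp, return_address


saved_ebp, return_address = overflow(payload)
print(hex(saved_ebp))        # 0x42424242
print(hex(return_address))   # 0x43434343
```

Under this model the B's land in the saved EBP slot and the C's land in the return address slot, so a crash showing 0x43434343 in EIP would confirm the offsets.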
As one final test to see if the debugger has given us the proper values, let us format one final string to pass, seeing if we can redirect EIP to an actual memory address. For our example here, we are going to use the fictional (but valid) memory address of 0xdeadbeef.
Going back to the super fun exploit math mentioned earlier, it takes 268 bytes to reach the saved return address (what we have been calling the SFP), followed by the memory address we want to replace it with, so that EIP will jump there when the vulnerable function returns...
One thing to keep in mind when working with Intel's x86 architecture is that it relies on little-endian byte ordering. This means that a multi-byte value is stored in memory starting with its least significant byte, with the remaining bytes following in increasing order of significance. To simplify, this means that addresses get put into place backwards. To make the saved return address read as 0xdeadbeef, the bytes \xde\xad\xbe\xef must be placed in the buffer in reverse order, as \xef\xbe\xad\xde.
Now that we know the exact point we can use to jump program execution elsewhere, the rest is just a matter of finding out where to jump execution to! The specifics of that portion will be addressed in the session after this. Our goal today was to overwrite EIP with a location of our choosing, which was accomplished by sending it to (nonexistent) 0xdeadbeef.
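The byte reversal does not have to be done by hand. As a small illustration, Python's struct module can pack a 32-bit value in little-endian order ("<I") or big-endian order (">I"):

```python
import struct

address = 0xdeadbeef

little = struct.pack("<I", address)  # byte order x86 stores in memory
big = struct.pack(">I", address)     # the order we write it on paper

print(little)  # b'\xef\xbe\xad\xde'
print(big)     # b'\xde\xad\xbe\xef'

# Reading the little-endian bytes back yields the original address.
print(hex(struct.unpack("<I", little)[0]))  # 0xdeadbeef
```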
As an exercise for the group, you have seen the Perl method I employed to accomplish this task. Can any of you remember the syntax to achieve the same result using Python, as described in our last session?
Now that we have addressed the hard method of finding the distances between our values in the stack, we are going to approach this with an easier, new and improved method: using pattern_create.rb and pattern_offset.rb, both included with the Metasploit Framework.
If you remember from the previous session, pattern_create.rb will create a string of a size you specify. In our fuzzing example, we used a 300-byte pattern, so we will do the same again here:
I did wish to cover a few debugging applications for Windows as well as GDB, but in the interests of saving time, and sparing you a display of my total inability to use Windows itself, perhaps we will review OllyDbg, Immunity Debugger and various other debuggers at a later time.
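For anyone who wants to check their answer, here is one possible Python take on the exercise. It is a sketch, not necessarily the syntax used in the last session: build_payload is a made-up helper name, and the offset of 268 and the target 0xdeadbeef are the values worked out above.

```python
import struct


def build_payload(offset, address, filler=b"A"):
    """Return `offset` filler bytes followed by a little-endian address."""
    return filler * offset + struct.pack("<I", address)


payload = build_payload(268, 0xdeadbeef)

print(len(payload))        # 272
print(payload[-4:].hex())  # efbeadde
```

To hand the raw bytes to another program, writing them with sys.stdout.buffer.write(payload) avoids Python's text encoding altering the non-ASCII bytes.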
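To make the idea behind the two tools concrete, here is a minimal Python sketch of a cyclic pattern generator and an offset lookup. It is a simplified stand-in written for illustration, not Metasploit's own code; the function names simply mirror the script names, and the offset of 268 used in the demonstration comes from the stack math worked out earlier.

```python
import string
import struct


def pattern_create(length):
    """Build an 'Aa0Aa1Aa2...' style pattern of the requested length."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    return "".join(chunks)[:length]


def pattern_offset(value, length):
    """Find where a 32-bit value (as seen in a register) sits in the pattern."""
    # The register shows the bytes reversed, so pack as little-endian first.
    needle = struct.pack("<I", value).decode("latin-1")
    return pattern_create(length).find(needle)  # -1 if not present


pattern = pattern_create(300)
print(pattern[:12])  # Aa0Aa1Aa2Aa3

# Pretend EIP was overwritten by the 4 pattern bytes at offset 268.
eip_value = struct.unpack("<I", pattern[268:272].encode("ascii"))[0]
print(pattern_offset(eip_value, 300))  # 268
```

Because every 4-byte window of the pattern is distinct at this length, whatever value lands in EIP after the crash identifies exactly how far into the string it came from, with no trial and error on the letter boundaries.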