Buffer overflow
Goals
- Understand how your compiled C code operates at the instruction level.
- Appreciate the value of memory safety to security concerns.
- Get a brief introduction to application security at the binary level.
Overview
There are many ways to attack a vulnerable application, making it behave in a way that the author did not intend (and that the attacker did). One very common way is the buffer overflow attack. This is possible if a target program takes input from a user and loads it into a buffer (string) without checking whether the buffer is big enough to hold the input.
In this assignment, you are given a vulnerable program, including the source code. Your job is to create a malicious input string that will cause it to call another function.
Your buffer overflow attack will consist of sending the program an attack string that is longer than the name buffer, so that it spills junk data all the way to the return address. (See the readings.) You will overwrite the return address with a different address in the code (in the text segment), so that when greet_visitor(…) tries to return, instead of jumping back to main(…), it jumps to scare_visitor(…). Thus, you need to find the number of bytes from the beginning of name to the beginning of the return address. That is how many bytes (your name, its '\0', and filler characters) must come before the new address. Then, you will add a few more bytes: the address of the scare_visitor(…) function in memory.
You will not write any C code for HW22. Instead, you will create a very small text file containing an attack string. When the attack string is piped to the target program, it will cause it to do something it wasn't intended to do.
Most of your effort will be to analyze the stack of the target program (and get familiar enough to do so).
Doing the assignment
This assignment is different from the others in ECE 264. Most of what you need to do is just understanding ⓐ the stack, and ⓑ how function calls work. You will do that through two required readings and a little tinkering in GDB. Once you have done that, creating the attack string will be quick.
Learn
- Learn how to examine the disassembly of your program in GDB.
Read the first half of Understanding C by learning assembly. You can stop reading when you see the heading “understanding static local variables”. This blog post is one of the clearest introductions I've found; it doesn't go into too much detail. Also, it ties in nicely with C programming and GDB. (≈3 pages, not counting the parts you skip)
- Learn how function calls work, and how to read assembly instructions.
Read Introduction to x64 assembly language, except the left half of page 2 (about directives). Page 3 (instructions) will help you make sense of the assembly instructions you are seeing, but you won't need to understand it deeply. (3½ pages, not counting the left half of page 2)
- Understand how greet_visitor(…) is called in our target program (boa.c).
This PDF walk-through takes you through each step of the function call. The memory addresses are different from what you will get when you compile and run boa.c, but once you understand this, you will be ready to find the information in the real executable.
- Check your understanding. When you are done with the readings, you should be able to answer the following:
- What is the disassembly of a program?
- What is a register?
- What do the base pointer (%rbp) and stack pointer (%rsp) tell us?
- What does the instruction pointer (%rip) tell us?
- Is %rbp < %rsp or is %rsp < %rbp?
- Are local variables at a higher or lower memory address than %rbp?
- What happens in the preamble? … and the epilogue?
- What does the retq instruction do?
- What does the notation -0x8(%rbp) mean?
- Where is the return address stored, relative to the base pointer (%rbp)?
- You are now ≈80% done with HW22, the last homework for ECE 264. (Until you have done the above, the steps below are likely to be confusing.)
Get acquainted with the target program.
- Get the starter files.
$ 264get hw22
- Compile boa.c to create the executable boa. Note that you have to suppress the warnings for unused variables and for the implicit declaration of gets, since the course gcc alias treats warnings as errors.
$ gcc -o boa boa.c -Wno-unused-variable -Wno-implicit-function-declaration
- Run the target program (boa) normally. This is to make sure you understand the normal behavior of the program, so you don't miss the big picture of this assignment.
$ ./boa
Hello. What is your name? Tom
Hello, Tom.
Learn how to pipe input from a file to a program.
- Create an input file containing your name and pipe it to boa. Just open your editor, type your name, save, and exit.
$ vim input.txt
This command runs boa and uses the contents of input.txt as the input, i.e., in lieu of whatever the user might have typed at the terminal.
$ ./boa < input.txt
Hello. What is your name? Hello, Tom.
- Create an input file from the command line. The printf command in bash lets you print formatted messages to the terminal using syntax that is vaguely similar to printf(…) in C. Examples:
$ printf "Tom\n"
Tom
You can also specify characters using their hexadecimal codes. That will be helpful later in this process when you need to print a memory address, which may include bytes that do not correspond to easily typable ASCII characters. Each two-digit \xHH escape emits one byte. For example, "\x41" is the same as "A"; both emit 1 byte when printed. Likewise, "\x41B\x43" is the same as "A\x42C", which is the same as "ABC"; all emit 3 bytes when printed. Because addresses are stored in little-endian byte order (lowest byte first), the address 0x400123 is printed as 8 bytes with printf "\x23\x01\x40\x00\x00\x00\x00\x00". Let's use the same method to print the name “Tom” again.
$ printf "\x54\x6f\x6d\n"
Tom
Examine the executable (boa)
- Start boa in GDB. Set a breakpoint inside greet_visitor(…), just before it returns. In other words, set the breakpoint on the line containing the closing curly brace (}).
- Try some GDB commands. At this point, boa should be running, and GDB should be paused just before greet_visitor(…) returns. Try a few commands. (We are not showing the output here. You need to try it yourself.)
(gdb) info registers
… shows the values in all registers. The only ones we care about are $rbp (base pointer), $rsp (stack pointer), and $rip (instruction pointer).
(gdb) disassemble /s greet_visitor
… shows the assembly instructions for greet_visitor(…). The /s flag tells GDB to intersperse the C source lines with the disassembly.
(gdb) disassemble /s scare_visitor
… shows the assembly instructions for scare_visitor(…).
(gdb) disassemble /s main
… shows the assembly instructions for main(…).
(gdb) print $rbp
… prints the value of the base pointer.
- Print a hex dump of the current stack frame using the x/ command. We will use a command like x/▒▒bx ▒▒. From the readings, you will know that the stack frame begins at $rsp, so the command will be x/▒▒bx $rsp. But how many bytes do we need to look at? For that, we need to know $rbp - $rsp + 8. GDB makes this surprisingly easy.
(gdb) print $rbp - $rsp + 8
That should print a number. That's how many bytes we want to look at.
- Find the location of the variable name relative to the base pointer. In other words, you need to find the base pointer offset (▒▒▒(%rbp)) of the name variable. This may require some fiddling. We are not providing the exact commands. We want you to get a little experience digging around and thinking about how the stack is arranged.
- Find the location of the return address, relative to the base pointer. If you followed the readings carefully, this part will be trivial. Otherwise, just look at the disassembly. You can get it from the assembly instructions and/or from inspecting the hex dump of the stack.
- Calculate the distance from name to the return address. Remember: You will be deliberately overflowing the name buffer (string) so that you can overwrite the old return address with your own evil address.
- Find the value you will need to write into the return address. The goal is for it to call scare_visitor(…) and print BRAH!!!. Thus, you need to find the address of the instruction to jump to (instead of going back to main(…)).
- Write out your attack string on paper. It will consist of a real name (e.g., Tom), followed by some filler bytes, and finally the new (evil) instruction address that you want to overwrite the return address with.
Create your attack
- Create your attack input file: input.txt.
$ printf "name\x00filler bytes\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒" > input.txt
Remember that in memory, addresses are stored in little-endian byte order, just like the numbers in the BMP file format in images and mtat. For example, the address 0x400123 would be shown as 0x23 0x01 0x40 0x00 0x00 0x00 0x00 0x00 in the memory dump (via the GDB x command). When you craft your attack string, it will need to be little-endian as well.
- Test your attack.
$ ./boa < input.txt
Hello. What is your name? Hello, Tom.█████████████▓▓▒▒░░BRAH!!!
Segmentation fault (core dumped)
It's okay if you get a segmentation fault. In fact, you will definitely get a segmentation fault. That is one of many rules we are breaking for HW22. As long as it prints “BRAH!!!”, you are done and ready to submit.
How much work is this?
Expect to spend ≈2 hours reading, ≈1 hour fussing with GDB, and ≈1 hour crafting your attack string. Obviously, your experience may vary.
Pretester
There will be no pretester for HW22. We will test your attack using the command shown in the “Test your attack” step above (./boa < input.txt).
Requirements
- Your submission must contain each of the following files, as specified:
- input.txt (attack string): Supplying this string to the executable, boa, should cause it to redirect execution to scare_visitor(…), which will print BRAH!!!.
- boa.c (target code): Target code, submitted as is with no changes. We will most likely not look at this. It is only a failsafe.
- boa (target executable): Compiled target code. This is just the executable that you compiled using gcc. You don't need to do anything except compile what you were given. We will most likely not use this. It is only a failsafe, in case we can't reproduce your attack with ours.
Submit
To submit HW22 from within your hw22 directory, type
264submit HW22 input.txt boa.c boa
We do not plan to use the boa.c and boa you submit. We ask you to submit them only as a failsafe, in case somehow your executable ends up different from ours.
Q&A
- Will the addresses of functions always stay the same?
They are tied to a specific executable file. As long as you don't recompile, they will be the same.
- How can you be so sure the addresses won't change?
The executable file specifies where in memory it should be loaded.
- Is there any other way to see the mapping between C code and instructions, besides GDB?
objdump -S --disassemble boa
This prints everything in the executable, including other supporting code. Our code accounts for only 188 of the 8689 bytes in the boa executable. It is also possible to get the assembly from gcc using gcc -S boa.c -o boa.s, but this is less useful than objdump or GDB because it does not interleave the C code lines with the assembly.
- What if I recompile?
If you recompile on the same machine with the same compiler, same options, and same boa.c, addresses will not change.
- How is the printf bash command related to the printf(…) C function?
Name only… and a few format conventions (e.g., %d, %s, etc.) that were mimicked by the creators of the printf bash command. For C programmers, it can be handy to be able to format strings using percent codes (e.g., %d) and hexadecimal escape characters (e.g., \x07) from bash.
- I am getting the segmentation fault but no "BRAH!!!" message. What's wrong?
You are on the right track. Most likely, the new return address isn't getting to the right location on the stack. Make sure you have the right address for scare_visitor(…), the right offset (distance from name[0] to the return address), and the corresponding number of filler characters. Also, keep in mind that gets(…) will write a null terminator ('\0') after your input string. Since the return address, when written in little endian, ends with a bunch of zeros, you may want to leave off the last zero byte of the return address. That way, the null terminator (== 0) will simply be overwriting another zero.
- What's the best way to see the contents of my input.txt?
xxd input.txt
- Can I edit input.txt in Vim?
Yes, but you must open Vim in binary mode using the -b flag, like this: vim -b input.txt
Without that, Vim, like most editors, automatically adds a newline (\n) at the end of the file if there isn't already one there. The -b flag tells Vim not to do that. To type non-printable ASCII characters (≤31 or ≥127), press Ctrl-V, then x, then the two-digit hex value (e.g., 06).
Another way to edit a binary file in Vim is to convert it to a hex dump and then back. First, open the file in binary mode (vim -b input.txt). Next, enter :%!xxd to convert the current buffer to a hex dump. Make any edits you like, but be sure to keep the hex dump format. Finally, enter :%!xxd -r to convert back from the hex dump to the binary file.
- Do I need to add any special flags to gcc when compiling boa.c?
No. Initially, Prof. Quinn said in lecture that you should add -O0, since it is widely written online that turning off compiler optimizations using that flag is needed to keep the executable stable. Upon some experimentation, that does not seem to be the case. Likewise, it is often reported that for simple buffer overflow attacks like this to work, you must explicitly tell gcc to turn off certain protections. That does not seem to be necessary for our simple case. It works just fine with the default compiler flags that we use in this class (-g -std=c11 -Wall -Wshadow -Wvla -Werror -pedantic).
Updates
12/8/2022 Link for Introduction to x64 Assembly - Martin Hirzel was fixed.