Advanced C Programming

Spring 2020 ECE 264 :: Purdue University

⚠ This is a PAST SEMESTER (Spring 2020).
Due 4/30

Huffman 8: command line

Overview

This is part 8 in an 8-part series in which you are implementing the Huffman Coding algorithm to compress files.

  1. HW11: Implement priority queue for future use
  2. HW12: Calculate character frequencies from a file
  3. HW13: Create a priority queue of trees from the character frequencies in a file.
  4. HW14: Build a single Huffman tree.
  5. HW15: Implement functions to write arbitrary bits to a file, for future use.
  6. HW16: Write the Huffman Tree to the disk.
  7. HW17: Write the encoded file contents to the disk.
  8. HW18: Create a command-line interface for compressing files. ◀◀◀ this homework

This is the final stretch!

In this homework you will combine all of the pieces you have built in this series to create a command-line file compression tool. You won't be creating a decompession tool, so this won't be very useful on its own, but you can take pride in knowing that you have implemented one of the most important data compression algorithms there is. Huffman coding is part of the compression algorithms used for ZIP, JPG, and MP3 files, among others. (Image, audio, and video compression algorithms use a lot more than this, but Huffman coding is an important building block.)

Syntax

The syntax of your command will be as follows:

huffman_compress input_path output_path

Input

The input_path parameter is the path to an existing uncompressed file. It may contain anything.

Output

The output_path parameter is the path where you want to write your new compressed file. It must not already exist. If it refers to another directory (other than the current working directory), you must have permission to create files in that directory.

The file format is as follows:

4 bytes compressed_file_size_bytes Size of this entire file, in bytes. In other words, this will be 12 (for the metadata) + # of bytes written by write_coding_table(…) + # of bytes written by write_compressed(…).
4 bytes coding_table_size_bytes Size of the encoded Huffman tree (topology), as encoded by write_coding_table(…), in bytes.
4 bytes uncompressed_file_size_bytes This will be the size of the file at input_path, in bytes.
Coding table (varies) Output of running write_coding_table(…).
Compressed data (varies) Output of running write_compressed(…).

Error handling

You must handle the following error types.

  1. Wrong number of arguments.
  2. Input file was not found.
  3. Input path is "".
  4. Output file already exists.
  5. Output path is "".
  6. Output file could not be written.

Error messages are included in the file huffman_compress.error_messages.c to ensure consistency. Use them only as specified in the requirements. Do not print any error messages at any other time or in any other manner other than as specified.

Changes to BitWriter and write_compressed(…).

You will need to make a few changes to your code from prior assignments.

At the end of your flush_bit_writer(…), add fflush(writer->file).

Modify your write_compressed(…) so that it takes a path where the uncompressed bytes will be read from, instead of taking the string (uncompressed_bytes) directly. The new function signature should be as follows:

void write_compressed(TreeNode* root, BitWriter* a_writer, const char* in_path);

A third change has been relegated to an optional bonus below.

Starter code

To save you time, we are giving you a lot of starter code this time.

Getting Started

  1. You should have already read the article and Huffman walkthrough . If you have not done so already, please do so before you proceed.
  2. Make sure you are caught up with the recent lectures.
  3. Get the starter code.
    you@ecegrid-thin1 ~ $ 264get hw18
    you@ecegrid-thin1 ~ $ cd hw18
  4. Copy all of your files from HW17, which should include everything from this series, so far.
  5. Update your Makefile.
    you@ecegrid-thin1 ~/hw18 $ vim Makefile
    Change the definitions at the top.
    ASG_NICKNAME = HW18
    TEST_TXT     = expected.txt
    TEST_ACTUAL  = actual.txt
    SRC_C        = huffman.c huffman_compress.c frequencies.c priority_queue.c bit_writer.c \
                   huffman_compress.error_messages.c simple_file_utilities.c
    MAIN_C       = huffman_compress_main.c
    MAIN_EXE     = huffman_compress
    SRC_H        = huffman.h huffman_compress.h priority_queue.h bit_writer.h frequencies.h \
                   simple_file_utilities.h miniunit.h clog.h
    TEST_C       = test_huffman_compress.c
    TEST_EXE     = test_huffman_compress
    SUBMIT_FILES = $(SRC_C) $(SRC_H) $(MAIN_C) $(TEST_C) Makefile
    # CAUTION:  Be careful that TEST_EXE does not end with .c or .h.
    

    Simplify your make test rule.

    test:  debug
        ./$(TEST_EXE)
    

    Create a make huffman_compress rule at the bottom.

    $(MAIN_EXE): $(SRC_C) $(MAIN_C)
        ./$(MAIN_EXE)
    
  6. Edit your huffman.h.
    you@ecegrid-thin1 ~/hw18 $ vim huffman.h

    Change the declaration for write_compressed(…).

    void write_compressed(TreeNode* root, BitWriter* a_writer, const char* in_path);
    
  7. Edit your huffman_compress.c.
    you@ecegrid-thin1 ~/hw18 $ vim huffman_compress.c

    Fill in check_command_line_arguments(…) according to the specification below. Add a little at a time. Test as you go. You may need to temporarily comment out compress_file(…) in order to test the rest. We have provided starter test code. Aim to get one test working at a time.

    Suggestion: First make it work for correct inputs. Test. Then, make it return the correct error message when the wrong number of arguments is supplied. Then, work on the other error conditions. Be careful about order.

    Fill in compress_file(…) according to the specification below. For this one, you will likely need to implement most of it all at once. The included tests will pass only when it is completely working. If you wish, you can add some tests that check only limited aspects of it (e.g., only the metadata, or only the coding table, for example).

    Test. Make sure you understand the included tests, if you choose to use them.

Requirements

  1. Your submission must contain each of the following files, as specified:
    file contents
    huffman_compress.c functions
    check command line arguments(int argc, const char✶ argv[])
    return type: const char✶
    Check that the command line arguments are sufficient to proceed with compressing the given file.
    • Check the following requirements.
      1. Wrong number of arguments.
      2. Input file was not found.
      3. Input path is "".
      4. Output file already exists.
      5. Output path is "".
      6. Output file could not be written.
    • For the first requirement you find that is not met, return the respective error code given in huffman_compress.error_messages.c.
      • You must use those string constants directly.
      • Do not change or modify the error messages.
      • Do not print anything from this function.
    • If all requirements are met, then return NULL.
    • argc and argv are exactly as main(…) received (or would receive) them.
    • You do not need to use main(…) to test this. See the included test_huffman_compress.c for examples.
    • Once you have verified that the number of arguments is correct, assign argv[1] and argv[2] to local variables with more descriptive names (e.g., in_path and out_path).
    • You may refer to argv only twice in this function, and may not pass them to any other function.
    compress file(const char✶ in path, const char✶ out path)
    return type: Metadata
    Read the file at in_path and create a compressed file at out_path, according to the format in the article .
    • The format is as specified above.
    huffman_compress_main.c functions
    main(int argc, char✶ argv[])
    return type: int
    This is your command line utility.
    • This is not a test file.
    • It will be compiled with gcc -o huffman_compress ….
    • This function should be very simple so that minimal testing is needed. The functions it calls will be tested thoroughly.
    • You may refer to argv three times in this function: ① to pass it to check_command_line_arguments(…), ② to assign argv[1] to a local variable (e.g., in_path), and ③ to assign argv[2] to a local variable (e.g., out_path).
    • This function should be very short and simple.
      • Expect ≈13 sloc for main(…) and ≈18 sloc for the entire file (huffman_compress_main.c).
    test_huffman_compress.c functions
    main(int argc, char✶ argv[])
    return type: int
    This is your test code—for huffman_compress.c only.
    • 100% code coverage is required for huffman.c.
      • Direct testing of frequencies.c, priority_queue.c, bit_writer.c, and huffman_compress_main.c is optional.
    • Use your miniunit.h.
    • You do not need to test test_huffman_compress.c from this file.
    • You are responsible for the correctness of your tests. If you choose to use the provided test code, you must read it, check it, and fix any bugs you find.
    huffman.h
    same as before, except for the change to write_compressed(…)
    huffman.c
    same as before, except for the change to write_compressed(…)
    priority_queue.h
    same as before
    priority_queue.c
    same as before
    bit_writer.h
    same as before
    bit_writer.c
    same as before
    frequencies.h
    same as before
    frequencies.c
    same as before
    miniunit.h
    as usual
    clog.h
    as usual
    Makefile
    as usual
    simple_file_utilities.c
    as provided (no changes)
    simple_file_utilities.h
    as provided (no changes)
  2. Submissions must meet the code quality standards and other course policies on homework.

Submit

To submit HW18 from within your hw18 directory, type

make submit

That should result in the following command: 264submit HW18 huffman.c huffman.h frequencies.c frequencies.h priority_queue.c priority_queue.h test_huffman_compress.c bit_writer.c bit_writer.h huffman_compress.c huffman_compress.h huffman_compress.error_messages.c huffman_compress_main.c simple_file_utilities.c simple_file_utilities.h miniunit.h clog.h Makefile

Pretester

There will most likely not be a pretester for this homework.

Q&A

  1. What is the difference between a path and a filename?

    A path could be a filename, but may also specify a directory where that filename is (or will be) located.
  2. How do I write the three bytes in the header?

    Store the data in an instance of the Metadata struct type. Then store that object to the file.
  3. How do I store a struct object to a file?

    (This was covered in lecture in both sections.)

    Use fwrite(…).

  4. Who do I write the file in compress_file(…)?

    Here is the process:
    • Make a Huffman tree from the contents of the file at in_path.
    • Open a file using open_bit_writer(…).
    • Create an instance of the Metadata struct type.
    • Write the Metadata struct object to the file.
      • This was covered in lecture.
      • You won't know the values for the Metadata object until after you've written the rest, so fill a placeholder for now. It can be all zeros if you want.
    • Write the coding table (tree topology) using your write_coding_table(…).
    • Write a 0 bit (as specified in the article ) to signify the end of the tree. Do not flush before writing the 0 bit. It should come immediately after the bits of the rest of the tree—quite possibly part of the last byte.
    • Write the compressed data using your write_compressed(…).
    • Update the Metadata object with the correct values. You may find ftell(…) useful here.
    • Go back to the beginning of the file using fseek(…) and overwrite the placeholder Metadata object with the real one using fwrite(…).
  5. What do argc and argv mean?

    (This was covered in Prof. Quinn's lecture.)

    • argc is the number of command line arguments, including the name of the program.
    • argv is an array of addresses to each command line argument. argv[0] is always the name of the executable, as it was called.
    • Example: If you type ./huffman_compress letter.txt letter.txt.hc in bash, you will have argv[0]=="./huffman_compress", argv[1]=="letter.txt", and argv[2]=="letter.txt.hc".
  6. What should the main(…) in huffman_compress_main.c do?

    Here is the process:
    • Check if arguments are correct using check_command_line_arguments(…).
    • If so, write the compressed file using compress_file(…).
    • If there were any errors, print them to stderr using fprintf(…) and return EXIT_FAILURE.
    • Otherwise, print nothing and return EXIT_SUCCESS.

Updates

There are no updates to HW18 so far.