Huffman 8: command line
Overview
This is part 8 in an 8-part series in which you are implementing the Huffman Coding algorithm to compress files.
- HW11: Implement priority queue for future use
- HW12: Calculate character frequencies from a file
- HW13: Create a priority queue of trees from the character frequencies in a file.
- HW14: Build a single Huffman tree.
- HW15: Implement functions to write arbitrary bits to a file, for future use.
- HW16: Write the Huffman Tree to the disk.
- HW17: Write the encoded file contents to the disk.
- HW18: Create a command-line interface for compressing files. ◀◀◀ this homework
This is the final stretch!
In this homework you will combine all of the pieces you have built in this series to create a command-line file compression tool. You won't be creating a decompession tool, so this won't be very useful on its own, but you can take pride in knowing that you have implemented one of the most important data compression algorithms there is. Huffman coding is part of the compression algorithms used for ZIP, JPG, and MP3 files, among others. (Image, audio, and video compression algorithms use a lot more than this, but Huffman coding is an important building block.)
Syntax
The syntax of your command will be as follows:
huffman_compress input_path output_path
Input
The input_path parameter is the path to an existing uncompressed file. It may contain anything.
Output
The output_path parameter is the path where you want to write your new compressed file. It must not already exist. If it refers to another directory (other than the current working directory), you must have permission to create files in that directory.
The file format is as follows:
Metadata | 4 bytes | compressed_file_size_bytes |
Size of this entire file, in bytes. In other words, this will be
12 (for the metadata)
+ # of bytes written by write_coding_table(…)
+ # of bytes written by write_compressed(…) .
|
4 bytes | coding_table_size_bytes |
Size of the encoded Huffman tree (topology), as encoded by
write_coding_table(…) ,
in bytes.
|
|
4 bytes | uncompressed_file_size_bytes | This will be the size of the file at input_path, in bytes. | |
Coding table | (varies) | Output of running write_coding_table(…) . |
|
Compressed data | (varies) | Output of running write_compressed(…) . |
Error handling
You must handle the following error types.
- Wrong number of arguments.
- Input file was not found.
- Input path is
""
. - Output file already exists.
- Output path is
""
. - Output file could not be written.
Error messages are included in the file huffman_compress.error_messages.c to ensure consistency. Use them only as specified in the requirements. Do not print any error messages at any other time or in any other manner other than as specified.
Changes to
BitWriter
and write_compressed(…)
.
You will need to make a few changes to your code from prior assignments.
At the end of your flush_bit_writer(…)
,
add fflush(writer->file)
.
Modify your
write_compressed(…)
so that it takes
a path where the uncompressed bytes will be read from, instead of
taking the string (uncompressed_bytes
) directly.
The new function signature should be as follows:
void write_compressed(TreeNode* root, BitWriter* a_writer, const char* in_path);
A third change has been relegated to an optional bonus below.
Starter code
To save you time, we are giving you a lot of starter code this time.
- huffman_compress.c is a skeleton of the main code you need to write for HW18.
- huffman_compress.error_messages.c initializes the error messages (global string constants).
-
huffman_compress.h contains the declarations of the
Metadata
struct type, the functions you need to implement, and the error messages (global string constants). -
huffman_compress_main.c contains a skeleton
main(…)
function for your command line program. - simple_file_utilities.c contains some utilities that will be helpful to you in checking the command line arguments, as well as in your tests.
- simple_file_utilities.h declares and documents the utilities in simple_file_utilities.c.
-
test_huffman_compress.c is an optional test file that you can use to jump start your testing. You may need to make some additions—or even corrections—but we are giving you the bulk of your test code for free this time. 👌 You may copy/adapt any code in this file (test_huffman_compress.c) for purposes of this assignment.
⚠ This test code is provided as is. You are welcome to make your own test code instead, or use parts of this and parts of your own. You may copy/adapt/remix it (for purposes of this assignment in Spring 2020 only) however you like, provided that you understand it. If you choose to use the included tests, you are still responsibile for the correctness of whatever code you submit, including test code. Read the code. Make sure you understand it. Verify that it is correct. If you find a bug, it is up to you to fix it.
Getting Started
- You should have already read the article and Huffman walkthrough . If you have not done so already, please do so before you proceed.
- Make sure you are caught up with the recent lectures.
-
Get the starter code.
you@ecegrid-thin1 ~ $
264get hw18
you@ecegrid-thin1 ~ $
cd hw18
- Copy all of your files from HW17, which should include everything from this series, so far.
-
Update your Makefile.
Change the definitions at the top.
you@ecegrid-thin1 ~/hw18 $
vim Makefile
ASG_NICKNAME = HW18 TEST_TXT = expected.txt TEST_ACTUAL = actual.txt SRC_C = huffman.c huffman_compress.c frequencies.c priority_queue.c bit_writer.c \ huffman_compress.error_messages.c simple_file_utilities.c MAIN_C = huffman_compress_main.c MAIN_EXE = huffman_compress SRC_H = huffman.h huffman_compress.h priority_queue.h bit_writer.h frequencies.h \ simple_file_utilities.h miniunit.h clog.h TEST_C = test_huffman_compress.c TEST_EXE = test_huffman_compress SUBMIT_FILES = $(SRC_C) $(SRC_H) $(MAIN_C) $(TEST_C) Makefile # CAUTION: Be careful that TEST_EXE does not end with .c or .h.
Simplify your
make test
rule.test: debug ./$(TEST_EXE)
Create a
make huffman_compress
rule at the bottom.$(MAIN_EXE): $(SRC_C) $(MAIN_C) ./$(MAIN_EXE)
-
Edit your huffman.h.
you@ecegrid-thin1 ~/hw18 $
vim huffman.h
Change the declaration for
write_compressed(…)
.void write_compressed(TreeNode* root, BitWriter* a_writer, const char* in_path);
-
Edit your huffman_compress.c.
you@ecegrid-thin1 ~/hw18 $
vim huffman_compress.c
Fill in
check_command_line_arguments(…)
according to the specification below. Add a little at a time. Test as you go. You may need to temporarily comment outcompress_file(…)
in order to test the rest. We have provided starter test code. Aim to get one test working at a time.Suggestion: First make it work for correct inputs. Test. Then, make it return the correct error message when the wrong number of arguments is supplied. Then, work on the other error conditions. Be careful about order.
Fill in
compress_file(…)
according to the specification below. For this one, you will likely need to implement most of it all at once. The included tests will pass only when it is completely working. If you wish, you can add some tests that check only limited aspects of it (e.g., only the metadata, or only the coding table, for example).Test. Make sure you understand the included tests, if you choose to use them.
Requirements
- Your submission must contain each of the following files, as specified:
file contents huffman_compress.c functions check command line arguments(int argc, const char✶ argv[])
→ return type: const char✶Check that the command line arguments are sufficient to proceed with compressing the given file.- Check the following requirements.
- Wrong number of arguments.
- Input file was not found.
- Input path is
""
. - Output file already exists.
- Output path is
""
. - Output file could not be written.
-
For the first requirement you find that is not met, return the respective
error code given in huffman_compress.error_messages.c.
- ⚠ You must use those string constants directly.
- ⚠ Do not change or modify the error messages.
- ⚠ Do not print anything from this function.
- If all requirements are met, then return
NULL
. -
argc
andargv
are exactly asmain(…)
received (or would receive) them. -
You do not need to use
main(…)
to test this. See the included test_huffman_compress.c for examples. -
Once you have verified that the number of arguments is correct,
assign
argv[1]
andargv[2]
to local variables with more descriptive names (e.g.,in_path
andout_path
). -
⚠ You may refer to
argv
only twice in this function, and may not pass them to any other function.
compress file(const char✶ in path, const char✶ out path)
→ return type: MetadataRead the file atin_path
and create a compressed file atout_path
, according to the format in the article .- The format is as specified above.
huffman_compress_main.c functions main(int argc, char✶ argv[])
→ return type: intThis is your command line utility.- This is not a test file.
- It will be compiled with
gcc -o huffman_compress …
. - This function should be very simple so that minimal testing is needed. The functions it calls will be tested thoroughly.
- ⚠ You may refer to
argv
three times in this function: ① to pass it tocheck_command_line_arguments(…)
, ② to assignargv[1]
to a local variable (e.g.,in_path
), and ③ to assignargv[2]
to a local variable (e.g.,out_path
). - This function should be very short and simple.
-
Expect ≈13 sloc for
main(…)
and ≈18 sloc for the entire file (huffman_compress_main.c).
-
Expect ≈13 sloc for
test_huffman_compress.c functions main(int argc, char✶ argv[])
→ return type: intThis is your test code—for huffman_compress.c only.- 100% code coverage is required for huffman.c.
- Direct testing of frequencies.c, priority_queue.c, bit_writer.c, and huffman_compress_main.c is optional.
- Use your miniunit.h.
- You do not need to test test_huffman_compress.c from this file.
- ⚠ You are responsible for the correctness of your tests. If you choose to use the provided test code, you must read it, check it, and fix any bugs you find.
huffman.h ⋯ same as before, except for the change to write_compressed(…)huffman.c ⋯ same as before, except for the change to write_compressed(…)priority_queue.h ⋯ same as beforepriority_queue.c ⋯ same as beforebit_writer.h ⋯ same as beforebit_writer.c ⋯ same as beforefrequencies.h ⋯ same as beforefrequencies.c ⋯ same as beforeminiunit.h ⋯ as usualclog.h ⋯ as usualMakefile ⋯ as usualsimple_file_utilities.c ⋯ as provided (no changes)simple_file_utilities.h ⋯ as provided (no changes) - Check the following requirements.
- Submissions must meet the code quality standards and other course policies on homework.
Submit
To submit HW18 from within your hw18 directory, type
make submit
That should result in the following command:
264submit HW18 huffman.c huffman.h frequencies.c frequencies.h priority_queue.c priority_queue.h test_huffman_compress.c bit_writer.c bit_writer.h huffman_compress.c huffman_compress.h huffman_compress.error_messages.c huffman_compress_main.c simple_file_utilities.c simple_file_utilities.h miniunit.h clog.h Makefile
Pretester
There will most likely not be a pretester for this homework.
Q&A
What is the difference between a path and a filename?
A path could be a filename, but may also specify a directory where that filename is (or will be) located.How do I write the three bytes in the header?
Store the data in an instance of theMetadata
struct type. Then store that object to the file.How do I store a struct object to a file?
(This was covered in lecture in both sections.)Use
fwrite(…)
.Who do I write the file in
Here is the process:compress_file(…)
?-
Make a Huffman tree from the contents of the file at
in_path
. -
Open a file using
open_bit_writer(…)
. -
Create an instance of the
Metadata
struct type. -
Write the
Metadata
struct object to the file.- This was covered in lecture.
- You won't know the values for the
Metadata
object until after you've written the rest, so fill a placeholder for now. It can be all zeros if you want.
- Write the coding table (tree topology) using your
write_coding_table(…)
. - Write a 0 bit (as specified in the article ) to signify the end of the tree. Do not flush before writing the 0 bit. It should come immediately after the bits of the rest of the tree—quite possibly part of the last byte.
- Write the compressed data using your
write_compressed(…)
. - Update the
Metadata
object with the correct values. You may findftell(…)
useful here. - Go back to the beginning of the file using
fseek(…)
and overwrite the placeholderMetadata
object with the real one usingfwrite(…)
.
-
Make a Huffman tree from the contents of the file at
What do
argc
andargv
mean?(This was covered in Prof. Quinn's lecture.)
-
argc
is the number of command line arguments, including the name of the program. -
argv
is an array of addresses to each command line argument.argv[0]
is always the name of the executable, as it was called. -
Example: If you type
./huffman_compress letter.txt letter.txt.hc
in bash, you will haveargv[0]=="./huffman_compress"
,argv[1]=="letter.txt"
, andargv[2]=="letter.txt.hc"
.
-
What should the
Here is the process:main(…)
in huffman_compress_main.c do?-
Check if arguments are correct using
check_command_line_arguments(…)
. -
If so, write the compressed file using
compress_file(…)
. -
If there were any errors, print them to
stderr
usingfprintf(…)
and returnEXIT_FAILURE
. -
Otherwise, print nothing and return
EXIT_SUCCESS
.
-
Check if arguments are correct using
Updates
There are no updates to HW18 so far.