Due 11/17

Huffman #2 (E)

Getting Started

Overview

Requirements

This exercise will familiarize you with linked lists, which you will need for a subsequent programming assignment.

Getting Started

Start by getting the files. Type 264get hw13 and then cd hw13 from bash.
You will get the following files:

huffman.h: An empty header file, you have to define your own functions in this homework.
huffman.c: An empty c file, you have to define your own functions in this homework.
huffman_main.c: You will use self defined functions to implement the huffman coding in this file. See requirements for detail
examples: A folder with examples of input and output file.

Overview

This exercise will familiarize you with linked lists and trees, which would be useful for HW14. Specifically, you will learn

To create and manipulate linked lists and trees.
To learn to free the memory associated with a linked list and a tree.

Please read the explanation of huffman coding. This assignment and HW14 deals with Huffman coding, and in particular, it deals with the construction of a Huffman coding tree. Therefore, please read the section on "Huffman Coding" before you proceed with the remainder of this homework description.
READ THE huffman coding BEFORE YOU PROCEED.

For this assignment, you are given the flexibility of designing your functions in huffman.h and huffman.c. You have to write a program in huffman_main.c that takes in an input file, and produces three output files:

./huffman input_file output_file_1 output_file_2 output_file_3

The input file (input_file) can be any files of size up to 2^(63)-1 (LONG_MAX) characters. Some small sample input files are given in the folder examples: gophers, lorum, stone, woods. The three output files (output_file_1, output_file_2, and output_file_3) are as follows:

file	contents
`output_file_1`	A file with 256 lines as output. Line i, 0 ≤ i < 256, should be the number of occurrences of ASCII character i in the input file. The number of occurences in each line should be printed with the format string "%ld\n". The corresponding output files for the input files are gophers.count, lorum.count, stone.count, and woods.count. These files are in the examples folder.
`output_file_2`	A file with only 1 line if the newline character is not present in the input file. Otherwise, there should be 2 lines in this file. This is a file that shows only characters that have appeared in the input file. Moreover, the characters will appear in ascending order with respect to the numbers of occurrences. If two characters have the same number of occurrences, the character with the higher ASCII value should appear later. The format of the file is shown as a linked list: character_1:count_1→character_2:count_2→...→character_k:count_k→NULL where character_1 through character_k are ASCII characters that appear in the input file. count_1 ≤ count_2 ≤ ... ≤ count_k. If count_i is the same as count_i+1, character_i < character_i+1. There is a newline after "NULL". The character_i:count_i pair is printed using the format string "%c:%ld". The easiest way to get this output file is to enqueue the characters (together with their counts) into a priority queue, and then print the list (see HW12). You should design and define a structure that would allow you to enqueue a character and its count at the same time into a priority queue. Take into consideration what you have to do for the third output file before you decide on your user-defined structure. The corresponding output files for the input files in the examples folder are gophers.sorted, lorum.sorted, stone.sorted, and woods.sorted.
`output_file_3`	The file contains as many lines as the number of distinct characters that appear in the input file. Each distinct character should have a line in this output file with the format of character:stream_of_0_and_1 where the stream of 0 and 1 (as character '0' and character '1') being the binary pattern corresponding to the Huffman code of the character. The following output files for the input files in the examples folder are gophers.huffman, lorum.huffman, stone.huffman, and woods.huffman. To produce this output file, it is essential that you build a Huffman coding tree based on the description in the section on "Huffman Coding" in the huffman coding. The construction of a Huffman coding tree requires you to maintaiin a priority queue of partially constructed Huffman coding trees. In these coding trees, all leaf nodes are ASCII characters with their respective counts in the input file. Therefore, we suggest that you define a structure that could be used for the generation of both the second output file and the third output file. In particular, a tree structure with fields in each node to store an ASCII character and its count would be useful. While a leaf node will store an ASCII character, you will have to decide what should be stored in the same field for non-leaf node. The count field of a node should store the total of the counts in the leaf nodes below the node. As mentioned in the "Huffman Coding" section, the priority is defined based on the counts. If two trees have the same total count (of their respective leaf nodes), a tree that corresponds to a single leaf node should have a higher priority. If both trees are leaf nodes themselves, the ASCII values should determine the priority. Otherwise, the order in which the trees are created should determine the priority. When two trees are dequeued from the priority queue, the first node dequeued should be the left branch of the newly constructed coding tree, and the second node dequeued should be the right branch of the newly constructed coding tree. You have a Huffman coding tree when there is only tree in the entire list. The order in which you print the characters and their codes in the third output file is determined by a tree traversal that visits the left branch followed by the right branch. In the tree traversal process, a left branch correspond to a '0' and a right branch corresponds to a '1'. (In reality,it is a 0-bit and a 1-bit, but for the purpose of this exercise, we use '0' character and '1' character). You print the character and its code when you reach a leaf node in the tree traversal.

Requirements

Your submission must contain each of the following files, as specified:

file	contents
huffman.h	functions	You are given flexibility to define your own functions.
huffman.c	functions	You are given flexibility to define your own functions.
huffman_main.c	functions	`main(int argc, char✶ argv[])` → return type: int Use self defined functions to implement huffman coding in this file. The executable of this file takes one input_file and produces three output files. We expect 4 arguments, these arguments should be entered in the following order. `input_file` `output_file_1` `output_file_2` `output_file_3` The content of `output_file_1, output_file_2, output_file_3` should follow the format described above. If executable is not supplied with exactly 4 arguments, the function returns EXIT_FAILURE.

⚠ You will not receive any credit if we fail to obtain "huffman".
⚠ You will not receive any credit if we fail to obtain output_file_1, output_file_2, and output_file_3.
⚠ You will not receive any credit if the content of output_file_1, output_file_2, and output_file3 do not follow the format as specified in homework description.
⚠ Do not print error messages or anything else on stdout (i.e., with printf(…)), except as directed by the instructions.
Submissions must meet the code quality standards and the course policies on homework and academic integrity.

Submit

To submit HW13, type 264submit HW13 huffman.h huffman.c huffman_main.c from inside your hw13 directory.

In general, to submit any assignment for this course, you will use the following command:

264submit ASSIGNMENT FILES…

Submit often and early, even well before you are finished. Doing so creates a backup that you can retrieve in case of a problem (e.g., accidentally deleted your files).

To retrieve your most recent submission, type 264get --restore ASSIGNMENT (e.g., 264get --restore hw13).

To retrieve an earlier submission, first type 264get --list ASSIGNMENT to view your past submissions and find the timestamp of the one you want to retrieve. Then, type 264get --restore -t TIMESTAMP ASSIGNMENT.

Scores will be posted to the Scores page after the deadline for each assignment.

Pre-tester ●

The pre-tester for HW13 has been released and is ready to use.

Using the pretester

The pretester is a tool for checking your work after you believe you are done, and before we have scored it. It is not a substitute for your own checking, but it may help you avoid big surprises by letting you know if your checking was not adequate. To use the pre-tester, first submit your code. Then, type the following command. (Do this only after you have submitted, and only after you believe your submission is perfect.)

264test hw13

Do not ask TAs or instructors which tests you failed.

Keep in mind:

Pre-testing is intended only for those who believe they are done and believe their submission is perfect.
The pre-tester is not part of the requirements of this or any other assignment.
You are responsible for reading the assignment carefully, and ensuring that your code meets all requirements.
Feedback is limited, to ensure that everyone learns to test their own code.
If your code is failing some tests, review your tests and make sure they are comprehensive enough to catch any bugs (deviations from requirements). Follow the tips given by the pre-tester.
Code quality issues are not reported by the pre-tester; writing clean code is something you must learn to do from the start, not a clean-up step to do at the end.

Logistics:

If we discover that we have not checked some significant part of the assignment requirements, we may add additional tests at any time up to the point when scores are released.
The pre-tester will only be enabled after much of the class has submitted the assignment, and at least a few people have submitted perfect submissions. This is to allow us to test the pre-tester.
The pre-tester checks your most recent submission. You must submit first.
You may be limited to running the pre-tester ≤24 times in a 24-hour period. (This is not implemented yet but will be added.)

Q&A

Updates

11/14/2017	Preview posted

Advanced C Programming

Fall 2017 :: ECE 264 :: Purdue University

Huffman #2 (E)

Getting Started

Overview

Requirements

Submit

Pre-tester ●

Q&A

Updates