Advanced C Programming

Fall 2017 :: ECE 264 :: Purdue University

Due 11/17

Huffman #2 (E)

This exercise will familiarize you with linked lists, which you will need for a subsequent programming assignment.

Getting Started

Start by getting the files. Type 264get hw13 and then cd hw13 from bash.
You will get the following files:

  1. huffman.h: An empty header file, you have to define your own functions in this homework.
  2. huffman.c: An empty c file, you have to define your own functions in this homework.
  3. huffman_main.c: You will use self defined functions to implement the huffman coding in this file. See requirements for detail
  4. examples: A folder with examples of input and output file.

Overview

This exercise will familiarize you with linked lists and trees, which would be useful for HW14. Specifically, you will learn

  1. To create and manipulate linked lists and trees.
  2. To learn to free the memory associated with a linked list and a tree.

Please read the explanation of huffman coding. This assignment and HW14 deals with Huffman coding, and in particular, it deals with the construction of a Huffman coding tree. Therefore, please read the section on "Huffman Coding" before you proceed with the remainder of this homework description.
READ THE huffman coding BEFORE YOU PROCEED.

For this assignment, you are given the flexibility of designing your functions in huffman.h and huffman.c. You have to write a program in huffman_main.c that takes in an input file, and produces three output files:

./huffman input_file output_file_1 output_file_2 output_file_3

The input file (input_file) can be any files of size up to 2^(63)-1 (LONG_MAX) characters. Some small sample input files are given in the folder examples: gophers, lorum, stone, woods. The three output files (output_file_1, output_file_2, and output_file_3) are as follows:

file contents
output_file_1
  1. A file with 256 lines as output.
  2. Line i, 0 ≤ i < 256, should be the number of occurrences of ASCII character i in the input file.
  3. The number of occurences in each line should be printed with the format string "%ld\n".
  4. The corresponding output files for the input files are gophers.count, lorum.count, stone.count, and woods.count. These files are in the examples folder.
output_file_2
  1. A file with only 1 line if the newline character is not present in the input file. Otherwise, there should be 2 lines in this file.
  2. This is a file that shows only characters that have appeared in the input file. Moreover, the characters will appear in ascending order with respect to the numbers of occurrences. If two characters have the same number of occurrences, the character with the higher ASCII value should appear later.
  3. The format of the file is shown as a linked list:
              character_1:count_1→character_2:count_2→...→character_k:count_k→NULL
              
    where character_1 through character_k are ASCII characters that appear in the input file. count_1 ≤ count_2 ≤ ... ≤ count_k. If count_i is the same as count_i+1, character_i < character_i+1. There is a newline after "NULL". The character_i:count_i pair is printed using the format string "%c:%ld".
  4. The easiest way to get this output file is to enqueue the characters (together with their counts) into a priority queue, and then print the list (see HW12). You should design and define a structure that would allow you to enqueue a character and its count at the same time into a priority queue. Take into consideration what you have to do for the third output file before you decide on your user-defined structure.
  5. The corresponding output files for the input files in the examples folder are gophers.sorted, lorum.sorted, stone.sorted, and woods.sorted.
output_file_3
  1. The file contains as many lines as the number of distinct characters that appear in the input file.
  2. Each distinct character should have a line in this output file with the format of character:stream_of_0_and_1 where the stream of 0 and 1 (as character '0' and character '1') being the binary pattern corresponding to the Huffman code of the character.
  3. The following output files for the input files in the examples folder are gophers.huffman, lorum.huffman, stone.huffman, and woods.huffman.
  4. To produce this output file, it is essential that you build a Huffman coding tree based on the description in the section on "Huffman Coding" in the huffman coding.
  5. The construction of a Huffman coding tree requires you to maintaiin a priority queue of partially constructed Huffman coding trees. In these coding trees, all leaf nodes are ASCII characters with their respective counts in the input file. Therefore, we suggest that you define a structure that could be used for the generation of both the second output file and the third output file. In particular, a tree structure with fields in each node to store an ASCII character and its count would be useful.
  6. While a leaf node will store an ASCII character, you will have to decide what should be stored in the same field for non-leaf node. The count field of a node should store the total of the counts in the leaf nodes below the node. As mentioned in the "Huffman Coding" section, the priority is defined based on the counts. If two trees have the same total count (of their respective leaf nodes), a tree that corresponds to a single leaf node should have a higher priority.
  7. If both trees are leaf nodes themselves, the ASCII values should determine the priority. Otherwise, the order in which the trees are created should determine the priority. When two trees are dequeued from the priority queue, the first node dequeued should be the left branch of the newly constructed coding tree, and the second node dequeued should be the right branch of the newly constructed coding tree.
  8. You have a Huffman coding tree when there is only tree in the entire list. The order in which you print the characters and their codes in the third output file is determined by a tree traversal that visits the left branch followed by the right branch.
  9. In the tree traversal process, a left branch correspond to a '0' and a right branch corresponds to a '1'. (In reality,it is a 0-bit and a 1-bit, but for the purpose of this exercise, we use '0' character and '1' character). You print the character and its code when you reach a leaf node in the tree traversal.

Requirements

  1. Your submission must contain each of the following files, as specified:
    file contents
    huffman.h functions You are given flexibility to define your own functions.
    huffman.c functions You are given flexibility to define your own functions.
    huffman_main.c functions
    main(int argc, char✶ argv[])
    return type: int
    1. Use self defined functions to implement huffman coding in this file.
    2. The executable of this file takes one input_file and produces three output files. We expect 4 arguments, these arguments should be entered in the following order.
      1. input_file
      2. output_file_1
      3. output_file_2
      4. output_file_3
    3. The content of output_file_1, output_file_2, output_file_3 should follow the format described above.
    4. If executable is not supplied with exactly 4 arguments, the function returns EXIT_FAILURE.
  2. You will not receive any credit if we fail to obtain "huffman".
  3. You will not receive any credit if we fail to obtain output_file_1, output_file_2, and output_file_3.
  4. You will not receive any credit if the content of output_file_1, output_file_2, and output_file3 do not follow the format as specified in homework description.
  5. Do not print error messages or anything else on stdout (i.e., with printf(…)), except as directed by the instructions.
  6. Submissions must meet the code quality standards and the course policies on homework and academic integrity.

Submit

To submit HW13, type 264submit HW13 huffman.h huffman.c huffman_main.c from inside your hw13 directory.

Pre-tester

The pre-tester for HW13 has been released and is ready to use.

Q&A

Updates

11/14/2017 Preview posted