Advanced C Programming

Spring 2022 ECE 264 :: Purdue University

⚠ This is a PAST SEMESTER (Spring 2022).
Due 4/13

Huffman 2: files, frequencies

Learning goals

You should learn how to:

  1. Open, read, and close files.
  2. Pass/return statically declared arrays to/from functions.
  3. Use fixed-size integer types.
  4. Implement a step toward the Huffman coding algorithms

Overview

In this exercise, you will create another part of the Huffman coding (data compression) code that you are developing. For this part, you will create a function that reads a file and calculates the frequencies of every character in the file.

Read the explanation of huffman coding. You will not need to understand the whole process to complete this homework, but it will help you understand how this homework and the last one fit into the larger algorithm.

Getting Started

Get the starter file (frequencies.h). There isn't much in the file. You will be filling in the rest.

Add two type aliases to the header file (frequencies.h).

typedef unsigned char uchar;
typedef uint64_t Frequencies[256];

Although we could have put these in the header file for you, we wanted you to type them out to make sure you give them some thought.

In general, the format for typedef is:
typedef something that looks like a variable declaration The name of the variable in that something that looks like a variable declaration becomes a new type that you can use to declare new variables or parameters.

typedef unsigned char uchar makes a type called uchar that is an unsigned char. It just saves you from having to type unsigned char a lot.

typedef uint64_t Frequencies makes a type called Frequencies that is an array of 256 integers. Each integer is of type uint64_t which is a fixed-size integer type. Unlike int and unsigned int, which can vary in size between platforms, uint64_t is guaranteed to be 64 bits (8 bytes) on any platform where it is used. Using uint64_t requires #include <stdint.h> so be sure to put that at the top of your frequencies.h. When you use the Frequencies type, it is the same as an array of uint64_t[256]. It saves you a bit of typing. More importantly, it expresses exactly what you are using the data for.

You have been using typedef for the past few weeks in linked lists and BSTs. Hopefully, none of this will be too surprising.

Now, for the main part of this homework, you will be writing a single function: calc_frequencies(…). It will read all of the characters in a file and calculate the frequency of every character.

The parameter Frequencies freqs will be an array of 256 integer values. Upon success, freques[ch] will be set to the number of occurrences of character ch in the file path.

Keep in mind that a file is just a sequence of bytes. A given byte can have 256 possible values, from 0 to 255. If the file is plain ASCII text, then most of the values will be in the range of 32 (space) to 126 ('~'). A few will be 10 (\n) or 9 (\t). Other kinds of files may contain any of the values from 0 to 255. For this homework, we make no assumptions. We must use unsigned char because a char assumes a range of -128 to 127, which cannot accommodate values ≥128.

Your calc_frequencies(…) will open the file and read every character. In case the file does not exist, or some other error occurs when opening the file, it will return false and pass an error message back to the caller through a parameter. Otherwise, it will return true and pass the Frequencies back via a parameter.

When fopen(…) fails for any reason, it will return NULL. To get a human-readable error message, call strerror(errno). The errno global variable is an integer value that indicates the type of error most recently encountered. To access errno, you must #include <errno.h>. The strerror(…) function returns a string message for a given errno value. You can assume the string is on the data segment. You do not need to free it. To access strerror(…), you must #include <string.h>.

Tip: Remember that when creating a text file in most text editors, a newline ('\n') will be added to the last line of any non-empty file.

Tip: To see a more precise representation of the contents of any file, call xxd filename from bash.

Tip: To initialize an array to all 0, use something like this. (This is an illustration. Do not copy this verbatim. It won't help you.)

int array_of_zeros[500] = {0};

You can also initialize an array of integers with all zeros, except for a few select values.

int array_of_mostly_zeros[10] = { [2]=100, [6]=100, [4]=100 };

The above example is equivalent to int array_of_mostly_zeros[10] = { 0, 0, 100, 0, 100, 0, 100, 0, 0, 0 };

Tip: Consider making a helper function like _print_frequiencies(Frequencies freqs) in your test_frequencies.c that prints the frequencies in a nicely formatted manner. This may help you troubleshoot in case of problems.

Tip: C functions cannot return a static array. For example, a function cannot return uint64_t[256]. Likewise, no function can return Frequencies because that type is equivalent to uint64_t[256].

How much work is this?

In the instructor's solution, the body of the calc_frequencies(…) comprises only 11 sloc (source lines of code).

Requirements

  1. Your submission must contain each of the following files, as specified:
    file contents
    frequencies.h type definitions
    typedef unsigned char uchar;
    typedef uint64 t Frequencies[256];
    function declaration
    calc frequencies
    Function declaration. This is provided for you.
    frequencies.c functions
    calc frequencies(Frequencies freqs, const char✶ path, const char✶✶ a error)
    return type: bool
    Open a file at path and either store the character frequencies in freq or set *a_error to strerror(errno).
    1. This is the main part of this homework.
    2. Caller is responsible for initializing freqs[ch] to 0 for all ch in 0…255. Thus, calc_frequencies(…) may assume that all freqs[▒]==0.
    3. If the file is opened correctly, then set freqs[ch] to n, where n is the number of occurrences of character ch in the file at path. Then, return true. Do not modify a_error.
    4. If the file could not be opened (i.e., fopen(…) returned NULL, set *a_error to strerror(errno) and return false. Do not modify freqs.
    5. You only need to check for errors related to failure to open the file (fopen(…)).
    6. This should not print anything under any circumstances.
    test_frequencies.c functions
    main(int argc, char✶ argv[])
    return type: int
    1. 100% code coverage is required.
    2. Use your miniunit.h.
    3. You must test at least one error condition (e.g., attempting to open a file that does not exist).
    4. You do not need to check for other kinds of errors.
  2. Do not call malloc(…) or free(…) anywhere in this homework. You don't need them.
  3. Submissions must meet the code quality standards and the course policies on homework and academic integrity.

Submit

To submit HW16 from within your hw16 directory, type

make submit

That should result in the following command: 264submit HW16 frequencies.h frequencies.c test_frequencies.c Makefile miniunit.h clog.h

You may include other files, but they must end with ".txt" to avoid conflicting with our files.

Pre-tester

The pre-tester for HW16 has been released and is ready to use.

Q&A

Updates

Updates may be added here.