Spring 2020 ECE 264 :: Purdue University

⚠ This is a PAST SEMESTER (Spring 2020).
Due 4/2

Huffman 1: priority queue

This homework is the first in a 3-part series about data compression using Huffman coding. For this first step, you will use linked lists to implement a priority queue data structure and a stack data structure. Those are just linked lists with some rules about how elements are added and removed.

Learning goals

You should learn or practice how to:

1. Create and manipulate linked lists.
2. Avoid memory faults associated with linked lists.
3. Use function addresses to make your data structures work with any type of values.

Getting Started

Get the starter code

Start by getting the files. Type 264get hw11 and then cd hw11 from bash. This will give you huffman.h (header file) and huffman.c skeleton implementation file.

We have also added in a demostration file, demo_huffman.c, which simply illustrates how the pieces fit together. Do not copy that code. However, you may test in your account, if you wish. (demo_huffman.c was added to the starter files the morning of Mon 3/30/2020.)

Copy in your Makefile, miniunit.h, and clog.h from a previous homework.

cp ../▒▒▒/{Makefile,miniunit.h,clog.h} .
⚠ Don't forget the period at the end of that command.

The variables at the top of your file should contain the following:
 ASG_NICKNAME HW11 SRC_C huffman.c SRC_H huffman.h miniunit.h clog.h TEST_C test_huffman.c EXECUTABLE huffman

Adjust your submit rule to ensure than whenever you run make submit, the following command is executed:

264submit hw11 huffman.c huffman.h test_huffman.c miniunit.h clog.h Makefile
… but your rule should use the variables wherever possible (i.e., 264submit $(ASG_NICKNAME)$(SRC_C) $(SRC_H)$(TEST_C) Makefile))

Double-check your \$(EXECUTABLE) rule. Running make or make huffman rule should result in the following command (but using your variables wherever possible):

gcc -o huffman -g -std=c11 -Wall -Wshadow -Wvla -Werror -pedantic huffman.c test_huffman.c

Overview

In this exercise, you will use a linked list to implement a priority queue or a stack.

A priority queue is implemented as a linked list where the objects stored in the list are actually ordered. For example, if we want to store a linked list of numbers 5, 10, 4, 3, 1, and the linked list is used to implement a priority queue, the linked list should appear as follows:

1→3→4→5→10→NULL


Here, we assume that "→" is the "pointer." We also assume that a smaller number has a higher priority, and it is therefore listed first. If we now want to get number 7 to join the priority queue, we should have the list updated to:

1→3→4→5→7→10→NULL


We call this the enqueue operation. To remove an item from a priority queue, we always remove the item with the highest priority. This is call the dequeue operation. In this case, a dequeue operation will remove 1 from the list:

3→4→5→7→10→NULL


When a stack is implemented as a linked list, the object that is pushed onto the stack always appears as the first item on the list. Therefore, if we push the list of numbers 5, 10, 4, 3, 1 onto a stack successively, we have the following linked list:

1→3→4→10→5→NULL


If we pop from the stack, we have

3→4→10→5→NULL

If we now push 7 onto the stack, we have
7→3→4→10→5→NULL

In both priority queue and stack, we always remove an item from the beginning of the list because that is the most convenient place to remove an item from a list.

You will also write a function to destroy a linked list and free all memory associated with it.

Generic data types

In previous assignments with linked lists, you always specified the type of data that would be stored. In this homework, the Node will contain a void* (memory address to any type). This allows you to use the same code for linked lists of any type.

This is simlar to the way that qsort(…) allows you to sort arrays of any type.

Code that deals with generic data types often needs to pass functions as parameters. However, the declarations are UGLY.

Example #1: Declare local variable of type “address to function taking parameter types and returning return type”

Let's start with an example. In the snippet below, we have two functions, _print_square(…) and _print_cube(…). Instead of calling them directly, we will assign the address of each function to a local variable and then call them by way of that local variable, just to demonstrate the syntax.

#include <stdio.h>
#include <stdlib.h>

void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}

void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}

int main(int argc, char* argv[]) {

// Declare variable «fn» of type "address to function taking an int and returning void"
void (*fn)(int);

// Initialize fn to the address of _print_square(…).  Normally we delcare and initialize
// on the same line, but we have separated the two for the clarity of this explanation.
fn = _print_square;

// Call _print_square(…) by way of variable «fn».
fn(7);

// Now set «fn» to the address of _print_cube.
fn = _print_cube;

// Call _print_cube(…) by way of variable «fn».
fn(7);

return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */


The type of variable fn is void(*)(int). In words, we will say, “«fn» is the address to a function taking an int and returning void.”

Example #2: Declare parameter of type “address to function taking parameter types and returning return type”

Instead of creating a local variable, we can also use that same type (void(*fn)(int)) as a parameter to a function. In the example below, we call a function _call_print_fn(…) which then calls another function, specified by the second argument.

#include <stdio.h>
#include <stdlib.h>

void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}

void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}

void _call_print_fn(int n, void (*print_fn)(int)) {
print_fn(n);
}

int main(int argc, char* argv[]) {

// Call _print_square(…) by way of the helper function.
_call_print_fn(7, _print_square);

// Call _print_cube(…) by way of the helper function.
_call_print_fn(7, _print_cube);

return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */


Like the previous example, the type of parameter print_fn is void(*)(int). In words, we will say, “«print_fn» is the address to a function taking an int and returning void.”

Basic form for function address variables

The basic form for declaring a variable called Nameof type “address to function taking Parameter Types and returning Return Type” is:

Return Type(*Name)(Parameter Types)

Yes, it's UGLY. The name of that variable is in the middle!!!

Do not use parentheses when passing a function address or assigning it to a variable. In the code above, _call_print_fn(7, _print_cube(…)) will not work.

Passing a function address is totally different from invoking your mu_run(…) macro with a function (e.g., mu_run(_test_empty_list)). Although mu_run(…) looks like a function, it is really just transorming your code so that function can be called elsewhere and/or its name can be printed. In contrast, passing a function address is passing an actual address. It is a number that can be printed using printf(…), if you wish.

Use static in definitions of helper functions

Starting with HW11, all helper functions must be declared as static in any assignment that involves compiling ≥2 files together.

 void _know_your_worth() { ▒▒▒▒▒▒ } ⇒ static void _know_your_worth() { ▒▒▒▒▒▒ }

That tells the compiler the function should only be available from the current file. Code in other files won't be able to call it.

Requirements

1. Your submission must contain each of the following files, as specified:
file contents
huffman.c functions
pq enqueue(Node✶✶ a head, void✶ a value, int(✶cmp fn)(const void✶, const void✶)))
return type: Node✶
Add a new node with value a_value to a priority queue, using function cmp_fn(…) to determine the ordering of the priority queue.
2. If *a_head is NULL, the list is empty.
3. a_value is the address of whatever value is associated with this node.
4. cmp_fn(…) is a comparison function with the same semantics as the comparison function for qsort(…).
5. The newly allocated Node should be inserted into the linked list such that all Nodes before the new Node in the list contain objects that are smaller (or equal) to the new Node (according to cmp_fn(…)). All Node's after the new Node in the list contain objects that are bigger (or equal) to the new Node (according to cmp_fn(…)).
6. *a_head should be updated if the new Node becomes the first item in the list. The function should return the address of the new Node.
7. Hint: You will call malloc(…) exactly once. You will not call free(…) in this function.
pq dequeue(Node✶✶ a head)
return type: Node✶
2. If the list is empty, return NULL (since there is nothing to dequeue).
3. Upon return, *a_head must be a valid linked list (possibly empty).
4. Reminder: NULL is a valid linked list of size 0. Thus:
1. *a_head will be NULL if the list is empty.
2. Upon removing the last node, you should set *a_head to NULL.
5. You must set the next field of the removed Node to NULL.
6. The caller is responsible for freeing the detached node, and any memory it refers to.
1. This function should not call free(…), directly or indirectly.
7. You are effectively splitting a linked list into two lists.
8. Hint: You should not call malloc(…) or free(…) in this function.
stack push(Node✶✶ a top, void✶ a value)
return type: Node✶
1. *a_top stores the address of the first Node in the linked list.
2. a_value stores the address of the generic type.
3. The a_value field of the newly allocated Node should store a_value.
4. The newly allocated Node should become the first Node of the list, and *a_top should be updated. The function returns the address of the new Node.
5. Hint: You will call malloc(…) exactly once. You will not call free(…) in this function.
stack pop(Node✶✶ stack))
return type: Node✶
Detach and return the node from the head of the linked list.
1. This is very similar to pq_dequeue(…).
destroy list(Node✶✶ a head, void (✶destroy fn)(void✶))
return type: void
1. *a_head stores the address of the first Node in the list.
2. destroy_fn(…) is a function that deallocates *a_value, as needed.
3. Set the head to NULL in the caller's stack frame (i.e., *a_head = NULL;).
test_huffman.c functions
main(int argc, char✶ argv[])
return type: int
Test all of the required functions in your huffman.c.
• 100% code coverage is required.
• Use miniunit.h.
2. Note: The names of parameters in the function definition do not need to match the declaration. Don't worry if you see a parameter listed as a_head in one place and a_top in another.
3. All helper functions should be declared as static.
4. Do not modify huffman.h.
5. Submissions must meet the code quality standards and the course policies on homework and academic integrity.

Examples of cmp_fn(…)

We have provided examples of destroy_fn(…) functions. Here are some examples of cmp_fn(…) (for the pq_enqueue(…) function) and print_fn(…). If we store the addresses of int (i.e., int*) as the addresses of the generic type (void*) in .a_value, we can use the following comparison function:

static int int_cmp(const void* p1, const void* p2) {
return *(const int*)p1 - *(const int*)p2;
}
Here, the addresses p1 and p2 have been typecast to const int*, and dereferenced to obtain const int for comparison (with the difference being the result). If we store the addresses of char (i.e., char*) as the addresses of the generic type (void*) in .a_value, and each address of char is actually the address of the first character of string, we can use the following comparison function:
static int _string_cmp(const void* p1, const void* p2) {
return strcmp((const char*)p1, (const char*)p2);
}

Submit

To submit HW11 from within your hw11 directory, type

make submit

That should result in the following command: 264submit HW11 huffman.c test_huffman.c miniunit.h clog.h Makefile huffman.h

Pre-tester ●

The pre-tester for HW11 has been released and is ready to use.

Q&A

1. How much work is this?

The instructor's solution (shown below) occupies 42 sloc (source lines of code, excluding comments and blank lines). We sometimes use the size of the "shortest defensible solution" as a rough (and certainly imperfect) measure of difficulty.

👌 Okay to copy/adapt anything you find in the screenshot above (in this Q&A item). Granted there is almost nothing to copy, but you may find the comments helpful to steer you away from needlessly complicated approaches. There are many other completely correct ways to solve this.
2. What is the hardest part?

The pq_enqueue(…) function is the most difficult.
3. Can I make this easier?

Yes! Use TDD to test and build in lockstep. Write a test. Then, write just enough implementation code to pass that test (and any prior tests).
Do not try to code it all at once. Start by getting it working for the simplest possible test. This is probably adding a single node (i.e., adding a node to an empty list). Test that it works before you move on. Next, get it working and tested for a list of size two, where the items are assumed to be added in descending order (so that no reordering is necessary). Get it working before you move on.
Once you get into the longer test cases that require placing the new node in the right position in the list, you should draw pictures. Walk through how the connections need to change on paper before attempting to do it in code.
4. How do the compare and destroy functions fit together with the rest of this code?

Here's a code example.
DO NOT COPY CODE FROM THIS FILE INTO YOUR OWN CODE. We are providing this only to clarify how the pieces are supposed to fit together. It would not be a good basis for your test_huffman.c. However, you may run this code in your account. It has been added to the starter code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "huffman.h"

static int _cmp_strings_by_length(const void* a_lhs, const void* a_rhs) {
size_t len_lhs = strlen(a_lhs); // No typecast needed!  Never use typedcast unless:
size_t len_rhs = strlen(a_rhs); // (a) necessary, (b) safe, and (c) well-understood.
return len_lhs - len_rhs; // shortest string comes forst
}

static void _destroy_string(void* string) {
free(string);
}

static char* _copy_string(const char* src_string) {
size_t num_chars = strlen(src_string);
char* dst_string = malloc(sizeof(*dst_string) * (num_chars + 1));
return strcpy(dst_string, src_string);
}

int main(int argc, char* argv[]) {
// Create a priority queue of four strings, ordered by length (ascending).
Node* head = NULL;  // size 0
pq_enqueue(&head, _copy_string("Reginald"), _cmp_strings_by_length);  // size 1
pq_enqueue(&head, _copy_string("was"),      _cmp_strings_by_length);  // size 2
pq_enqueue(&head, _copy_string("a"),        _cmp_strings_by_length);  // size 3
pq_enqueue(&head, _copy_string("hatter"),   _cmp_strings_by_length);  // size 4

// Print contents of the list:  a was hatter Reginald
for(Node* curr = head; curr != NULL; curr = curr -> next) {
printf("%s\n", (char*)(curr -> a_value));
}

// Deallocate all memory used by the list, including each node's payload.

return EXIT_SUCCESS;
}

/* ______
* Output
* ‾‾‾‾‾‾
* a
* was
* hatter
* Reginald
*/

/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */


3/28/2020: Relative to the “preview” version of this homework: ① Two functions (remove_last_node(…) and print_node(…)) were removed. ② pq_enqueue(…) was simplified. Previously, it had some odd semantics regarding NULL. ③ expected.txt was removed. ④ Several parameters have been renamed to consistent with the conventions we are following this semester. ⑤ Your helper functions must now be static. ⚠ Be on the lookout for any inconsistencies that might be lingering.
3/29/2020: Added some code snippets and a redacted screenshot in response to some questions that came up on Piazza. We also added more explanation about using static with helper functions.
3/30/2020: Fixed typo in pq_enqueue(…) description (“*a_head should be updated…”).
4/2/2020: Fixed minor typo in mini-Q&A about static: “Only files in the same functions defined in the same file will be able to ‘see’ it.”