Huffman 1: priority queue
This homework is the first in a 3-part series about data compression using Huffman coding. For this first step, you will use linked lists to implement a priority queue data structure and a stack data structure. Those are just linked lists with some rules about how elements are added and removed.
Learning goals
You should learn or practice how to:
- Create and manipulate linked lists.
- Avoid memory faults associated with linked lists.
- Use function addresses to make your data structures work with any type of values.
Getting Started
Get the starter code
Start by getting the files. Type 264get hw14
and then
cd hw14
from bash. This will
give you huffman.h (header file) and huffman.c skeleton implementation file.
We have also added in a demostration file, demo_huffman.c, which simply illustrates how the pieces fit together. Do not copy that code. However, you may test in your account, if you wish. (demo_huffman.c was added to the starter files the morning of Mon 3/30/2020.)
Copy in your Makefile, miniunit.h, and clog.h from a previous homework.
cp ../▒▒▒/{Makefile,miniunit.h,clog.h} .
Update your Makefile
The variables at the top of your file should contain the following:ASG_NICKNAME | HW14 |
SRC_C | huffman.c |
SRC_H | huffman.h miniunit.h clog.h |
TEST_C | test_huffman.c |
EXECUTABLE | huffman |
Adjust your submit
rule to ensure than whenever you run
make submit
, the following command is executed:
264submit hw14 huffman.c huffman.h test_huffman.c miniunit.h clog.h Makefile
264submit $(ASG_NICKNAME) $(SRC_C) $(SRC_H) $(TEST_C) Makefile
))
Double-check your $(EXECUTABLE)
rule. Running
make
or make huffman
rule should result
in the following command (but using your variables wherever possible):
gcc -o huffman -g -std=c11 -Wall -Wshadow -Wvla -Werror -pedantic huffman.c test_huffman.c
Overview
In this exercise, you will use a linked list to implement a priority queue or a stack.
A priority queue is implemented as a linked list where the objects stored in the list are actually ordered. For example, if we want to store a linked list of numbers 5, 10, 4, 3, 1, and the linked list is used to implement a priority queue, the linked list should appear as follows:
1→3→4→5→10→NULL
Here, we assume that "→" is the "pointer." We also assume that a smaller number has a higher priority, and it is therefore listed first. If we now want to get number 7 to join the priority queue, we should have the list updated to:
1→3→4→5→7→10→NULL
We call this the enqueue operation. To remove an item from a priority queue, we always remove the item with the highest priority. This is call the dequeue operation. In this case, a dequeue operation will remove 1 from the list:
3→4→5→7→10→NULL
When a stack is implemented as a linked list, the object that is pushed onto the stack always appears as the first item on the list. Therefore, if we push the list of numbers 5, 10, 4, 3, 1 onto a stack successively, we have the following linked list:
1→3→4→10→5→NULL
If we pop from the stack, we have
3→4→10→5→NULLIf we now push 7 onto the stack, we have
7→3→4→10→5→NULLIn both priority queue and stack, we always remove an item from the beginning of the list because that is the most convenient place to remove an item from a list.
You will also write a function to destroy a linked list and free all memory associated with it.
Generic data types
In previous assignments with linked lists, you always specified the type of data
that would be stored. In this homework, the Node
will contain a
void*
(memory address to any type). This allows you to use the same
code for linked lists of any type.
This is simlar to the way that
qsort(…)
allows you to sort arrays of any type.
Function addresses
Code that deals with generic data types often needs to pass functions as parameters. However, the declarations are UGLY.
We will start with a couple examples.
Example #1: Declare local variable of type “address to function taking parameter types and returning return type”
Let's start with an example. In the snippet below, we have two functions,
_print_square(…)
and _print_cube(…)
.
Instead of calling them directly, we will assign the address of each function to
a local variable and then call them by way of that local variable, just to demonstrate
the syntax.
#include <stdio.h>
#include <stdlib.h>
void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}
void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}
int main(int argc, char* argv[]) {
// Declare variable «fn» of type "address to function taking an int and returning void"
void (*fn)(int);
// Initialize fn to the address of _print_square(…). Normally we delcare and initialize
// on the same line, but we have separated the two for the clarity of this explanation.
fn = _print_square;
// Call _print_square(…) by way of variable «fn».
fn(7);
// Now set «fn» to the address of _print_cube.
fn = _print_cube;
// Call _print_cube(…) by way of variable «fn».
fn(7);
return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
The type of variable fn
is void(*)(int)
.
In words, we will say, “«fn» is the address to a function taking an int and returning void.”
Example #2: Declare parameter of type “address to function taking parameter types and returning return type”
Instead of creating a local variable, we can also use that same type
(void(*fn)(int)
) as a parameter to
a function. In the example below, we call a function
_call_print_fn(…)
which then calls
another function, specified by the second argument.
#include <stdio.h>
#include <stdlib.h>
void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}
void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}
void _call_print_fn(int n, void (*print_fn)(int)) {
print_fn(n);
}
int main(int argc, char* argv[]) {
// Call _print_square(…) by way of the helper function.
_call_print_fn(7, _print_square);
// Call _print_cube(…) by way of the helper function.
_call_print_fn(7, _print_cube);
return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
Like the previous example, the type of parameter print_fn
is void(*)(int)
.
In words, we will say, “«print_fn» is the address to a function taking an int and returning void.”
Basic form for function address variables
The basic form for declaring a variable called Nameof type “address to function taking Parameter Types and returning Return Type” is:
Return Type(*Name)(Parameter Types)
Yes, it's UGLY. The name of that variable is in the middle!!!
⚠ Do not use parentheses when passing a function address
or assigning it to a variable. In the code above,
✘_call_print_fn(7, _print_cube(…))
will not work.
⚠ Passing a function address is totally different from invoking your
mu_run(…)
macro with a function (e.g., mu_run(_test_empty_list)
).
Although mu_run(…)
looks like a function, it is really just transorming
your code so that function can be called elsewhere and/or its name can be printed.
In contrast, passing a function address is passing an actual address. It is a
number that can be printed using printf(…)
, if you wish.
Use static
in definitions of helper functions
Starting with HW14, all helper functions must be declared as static in any assignment that involves compiling ≥2 files together.
void _know_your_worth() { ▒▒▒▒▒▒ } | ⇒ | static void _know_your_worth() { ▒▒▒▒▒▒ } |
That tells the compiler the function should only be available from the current file. Code in other files won't be able to call it.
Requirements
- Your submission must contain each of the following files, as specified:
file contents huffman.c functions pq enqueue(Node✶✶ a head, void✶ a value, int(✶cmp fn)(const void✶, const void✶)))
→ return type: Node✶Add a new node with valuea_value
to a priority queue, using functioncmp_fn(…)
to determine the ordering of the priority queue.- *a_head refers to the head of the linked list.
- If *a_head is NULL, the list is empty.
- a_value is the address of whatever value is associated with this node.
- cmp_fn(…) is a comparison function with the same semantics
as the comparison function for
qsort(…)
. - The newly allocated Node should be inserted into the linked
list such that all Nodes before the new Node in
the list contain objects that are smaller (or equal) to the new
Node
(according to
cmp_fn(…)
). All Node's after the new Node in the list contain objects that are bigger (or equal) to the new Node (according tocmp_fn(…)
). - ⚠*a_head should be updated if the new Node becomes the first item in the list. The function should return the address of the new Node.
- Hint: You will call
malloc(…)
exactly once. You will not callfree(…)
in this function.
pq dequeue(Node✶✶ a head)
→ return type: Node✶Detach and return the head.- *a_head refers to the head (first node) of a valid linked list.
- If the list is empty, return NULL (since there is nothing to dequeue).
- Upon return,
*a_head
must be a valid linked list (possibly empty). - Reminder:
NULL
is a valid linked list of size 0. Thus:*a_head
will be NULL if the list is empty.- Upon removing the last node, you should set
*a_head
to NULL.
- You must set the next field of the removed Node to NULL.
- The caller is responsible for freeing the detached node, and any memory
it refers to.
- ⚠ This function should not call
free(…)
, directly or indirectly.
- ⚠ This function should not call
- You are effectively splitting a linked list into two lists.
- Hint: You should not call
malloc(…)
orfree(…)
in this function.
stack push(Node✶✶ a top, void✶ a value)
→ return type: Node✶- *a_top stores the address of the first Node in the linked list.
- a_value stores the address of the generic type.
- The a_value field of the newly allocated Node should store a_value.
- The newly allocated Node should become the first Node of the list, and *a_top should be updated. The function returns the address of the new Node.
- Hint: You will call
malloc(…)
exactly once. You will not callfree(…)
in this function.
stack pop(Node✶✶ stack))
→ return type: Node✶Detach and return the node from the head of the linked list.- This is very similar to
pq_dequeue(…)
.
destroy list(Node✶✶ a head, void (✶destroy fn)(void✶))
→ return type: void- *a_head stores the address of the first Node in the list.
- destroy_fn(…) is a function that deallocates *a_value, as needed.
- Set the head to NULL in the caller's stack frame (i.e.,
*a_head = NULL;
).
test_huffman.c functions main(int argc, char✶ argv[])
→ return type: intTest all of the required functions in your huffman.c.- 100% code coverage is required.
- Use miniunit.h.
- Note: The names of parameters in the function definition do not need to match the declaration. Don't worry if you see a parameter listed as
a_head
in one place anda_top
in another. - All helper functions should be declared as
static
. - ⚠ Do not modify huffman.h.
- Submissions must meet the code quality standards and the course policies on homework and academic integrity.
Examples of cmp_fn(…)
We have provided examples of destroy_fn(…)
functions.
Here are some examples of cmp_fn(…)
(for the pq_enqueue(…)
function) and
print_fn(…)
.
If we store the addresses of int (i.e., int*) as the addresses of the generic
type (void*) in .a_value, we can use the following comparison function:
static int int_cmp(const void* p1, const void* p2) { return *(const int*)p1 - *(const int*)p2; }Here, the addresses p1 and p2 have been typecast to const int*, and dereferenced to obtain const int for comparison (with the difference being the result). If we store the addresses of char (i.e., char*) as the addresses of the generic type (void*) in .a_value, and each address of char is actually the address of the first character of string, we can use the following comparison function:
static int _string_cmp(const void* p1, const void* p2) { return strcmp((const char*)p1, (const char*)p2); }
Submit
To submit HW14 from within your hw14 directory, type
make submit
That should result in the following command:
264submit HW14 huffman.c test_huffman.c miniunit.h clog.h Makefile huffman.h
Pre-tester ●
The pre-tester for HW14 has been released and is ready to use.
Q&A
How much work is this?
The instructor's solution (shown below) occupies 42 sloc (source lines of code, excluding comments and blank lines). We sometimes use the size of the "shortest defensible solution" as a rough (and certainly imperfect) measure of difficulty.
👌 Okay to copy/adapt anything you find in the screenshot above (in this Q&A item). Granted there is almost nothing to copy, but you may find the comments helpful to steer you away from needlessly complicated approaches. There are many other completely correct ways to solve this.What is the hardest part?
Thepq_enqueue(…)
function is the most difficult.Can I make this easier?
Yes! Use TDD to test and build in lockstep. Write a test. Then, write just enough implementation code to pass that test (and any prior tests).Do not try to code it all at once. Start by getting it working for the simplest possible test. This is probably adding a single node (i.e., adding a node to an empty list). Test that it works before you move on. Next, get it working and tested for a list of size two, where the items are assumed to be added in descending order (so that no reordering is necessary). Get it working before you move on.Once you get into the longer test cases that require placing the new node in the right position in the list, you should draw pictures. Walk through how the connections need to change on paper before attempting to do it in code.How do the compare and destroy functions fit together with the rest of this code?
Here's a code example.⚠ DO NOT COPY CODE FROM THIS FILE INTO YOUR OWN CODE. We are providing this only to clarify how the pieces are supposed to fit together. It would not be a good basis for your test_huffman.c. However, you may run this code in your account. It has been added to the starter code.#include <stdio.h> #include <stdlib.h> #include <string.h> #include "huffman.h" static int _cmp_strings_by_length(const void* a_lhs, const void* a_rhs) { size_t len_lhs = strlen(a_lhs); // No typecast needed! Never use typedcast unless: size_t len_rhs = strlen(a_rhs); // (a) necessary, (b) safe, and (c) well-understood. return len_lhs - len_rhs; // shortest string comes forst } static void _destroy_string(void* string) { free(string); } static char* _copy_string(const char* src_string) { size_t num_chars = strlen(src_string); char* dst_string = malloc(sizeof(*dst_string) * (num_chars + 1)); return strcpy(dst_string, src_string); } int main(int argc, char* argv[]) { // Create a priority queue of four strings, ordered by length (ascending). Node* head = NULL; // size 0 pq_enqueue(&head, _copy_string("Reginald"), _cmp_strings_by_length); // size 1 pq_enqueue(&head, _copy_string("was"), _cmp_strings_by_length); // size 2 pq_enqueue(&head, _copy_string("a"), _cmp_strings_by_length); // size 3 pq_enqueue(&head, _copy_string("hatter"), _cmp_strings_by_length); // size 4 // Print contents of the list: a was hatter Reginald for(Node* curr = head; curr != NULL; curr = curr -> next) { printf("%s\n", (char*)(curr -> a_value)); } // Deallocate all memory used by the list, including each node's payload. destroy_list(&head, _destroy_string); return EXIT_SUCCESS; } /* ______ * Output * ‾‾‾‾‾‾ * a * was * hatter * Reginald */ /* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
Updates
Updates may be posted here.