Sorting 2: generic data structures
This homework is the second part in a 2 part series about dynamic data structures. In HW12, you created a priority queue linked list and a binary search tree, and used both to sort an array of integers. This homework will reuse those concepts, creating generic data structures that can store any type of data.
Learning goals
You should learn or practice how to:
- Create and manipulate data structures using void addresses.
- Use function addresses to make your data structures work with any type of values.
- Gain practice generalizing existing code for a later task.
Getting Started
Get the starter code
Start by getting the files. Type 264get hw14
and then
cd hw14
from bash. This will
give you warmup.c. This warmup is not far from what we did in lecture. By completing it,
you can catch some pitfalls from using function addresses early, and will write some code useful
to test addresses.c.
Copy in your Makefile, miniunit.h, and clog.h from a previous homework.
cp ../▒▒▒/{Makefile,miniunit.h,clog.h} .
Update your Makefile
The variables at the top of your file should contain the following:ASG_NICKNAME | HW14 |
SRC_C | addresses.c |
SRC_H | addresses.h miniunit.h clog.h |
TEST_C | test_addresses.c |
EXECUTABLE | test_addresses |
Adjust your submit
rule to ensure you are submitting warmup.c, ideally you will only have to change variables at the top. When you run make submit
, something similar to the the following command should be executed:
264submit hw14 addresses.c addresses.h test_addresses.c miniunit.h clog.h Makefile warmup.c
264submit $(ASG_NICKNAME) $(SUBMIT_FILES)
)
Double-check your $(EXECUTABLE)
rule. Running
make
or make test_addresses
rule should result
in the following command (but using your variables wherever possible):
gcc -o test_addresses -g -std=c11 -Wall -Wshadow -Wvla -Werror -pedantic addresses.c test_addresses.cIf you wish, you could add a
make warmup
rule.
Overview
In this exercise, you will update your linked list from HW12 to support any type of data and implement a simple stack. You will also take advanage of function addresses to allow creating sorted data structures such as a priority queue and a binary search tree.
It is recommended you read the full overview before you start coding this assignment. The earlier sections describe the data structures we will be using, while the later sections describe how this is distinct from the structures in HW14.
Stack
It is recommended you start by implementing print_list(…)
as having a working print function will make debugging a lot easier. This function will use
a function address similar to the one used in warmup.c.
After that, it is recommended to implementing the stack logic in stack_push(…)
and stack_pop(…)
next, as neither require using the compare function.
When a stack is implemented as a linked list, the object that is pushed onto the stack always appears as the first item on the list. Therefore, if we push the list of numbers 5, 10, 4, 3, 1 onto a stack successively, we have the following linked list:
1→3→4→10→5→NULL
If we pop from the stack, we have
3→4→10→5→NULLIf we now push 7 onto the stack, we have
7→3→4→10→5→NULLTo remove an element from the stack, we remove the item from the beginning of the list, as that is the most convenient place to remove it.
You will also write a function to destroy a linked list and free all memory associated with it.
Priority Queue
A priority queue is implemented as a linked list where the objects stored in the list are actually ordered, which is the same idea as the data structure from HW12. For example, if we want to store a linked list of numbers 5, 10, 4, 3, 1, and the linked list is used to implement a priority queue, the linked list should appear as follows:
1→3→4→5→10→NULL
Here, we assume that "→" is the "pointer." We also assume that a smaller number has a higher priority, and it is therefore listed first. If we now want to get number 7 to join the priority queue, we should have the list updated to:
1→3→4→5→7→10→NULL
We call this the enqueue operation. To remove an item from a priority queue, we always remove the item with the highest priority. This is call the dequeue operation. In this case, a dequeue operation will remove 1 from the list:
3→4→5→7→10→NULLNote that both the stack and the priority queue remove elements from the front.
Your implementation of a priority queue will be similar to the implementaton on HW12. The main differences is the addition of the dequeue function, and the removal of the requirement to fill an array from a priority queue. In addition, unlike the stack, priority queues require sorting the elements.
Binary Search Tree
Unlike a linked list, each node in a binary search tree has two children, named the left and the right child. A binary search tree requires that all left children are smaller than the root, while all right children are larger than the root.
Similarly to the priority queue, the binary search tree is a sorted data structure. However, unlike the linked lists we will not be implementing logic to remove nodes from a binary search tree, only to add nodes and print the nodes.
Generic data types
In previous assignments with linked lists and binary search trees, you always specified the type of data
that would be stored. In this homework, the Node
and BSTNode
will contain a
void*
(memory address to any type). This allows you to use the same
code for structures of any type.
This is simlar to the way that
qsort(…)
allows you to sort arrays of any type, as seen in the warmup.
Function addresses
Code that deals with generic data types often needs to pass functions as parameters. However, the declarations are UGLY.
We will start with a couple examples.
Example #1: Declare local variable of type “address to function taking parameter types and returning return type”
Let's start with an example. In the snippet below, we have two functions,
_print_square(…)
and _print_cube(…)
.
Instead of calling them directly, we will assign the address of each function to
a local variable and then call them by way of that local variable, just to demonstrate
the syntax.
#include <stdio.h>
#include <stdlib.h>
void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}
void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}
int main(int argc, char* argv[]) {
// Declare variable «fn» of type "address to function taking an int and returning void"
void (*fn)(int);
// Initialize fn to the address of _print_square(…). Normally we delcare and initialize
// on the same line, but we have separated the two for the clarity of this explanation.
fn = _print_square;
// Call _print_square(…) by way of variable «fn».
fn(7);
// Now set «fn» to the address of _print_cube.
fn = _print_cube;
// Call _print_cube(…) by way of variable «fn».
fn(7);
return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
The type of variable fn
is void(*)(int)
.
In words, we will say, “«fn» is the address to a function taking an int and returning void.”
Example #2: Declare parameter of type “address to function taking parameter types and returning return type”
Instead of creating a local variable, we can also use that same type
(void(*fn)(int)
) as a parameter to
a function. In the example below, we call a function
_call_print_fn(…)
which then calls
another function, specified by the second argument.
#include <stdio.h>
#include <stdlib.h>
void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}
void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}
void _call_print_fn(int n, void (*print_fn)(int)) {
print_fn(n);
}
int main(int argc, char* argv[]) {
// Call _print_square(…) by way of the helper function.
_call_print_fn(7, _print_square);
// Call _print_cube(…) by way of the helper function.
_call_print_fn(7, _print_cube);
return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
Like the previous example, the type of parameter print_fn
is void(*)(int)
.
In words, we will say, “«print_fn» is the address to a function taking an int and returning void.”
The warmup included in the assingment will give you practice working with function addresses to ensure you are ready to use them on the homework. It is worth of your grade on this homework.
Basic form for function address variables
The basic form for declaring a variable called Nameof type “address to function taking Parameter Types and returning Return Type” is:
Return Type(*Name)(Parameter Types)
Yes, it's UGLY. The name of that variable is in the middle!!!
⚠ Do not use parentheses when passing a function address
or assigning it to a variable. In the code above,
✘_call_print_fn(7, _print_cube(…))
will not work.
⚠ Passing a function address is totally different from invoking your
mu_run(…)
macro with a function (e.g., mu_run(_test_empty_list)
).
Although mu_run(…)
looks like a function, it is really just transorming
your code so that function can be called elsewhere and/or its name can be printed.
In contrast, passing a function address is passing an actual address. It is a
number that can be printed using printf(…)
, if you wish.
Use static
in definitions of helper functions
Starting with HW14, all helper functions must be declared as static in any assignment that involves compiling ≥2 files together.
void _know_your_worth() { ▒▒▒▒▒▒ } | ⇒ | static void _know_your_worth() { ▒▒▒▒▒▒ } |
That tells the compiler the function should only be available from the current file. Code in other files won't be able to call it.
Requirements
- Your submission must contain each of the following files, as specified:
file contents warmup.c function (10%) main(int argc, char✶ argv[])
→ return type: intDo not modifymain(…)
unless you are opting out.If you choose to opt out, it should contain the following. (You may copy this.)
printf("I already know this and do not need to practice."); return EXIT_SUCCESS;
print integer(const void✶ a value)
→ return type: voidcompare integers(const void✶ a first, const void✶ a second)
→ return type: intaddresses.h type definitions Nodestruct type with 2 fields:a_value
(void*),next
(Node*)BSTNodestruct type with 3 fields:a_value
(void*),left
, andright
(BSTNode*)function declarations one for each required function in your addresses.c - Do not include helpers (if any) here.
addresses.c linked list functions (50%) stack push(Node✶✶ a top, void✶ a value)
→ return type: void- *a_top stores the address of the first Node in the linked list.
- a_value stores the address of the generic type.
- The a_value field of the newly allocated Node should store a_value.
- The newly allocated Node should become the first Node of the list, and *a_top should be updated.
- Hint: You will call
malloc(…)
exactly once. You will not callfree(…)
in this function.
stack pop(Node✶✶ stack))
→ return type: Node✶Detach and return the node from the head of the linked list. It is the caller's responsiblity to free memory associated with the returned node.- This is very similar to
pq_dequeue(…)
. See that function for more info.
pq enqueue(Node✶✶ a head, void✶ a value, int(✶cmp fn)(const void✶, const void✶)))
→ return type: voidAdd a new node with valuea_value
to a priority queue, using functioncmp_fn(…)
to determine the ordering of the priority queue.- *a_head refers to the head of the linked list.
- If *a_head is NULL, the list is empty.
- a_value is the address of whatever value is associated with this node.
- cmp_fn(…) is a comparison function with the same semantics
as the comparison function for
qsort(…)
. - The newly allocated Node should be inserted into the linked
list such that all Nodes before the new Node in
the list contain objects that are smaller (or equal) to the new
Node
(according to
cmp_fn(…)
). All Node's after the new Node in the list contain objects that are bigger (or equal) to the new Node (according tocmp_fn(…)
). - ⚠*a_head should be updated if the new Node becomes the first item in the list.
- Hint: You will call
malloc(…)
exactly once. You will not callfree(…)
in this function.
pq dequeue(Node✶✶ a head)
→ return type: Node✶Detach and return the head.- *a_head refers to the head (first node) of a valid linked list.
- If the list is empty, return NULL (since there is nothing to dequeue).
- Upon return,
*a_head
must be a valid linked list (possibly empty). - Reminder:
NULL
is a valid linked list of size 0. Thus:*a_head
will be NULL if the list is empty.- Upon removing the last node, you should set
*a_head
to NULL.
- You must set the next field of the removed Node to NULL.
- The caller is responsible for freeing the detached node, and any memory
it refers to.
- ⚠ This function should not call
free(…)
, directly or indirectly.
- ⚠ This function should not call
- You are effectively splitting a linked list into two lists.
- Hint: You should not call
malloc(…)
orfree(…)
in this function.
print list(Node✶ head, void (✶print fn)(const void✶))
→ return type: void- head is the address of the first Node in the list.
- print_fn(…) is a function that prints your value *a_value in a useful format.
- Separate each element with a newline character.
destroy list(Node✶✶ a head, void (✶destroy fn)(void✶))
→ return type: void- *a_head stores the address of the first Node in the list.
- destroy_fn(…) is a function that deallocates *a_value, as needed.
- Set the head to NULL in the caller's stack frame (i.e.,
*a_head = NULL;
).
binary tree functions (30%) bst insert(BSTNode✶✶ a root, void✶ a value, int(✶cmp fn)(const void✶, const void✶)))
→ return type: voidAdd a new node with valuea_value
to a binary search tree, using functioncmp_fn(…)
to determine whether the value is stored on the left or right branch.- *a_root refers to the root of the binary search tree.
- If *a_root is NULL, the tree is empty.
- a_value is the address of whatever value is associated with this node.
- cmp_fn(…) is a comparison function with the same semantics
as the comparison function for
qsort(…)
. - The newly allocated iBSTNode should be inserted into the tree
such that for any BSTNode in the tree,
any BSTNode on the left contain a smaller (or equal) value
and any BSTNode on the right contain a larger value
(according to
cmp_fn(…)
). - Hint: You will call
malloc(…)
exactly once. You will not callfree(…)
in this function.
print tree(BSTNode✶ root, void (✶print fn)(const void✶))
→ return type: void- root stores the address of the uppermost BSTNode in the tree.
- print_fn(…) is a function that prints your value *a_value in a useful format.
- Elements should be printed in order from smallest to largest (based on the compare function). In other words, before printing the root, we print all values on the left. After printing the root, we print all the values on the right.
- Separate each element with a newline character.
destroy tree(BSTNode✶✶ a root, void (✶destroy fn)(void✶))
→ return type: void- *a_root stores the address of the uppermost BSTNode in the tree.
- destroy_fn(…) is a function that deallocates *a_value, as needed.
- Set the root to NULL in the caller's stack frame (i.e.,
*a_root = NULL;
).
test_addresses.c functions (10%) main(int argc, char✶ argv[])
→ return type: intTest all of the required functions in your addresses.c.- 100% code coverage is required.
- Use miniunit.h.
- Note: The names of parameters in the function definition do not need to match the declaration. Don't worry if you see a parameter listed as
a_head
in one place anda_top
in another. - All helper functions should be declared as
static
unless directly tested in test_addresses.c. - Only the following externally defined functions and constants are allowed in your .c files. (You may put the corresponding #include <…> statements in the .c file or in your addresses.h, at your option.)
header functions/symbols allowed in… stdbool.h bool
,true
,false
addresses.c
,addresses.h
,test_addresses.c
assert.h assert
addresses.c
,test_addresses.c
stdio.h printf
,fprintf
test_addresses.c
stdout
,FILE
,fputc
addresses.c
,test_addresses.c
stdlib.h malloc
,free
,NULL
,size_t
,EXIT_SUCCESS
,EXIT_FAILURE
addresses.c
,test_addresses.c
,addresses.h
string.h strcmp
,strlen
,strcpy
,strncpy
test_addresses.c
- Submissions must meet the code quality standards and the course policies on homework and academic integrity.
Examples of cmp_fn(…)
We have provided examples of destroy_fn(…)
functions.
Here are some examples of cmp_fn(…)
(for the pq_enqueue(…)
function) and
print_fn(…)
.
If we store the addresses of int (i.e., int*) as the addresses of the generic
type (void*) in .a_value, we can use the following comparison function:
static int int_cmp(const void* p1, const void* p2) { const int* a_n1 = p1; const int* a_n2 = p2; return *a_n1 - *a_n2; }Here, the addresses p1 and p2 have been typecast to const int* using an assignment statement, and dereferenced to obtain const int for comparison (with the difference being the result). If we store the addresses of char (i.e., char*) as the addresses of the generic type (void*) in .a_value, and each address of char is actually the address of the first character of string, we can use the following comparison function:
static int _string_cmp(const void* p1, const void* p2) { const char* str1 = p1; const char* str2 = p2; return strcmp(str1, str2); }Note there is no requirement that strings are always sorted alphabetically, the behavior is up to the caller. If we wished, we could instead sort them by length:
static int _string_length_cmp(const void* p1, const void* p2) { const char* str1 = p1; const char* str2 = p2; return strlen(str1) - strlen(str2); }
Submit
To submit HW14 from within your hw14 directory, type
make submit
That should result in the following command:
264submit HW14 addresses.c test_addresses.c miniunit.h clog.h Makefile addresses.h warmup.c
Pre-tester ●
The pre-tester for HW14 has been released and is ready to use.
Q&A
How much work is this?
Assuming you successfully finished HW12, the main work is adapting your integer comparaisons to instead use the compare function, along with implementing the remove functions.
What is the hardest part?
Thepq_enqueue(…)
function or thebst_insert(…)
is the most difficult.Can I make this easier?
Yes! Use TDD to test and build in lockstep. Write a test. Then, write just enough implementation code to pass that test (and any prior tests).Do not try to code it all at once. Start by getting it working for the simplest possible test. This is probably adding a single node (i.e., adding a node to an empty list). Test that it works before you move on. Next, get it working and tested for a list of size two, where the items are assumed to be added in descending order (so that no reordering is necessary). Get it working before you move on.Once you get into the longer test cases that require placing the new node in the right position in the list, you should draw pictures. Walk through how the connections need to change on paper before attempting to do it in code.How do the compare, print, and destroy functions fit together with the rest of this code?
Here's a code example.⚠ DO NOT COPY CODE FROM THIS FILE INTO YOUR OWN CODE. We are providing this only to clarify how the pieces are supposed to fit together. It would not be a good basis for your test_addresses.c. However, you may run this code in your account. It has been added to the starter code.#include <stdio.h> #include <stdlib.h> #include <string.h> #include "huffman.h" static int _cmp_strings_by_length(const void* a_lhs, const void* a_rhs) { size_t len_lhs = strlen(a_lhs); // No typecast needed! Never use typedcast unless: size_t len_rhs = strlen(a_rhs); // (a) necessary, (b) safe, and (c) well-understood. return len_lhs - len_rhs; // shortest string comes forst } static void _destroy_string(void* string) { free(string); } static void _print_string(const void* a_value) { // not every funciton allows typecasts in the parameters // we can always declare a local variable to handle the cast const char* str = a_value; printf("%s", str); } static char* _copy_string(const char* src_string) { size_t num_chars = strlen(src_string); char* dst_string = malloc(sizeof(*dst_string) * (num_chars + 1)); return strcpy(dst_string, src_string); } int main(int argc, char* argv[]) { // Create a priority queue of four strings, ordered by length (ascending). Node* head = NULL; // size 0 pq_enqueue(&head, _copy_string("Reginald"), _cmp_strings_by_length); // size 1 pq_enqueue(&head, _copy_string("was"), _cmp_strings_by_length); // size 2 pq_enqueue(&head, _copy_string("a"), _cmp_strings_by_length); // size 3 pq_enqueue(&head, _copy_string("hatter"), _cmp_strings_by_length); // size 4 // Print contents of the list: a was hatter Reginald print_list(head, _print_string); // Alternate way to print contents of the list, not as generalized for(Node* curr = head; curr != NULL; curr = curr -> next) { printf("%s\n", (char*)(curr -> a_value)); } // Deallocate all memory used by the list, including each node's payload. destroy_list(&head, _destroy_string); return EXIT_SUCCESS; } /* ______ * Output * ‾‾‾‾‾‾ * a * was * hatter * Reginald * a * was * hatter * Reginald */ /* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
Why are we not sorting arrays? Isn't this assignment the sequel to HW12?
While arrays are one way to represent sorted data, not all sorted data structures are arrays. Additionally, it is quite difficult to index into an array ofvoid
as we do not know the size of a single element.qsort(…)
solves this problem by taking an additional size parameter for the size of a single element. We could do something similar, but it adds an extra piece of complexity that is not nessesscary for the goal of this assignment.
Updates
7/26/2022 |
|
7/27/2022 |
|
Updates may be posted here.