Sorting 2: generic data structures
Goals
- Practice writing code using
void*addresses. - Practice writing code using function addresses
(function pointers).
Overview
In this assignment, you will refactor one of your sorting algorithms from HW14 to support any data type. Alternatively, you can implement a new sorting algorithm, such as an insertion sort or a selection sort.
There are no starter files.
About void* addresses
In C, void* represents an address of any type of data, for instance it could be the address of an int
or a complex struct like BSTNode from HW14. Since we do not know the type of
data in the address, we are disallowed performing operations such as dereferencing (the unary * operator),
indexing (the [] operator), and performing any sort of address arithmetic (for instance, using +).
C allows you to freely assign from any type of address into a void*, and allows you to freely assign from a
void* into any other type (with the exception of cases that would discard a const qualifier).
As a result, it is entirely up to the programmer to ensure these assignments make sense; as otherwise the program
may have inconsistent behavior which possibly leads to memory violations or segmentation faults.
These addresses are typically used for one of three reasons:
- The type of the data has not yet been decided
(for example, in
malloc(…)). - The type of the data does not matter (for example, in
printf(…)with the%pargument). - Any type of data is supported (for example, in
qsort(…)).
We have worked with the first two cases quite a bit on previous assignments, subtly taking advantage of the ability to
assign from void* to another address type and back. The third case was just briefly seen on
HW14, which we will explore more in depth on this assignment.
When writing functions that are designed to support any type of data, the inability to dereference or perform address
arithmetic are most notable. In some cases, this requires passing in the size of the data referenced by the address
to process, as seen in fread(…) and fwrite(…). In other cases,
we can instead take advantage of function addresses to tell the function how to perform certain required actions.
Function addresses
Code that deals with generic data types often needs to pass functions as parameters. However, the declarations are UGLY.
We will start with a couple examples.
Example #1: Declare local variable of type “address to function taking parameter types and returning return type”
Let's start with an example. In the snippet below, we have two functions,
_print_square(…) and _print_cube(…).
Instead of calling them directly, we will assign the address of each function to
a local variable and then call them by way of that local variable, just to demonstrate
the syntax.
#include <stdio.h>
#include <stdlib.h>
void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}
void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}
int main(int argc, char* argv[]) {
// Declare variable «fn» of type "address to function taking an int and returning void"
void (*fn)(int);
// Initialize fn to the address of _print_square(…). Normally we delcare and initialize
// on the same line, but we have separated the two for the clarity of this explanation.
fn = _print_square;
// Call _print_square(…) by way of variable «fn».
fn(7);
// Now set «fn» to the address of _print_cube.
fn = _print_cube;
// Call _print_cube(…) by way of variable «fn».
fn(7);
return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
The type of variable fn is void(*)(int).
In words, we will say, “«fn» is the address to a function taking an int and returning void.”
Example #2: Declare parameter of type “address to function taking parameter types and returning return type”
Instead of creating a local variable, we can also use that same type
(void(*fn)(int)) as a parameter to
a function. In the example below, we call a function
_call_print_fn(…) which then calls
another function, specified by the second argument.
#include <stdio.h>
#include <stdlib.h>
void _print_square(int n) {
printf("%d² is %d\n", n, n*n);
}
void _print_cube(int n) {
printf("%d³ is %d\n", n, n*n*n);
}
void _call_print_fn(int n, void (*print_fn)(int)) {
print_fn(n);
}
int main(int argc, char* argv[]) {
// Call _print_square(…) by way of the helper function.
_call_print_fn(7, _print_square);
// Call _print_cube(…) by way of the helper function.
_call_print_fn(7, _print_cube);
return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
Like the previous example, the type of parameter print_fn
is void(*)(int).
In words, we will say, “«print_fn» is the address to a function taking an int and returning void.”
The warmup included in the assingment will give you practice working with function addresses to ensure you are ready to use them on the homework. It is worth of your grade on this homework.
Basic form for function address variables
The basic form for declaring a variable called Nameof type “address to function taking Parameter Types and returning Return Type” is:
Return Type(*Name)(Parameter Types)
Yes, it's UGLY. The name of that variable is in the middle!!!
⚠ Do not use parentheses when passing a function address
or assigning it to a variable. In the code above,
✘_call_print_fn(7, _print_cube(…))
will not work.
⚠ Passing a function address is totally different from invoking your
mu_run(…) macro with a function (e.g., mu_run(_test_empty_list)).
Although mu_run(…) looks like a function, it is really just transforming
your code so that function can be called elsewhere and/or its name can be printed.
In contrast, passing a function address is passing an actual address. It is a
number that can be printed using printf(…), if you wish.
Generic linked lists
As part of this homework, your task is to create a linked list which supports any type of data,
then implement a function sort_list(…) which can sort that linked list
based on the passed compare function address. Your code should be tested on at least two
different types of data, so you are confident it will work with any arbitrary type of data
we will test it on. Upon declaring the head of the linked list, you will decide on the type of data in the list,
then remain consistent with that type until the list is destroyed.
We will make use of three different function addresses formats to help work with the data, print, destroy,
and compare.
All of these functions will take in either void* or const void* parameters. The type of the data
can be assumed inside of implementations of the function, as long as we are consistent in areas they are used. That is,
if we are working with a linked list of strings, all our function address arguments should assume the void*
refers to a char*; we should not mix char* and int* function addresses on a list of
strings.
Print function
The print function void (*print_fn)(const void*) is passed into print_list(…)
to tell it how to print the data in the address in the eacn node. Most implementations of print functions will simply call
printf(…) with different format codes. Note that for a particular
type of data, there may be multiple reasonable print functions; as long as the type is consistent
with the type of data in the linked list this is valid (that is, don't try to print a linked list
of integers using a print function for strings).
Example: Print string
Note this is assuming the address type in the node is char*, it is sometimes useful to
consider nodes containing char** which would need a subtly different print function.
void _print_string(const void* a_value) {
const char* str = a_value;
fputc(str, stdout);
}
Example: Print float value
void _print_double_two_decimals(const void* a_value) {
const double* a_num = a_value;
printf("%.2f", *a_num);
}
Destroy function
The destroy function void (*destroy_fn)(void*) is passed into empty_list(…)
to tell it how to free up any memory associated with the address in each node.
Example: Destroy heap data
The C standard library function free(…) matches this signature and can be used
as a destroy function for heap allocated data.
Example: destroy non-heap data
Since a destroy function will be required to make use of empty_list(…), we will often
make use of an empty destroy function for the sake of stack allocated values or values on the data segment.
void _destroy_nothing(void* a_value) {
// no operation
}
Example: destroy a linked list
As an extreme example, we may decide to create a nested linked list, where the inner lists contain heap allocated strings. You probably will not need to test anything this complicated, but its provided as an example of destroy functions.
void _destroy_list_of_heap_allocated_data(void* a_value) {
List* a_list = a_value;
empty_list(a_list, free);
}
Compare functions
The compare function int (*compare_fn)(const void*, const void*) is passed into
sort_list(…) to describe which of the two elements is bigger. It should return a positive
number if the first parameter is bigger, or a negative number if the second parameter is bigger. If the two parameters
are the same size, it should return 0.
A compare function can be used by comparing its return value to 0 with <, >= and alike:
void _compare_example(const void* a_first, const void* a_second,
int (*compare_fn)(const void*, const void*)) {
// equivelent to first < second
if (compare_fn(a_first, a_second) < 0) {
// do stuff if first is less than second
}
}
Example: compare strings
strcmp(…) is setup perfectly to use as a compare function; however for our usecase
the parameter types are wrong so we will need to construct a function to call it:
int _compare_strings(const void* a_left, const void* a_right) {
const char* left_str = a_left;
const char* right_str = a_right;
return strcmp(left_str, right_str);
}
Example: compare strings reversed
By negating the return value, we can sort the list in decending order instead of ascending.
int _compare_strings_reversed(const void* a_left, const void* a_right) {
const char* left_str = a_left;
const char* right_str = a_right;
return -strcmp(left_str, right_str);
}
Example: compare strings by length
We are not required to sort strings alphabetically, any consistent method for comparison is valid as long as we follow the specification above. For instance we can sort by string length:
int _compare_strings_by_length(const void* a_left, const void* a_right) {
const char* left_str = a_left;
const char* right_str = a_right;
return strlen(left_str) - strlen(right_str);
}
Testing
You will likely want to test each function with at least 2 types of data in the list to ensure your code is properly generic.
Integer stack allocated list
int _test_integer_append_size_3() {
mu_start();
int numbers[] = { 1, 6, 2 };
int numbers_len = sizeof(numbers) / sizeof(*numbers);
List list = { .head = NULL, .tail = NULL, .size = 0 };
for (int i = 0; i < numbers_len; i++) {
append(&list, numbers + i);
}
// test list structure
mu_check(list.size == 3);
mu_check(list.head != NULL);
mu_check(list.head->next != NULL);
mu_check(list.head->next->next != NULL);
mu_check(list.head->next->next->next == NULL);
mu_check(list.head->next->next == list.tail);
// test list contents
mu_check(list.head->a_value == numbers + 0);
mu_check(list.head->next->a_value == numbers + 1);
mu_check(list.head->->next->next->a_value == numbers + 2);
// list is stack allocated, no data inside the node to free
// however, still need to free up the nodes
empty_list(&list, _destroy_nothing);
mu_end();
}
String heap allocated list
int _test_heap_string_sort_list_size_4() {
mu_start();
const char* strings[] = { "Grape", "Apple", "Cranberry", "Blueberry" };
int strings_len = sizeof(strings) / sizeof(*strings);
List list = { .head = NULL, .tail = NULL, .size = 0 };
for (int i = 0; i < strings_len; i++) {
// we have written a lot of examples of this helper before,
// using it to ensure heap allocation
append(&list, _copy_string(strings[i]));
}
sort_list(&list, _compare_strings);
// print list for debug
print_list(&list, _print_string);
// test list structure
mu_check(list.size == 4);
mu_check(list.head != NULL);
mu_check(list.head->next != NULL);
mu_check(list.head->next->next != NULL);
mu_check(list.head->next->next->next != NULL);
mu_check(list.head->next->next->next->next == NULL);
mu_check(list.head->next->next->next != list.tail);
// test list contents
mu_check_strings_equal(list.head->a_value, "Apple");
mu_check_strings_equal(list.head->next->a_value, "Blueberry");
mu_check_strings_equal(list.head->next->next->a_value, "Cranberry");
mu_check_strings_equal(list.head->next->next->next->a_value, "Grape");
// list is heap allocated, so both the contents and the nodes need to be freed
empty_list(&list, free);
mu_end();
}
Warm-up exercises
This assignment includes a warm-up exercise to help you get ready. This accounts for of your score for HW17. Scoring will be relatively light, but the usual base requirements apply.
-
Integer print function
Create a print function for printing integers in decimal from avoid*. This should only require a single line of code. -
Integer compare function
Create a compare function for integers. This should be similar to the string length comparison example above.
The structure of the warmup.c file is described in the Requirements table below. You should write your own warmup.c. You may add helper functions, if you wish.
Opt out.
In a hurry, and don't need the practice? This warm-up is here to help you learn what you need to succeed on the rest of this assignment—not to add additional work. Therefore, we give you an option. Those who feel that they do not need the practice may "opt out". by modifying warmup.c so that it does nothing but print the following message exactly and then exit:
I already know this and do not need to practice.
If you do that, then your score for HW17 will be based solely on the rest of this assignment. If you leave the warmup.c undone, if you do not turn in a warmup.c, or if the message it prints does not match perfectly, then you will receive 0 for the warmup portion of this assignment ().
TDD
As always, you must use TDD, but this time, you choose your own steps.
The requirements are as follows:
You must have at least 4 submissions, where each step…
- Adds a significant amount of code, relative to the previous step.
- Passes your own tests.
- Your tests have 100% line coverage, as reported by
gcov. (Yourmake coveragerule should do this for you.)
Q: What does “significant amount of code” mean?
A: Your effort should be spread out in a way that reflects that you are doing TDD.
Be reasonable. In other words, do not circumvent the purpose. In case of any regrade requests, we will evaluate whether your process demonstrated a good faith effort to follow TDD.
Requirements
- Your submission must contain each of the following files, as specified:
- Whenever
size==0,headandtailmust be NULL. - Do not include helpers (if any) here. Also do not include any structs used as part of a helper.
- Create a new node to contain
a_valueand place it on the list aftertail. - Update
sizeandtailto be consistent with the new list structure. - When
a_list->size==0,a_list->headanda_list->tailwill beNULL. - When
a_list->size!=0,a_list->headanda_list->tailwill NOT beNULL. malloc(…)may be called exactly once in this function.- ⚠ Do not attempt to sort values here, you lack the proper context.
- ⚠ Do not try to copy
a_value, if the caller wishes a copy they will copy it themselves. - Use
compare_fnto sort data in the list. Do not sort by the addressesa_valuedirectly. - Heap allocation (i.e., calls to
malloc(…)) is allowed, provided you free up any objects that are not returned as part ofa_list. - This can be implemented without any additional
malloc(…)calls, which will likely be the easiest approach. - If you decide to sort by allocating a second linked list, you are free to replace the list
in
a_listwith entirely new nodes if you wish, provided the old list is freed. Make sure you don't accidently free upa_value(use an empty destroy function withempty_list(…)). - If you wish to use another type of data structure (e.g. a binary search tree), define any relevant structs in sorts.c.
- ⚠ Do not call
compare_fnon the nodes, only ona_value. - Call
print_fnona_valueto print the data. - Separate elements with a single newline chracter. Do not rely on the print function including a newline.
- ⚠ Do not call
print_fnon the nodes, only ona_value. - Set
headandtailto NULL, andsizeto 0. free(…)may be called from a total of one place in this function and any it depends on, excluding calls insidedestroy_fn.- ⚠ Do not call
free(…)ona_value, only callfree(…)on the nodes. - ⚠ Do not call
destroy_fnon the nodes, only calldestroy_fnona_value - This must cause every line of code in your sorts.c to be executed (that is, 100% code coverage).
- Every public function in sorts.c must be called directly from
main(…)and/or from a helper within test_sorts.c. - These functions should be in test_sorts.c, not sorts.c.
- ⚠ Your code in test_sorts.c should not assume a particular address function or type of data will be used in any method.
- Use the
typedefsyntax to declare all struct types. - Only the following externally defined functions and constants are allowed in your .c files. (You may put the corresponding #include <…> statements in the .c file or in your sorts.h, at your option.)
Feel free to suggest additional header files or functions that you would like to use.header functions/symbols allowed in… stdbool.h bool,true,falsewarmup.c,sorts.c,sorts.h,test_sorts.cstdio.h printf,fprintf,stdout,FILEwarmup.c,sorts.c,test_sorts.cstdlib.h malloc,free,NULL,EXIT_SUCCESS,EXIT_FAILUREwarmup.c,sorts.c,test_sorts.c,sorts.hassert.h assertwarmup.c,sorts.c,test_sorts.cstring.h strcmp,strlen,strcpytest_sorts.c - Submissions must meet the code quality standards and the policies on homework and academic integrity.
| file | contents | |
|---|---|---|
| warmup.c | function definitions |
main(int argc, char✶ argv[])
→ return type: int
Do not modify
main(…) unless you
are opting out.If you choose to opt out, it should contain the following. (You may copy this.) printf("I already know this and do not need to practice.");
return EXIT_SUCCESS;
|
print integer(const void✶ a value)
→ return type: void
|
||
compare integers(const void✶ a first, const void✶ a second)
→ return type: int
|
||
| sorts.h | type definitions |
List
struct type with 3 fields: head (ListNode*), tail (ListNode*) and size (int)
|
|
ListNode
struct type with 2 fields: a_value(void*), next (ListNode*)
|
||
| function declarations |
one for each required function in your sorts.c
|
|
| sorts.c | function definitions |
append(List✶ a list, void✶ a value)
→ return type: void
Appends a new node to a_list containing a_value.
|
sort list(List✶ a list, int (✶compare fn)(const void✶, const void✶))
→ return type: void
Sorts a_list using any algorithm of your choice. We recommend merge sort as you can reuse your code from HW14.
|
||
print list(const List✶ a list, void (✶print fn)(const void✶))
→ return type: void
Print the data contained in the linked list.
|
||
empty list(List✶ a list, void (✶destroy fn)(void✶))
→ return type: void
Free all the nodes in the list, and call destroy_fn on all a_value in the nodes.
|
||
| test_sorts.c | function definitions |
main(int argc, char✶ argv[])
→ return type: int
Test your functions in sorts.c.
|
Functions to pass in as function addresses for calling methods in sorts.c.
|
Submit
To submit HW17 from within your hw17 directory,
type
264submit HW17 sorts.h sorts.c test_sorts.c miniunit.h log_macros.h Makefile
Pre-tester ●
The pre-tester for HW17 has been released and is ready to use.
Q&A
-
What sorting algorithm should I use?
We recommend merge sort simply because you already wrote that code on HW14 and it will be fairly easy to adapt. If you wish, you could use another method such as insertion sort or selection sort. -
Can I us a binary search tree sort?
You may, just make sure you do not modify sorts.h to include any references to binary search tree functions or structs. Instead, either declare them at the top of sorts.c or define a new header file (e.g. bst_sort.h). Defining a new header will make it easier to independently test your binary search tree, just make sure you don't forget to submit it when you run264submit. -
Can I use
qsort(…)?
No.sort_list(…)does not get passed the proper context to callqsort(…)overa_value, and attempting toqsort(…)the nodes directly will be very difficult. -
Is
compare_fnthe same as inqsort(…)?
The compare function behaves nearly the same, but there is one subtle difference with how we are constructing our lists. Inqsort(…), the compare function is always passed the address of an array element, meaning we have to dereference the address once to get the element being sorted. Our generic linked lists support storing address types directly in the list rather than nested inside an array, meaning the functions get passed the type directly.
Concretely, this means inqsort(…), an array of strings will be passedchar**for the compare function, while a linked list of strings will be passedchar*. We would get the same behavior if you intentionally storedchar**ina_value. -
Why aren't we sorting arrays directly? Why are we just sorting linked lists?
Working with arrays generically leads to some very ugly syntax and some slightly hacky code. We did not feel requiring that added anything to the assignment so just stopped at sorting linked lists. -
Can I add helper functions to sorts.c?
Yes. Make sure the names begin with "_".
Updates
| 7/29/2024 |
|
| 7/31/2024 |
|