Advanced C Programming

Summer 2022 ECE 264 :: Purdue University

⚠ This is a PAST SEMESTER (Summer 2022).
Due 7/28

Sorting 2: generic data structures

This homework is the second part in a 2 part series about dynamic data structures. In HW12, you created a priority queue linked list and a binary search tree, and used both to sort an array of integers. This homework will reuse those concepts, creating generic data structures that can store any type of data.

Learning goals

You should learn or practice how to:

  1. Create and manipulate data structures using void addresses.
  2. Use function addresses to make your data structures work with any type of values.
  3. Gain practice generalizing existing code for a later task.

Getting Started

Get the starter code

Start by getting the files. Type 264get hw14 and then cd hw14 from bash. This will give you warmup.c. This warmup is not far from what we did in lecture. By completing it, you can catch some pitfalls from using function addresses early, and will write some code useful to test addresses.c.

Copy in your Makefile, miniunit.h, and clog.h from a previous homework.

cp ../▒▒▒/{Makefile,miniunit.h,clog.h} .
⚠ Don't forget the period at the end of that command.

Update your Makefile

The variables at the top of your file should contain the following:
ASG_NICKNAME HW14
SRC_C addresses.c
SRC_H addresses.h miniunit.h clog.h
TEST_C test_addresses.c
EXECUTABLE test_addresses

Adjust your submit rule to ensure you are submitting warmup.c, ideally you will only have to change variables at the top. When you run make submit, something similar to the the following command should be executed:

264submit hw14 addresses.c addresses.h test_addresses.c miniunit.h clog.h Makefile warmup.c
… but your rule should use the variables wherever possible (i.e., 264submit $(ASG_NICKNAME) $(SUBMIT_FILES))

Double-check your $(EXECUTABLE) rule. Running make or make test_addresses rule should result in the following command (but using your variables wherever possible):

gcc -o test_addresses -g -std=c11 -Wall -Wshadow -Wvla -Werror -pedantic addresses.c test_addresses.c
If you wish, you could add a make warmup rule.

Overview

In this exercise, you will update your linked list from HW12 to support any type of data and implement a simple stack. You will also take advanage of function addresses to allow creating sorted data structures such as a priority queue and a binary search tree.

It is recommended you read the full overview before you start coding this assignment. The earlier sections describe the data structures we will be using, while the later sections describe how this is distinct from the structures in HW14.

Stack

It is recommended you start by implementing print_list(…) as having a working print function will make debugging a lot easier. This function will use a function address similar to the one used in warmup.c.

After that, it is recommended to implementing the stack logic in stack_push(…) and stack_pop(…) next, as neither require using the compare function.

When a stack is implemented as a linked list, the object that is pushed onto the stack always appears as the first item on the list. Therefore, if we push the list of numbers 5, 10, 4, 3, 1 onto a stack successively, we have the following linked list:

1→3→4→10→5→NULL

If we pop from the stack, we have

3→4→10→5→NULL
If we now push 7 onto the stack, we have
7→3→4→10→5→NULL
To remove an element from the stack, we remove the item from the beginning of the list, as that is the most convenient place to remove it.

You will also write a function to destroy a linked list and free all memory associated with it.

Priority Queue

A priority queue is implemented as a linked list where the objects stored in the list are actually ordered, which is the same idea as the data structure from HW12. For example, if we want to store a linked list of numbers 5, 10, 4, 3, 1, and the linked list is used to implement a priority queue, the linked list should appear as follows:

1→3→4→5→10→NULL

Here, we assume that "→" is the "pointer." We also assume that a smaller number has a higher priority, and it is therefore listed first. If we now want to get number 7 to join the priority queue, we should have the list updated to:

1→3→4→5→7→10→NULL

We call this the enqueue operation. To remove an item from a priority queue, we always remove the item with the highest priority. This is call the dequeue operation. In this case, a dequeue operation will remove 1 from the list:

3→4→5→7→10→NULL
Note that both the stack and the priority queue remove elements from the front.

Your implementation of a priority queue will be similar to the implementaton on HW12. The main differences is the addition of the dequeue function, and the removal of the requirement to fill an array from a priority queue. In addition, unlike the stack, priority queues require sorting the elements.

Binary Search Tree

Unlike a linked list, each node in a binary search tree has two children, named the left and the right child. A binary search tree requires that all left children are smaller than the root, while all right children are larger than the root.

Similarly to the priority queue, the binary search tree is a sorted data structure. However, unlike the linked lists we will not be implementing logic to remove nodes from a binary search tree, only to add nodes and print the nodes.

Generic data types

In previous assignments with linked lists and binary search trees, you always specified the type of data that would be stored. In this homework, the Node and BSTNode will contain a void* (memory address to any type). This allows you to use the same code for structures of any type.

This is simlar to the way that qsort(…) allows you to sort arrays of any type, as seen in the warmup.

Function addresses

Code that deals with generic data types often needs to pass functions as parameters. However, the declarations are UGLY.

We will start with a couple examples.

Example #1: Declare local variable of type “address to function taking parameter types and returning return type

Let's start with an example. In the snippet below, we have two functions, _print_square(…) and _print_cube(…). Instead of calling them directly, we will assign the address of each function to a local variable and then call them by way of that local variable, just to demonstrate the syntax.

#include <stdio.h>
#include <stdlib.h>

void _print_square(int n) {
    printf("%d² is %d\n", n, n*n);
}

void _print_cube(int n) {
    printf("%d³ is %d\n", n, n*n*n);
}

int main(int argc, char* argv[]) {

    // Declare variable «fn» of type "address to function taking an int and returning void"
    void (*fn)(int);

    // Initialize fn to the address of _print_square(…).  Normally we delcare and initialize
    // on the same line, but we have separated the two for the clarity of this explanation.
    fn = _print_square;

    // Call _print_square(…) by way of variable «fn».
    fn(7);

    // Now set «fn» to the address of _print_cube.
    fn = _print_cube;

    // Call _print_cube(…) by way of variable «fn».
    fn(7);

    return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */

The type of variable fn is void(*)(int). In words, we will say, “«fn» is the address to a function taking an int and returning void.”

Example #2: Declare parameter of type “address to function taking parameter types and returning return type

Instead of creating a local variable, we can also use that same type (void(*fn)(int)) as a parameter to a function. In the example below, we call a function _call_print_fn(…) which then calls another function, specified by the second argument.

#include <stdio.h>
#include <stdlib.h>

void _print_square(int n) {
    printf("%d² is %d\n", n, n*n);
}

void _print_cube(int n) {
    printf("%d³ is %d\n", n, n*n*n);
}

void _call_print_fn(int n, void (*print_fn)(int)) {
    print_fn(n);
}

int main(int argc, char* argv[]) {

    // Call _print_square(…) by way of the helper function.
    _call_print_fn(7, _print_square);

    // Call _print_cube(…) by way of the helper function.
    _call_print_fn(7, _print_cube);

    return EXIT_SUCCESS;
}
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */

Like the previous example, the type of parameter print_fn is void(*)(int). In words, we will say, “«print_fn» is the address to a function taking an int and returning void.”

The warmup included in the assingment will give you practice working with function addresses to ensure you are ready to use them on the homework. It is worth of your grade on this homework.

Basic form for function address variables

The basic form for declaring a variable called Nameof type “address to function taking Parameter Types and returning Return Type” is:

Return Type(*Name)(Parameter Types)

Yes, it's UGLY. The name of that variable is in the middle!!!

Do not use parentheses when passing a function address or assigning it to a variable. In the code above, _call_print_fn(7, _print_cube(…)) will not work.

Passing a function address is totally different from invoking your mu_run(…) macro with a function (e.g., mu_run(_test_empty_list)). Although mu_run(…) looks like a function, it is really just transorming your code so that function can be called elsewhere and/or its name can be printed. In contrast, passing a function address is passing an actual address. It is a number that can be printed using printf(…), if you wish.

Use static in definitions of helper functions

Starting with HW14, all helper functions must be declared as static in any assignment that involves compiling ≥2 files together.

void _know_your_worth() { ▒▒▒▒▒▒ } static void _know_your_worth() { ▒▒▒▒▒▒ }

That tells the compiler the function should only be available from the current file. Code in other files won't be able to call it.

Requirements

  1. Your submission must contain each of the following files, as specified:
    file contents
    warmup.c function (10%)
    main(int argc, char✶ argv[])
    return type: int
    Do not modify main(…) unless you are opting out.

    If you choose to opt out, it should contain the following. (You may copy this.)

    printf("I already know this and do not need to practice.");
    return EXIT_SUCCESS;
     print integer(const void✶ a value)
    return type: void
     compare integers(const void✶ a first, const void✶ a second)
    return type: int
    addresses.h type definitions
    Node
    struct type with 2 fields: a_value (void*), next (Node*)
    BSTNode
    struct type with 3 fields: a_value (void*), left, and right (BSTNode*)
    function declarations one for each required function in your addresses.c
    • Do not include helpers (if any) here.
    addresses.c linked list functions (50%)
    stack push(Node✶✶ a top, void✶ a value)
    return type: void
    1. *a_top stores the address of the first Node in the linked list.
    2. a_value stores the address of the generic type.
    3. The a_value field of the newly allocated Node should store a_value.
    4. The newly allocated Node should become the first Node of the list, and *a_top should be updated.
    5. Hint: You will call malloc(…) exactly once. You will not call free(…) in this function.
    stack pop(Node✶✶ stack))
    return type: Node✶
    Detach and return the node from the head of the linked list. It is the caller's responsiblity to free memory associated with the returned node.
    1. This is very similar to pq_dequeue(…). See that function for more info.
    pq enqueue(Node✶✶ a head, void✶ a value, int(✶cmp fn)(const void✶, const void✶)))
    return type: void
    Add a new node with value a_value to a priority queue, using function cmp_fn(…) to determine the ordering of the priority queue.
    1. *a_head refers to the head of the linked list.
    2. If *a_head is NULL, the list is empty.
    3. a_value is the address of whatever value is associated with this node.
    4. cmp_fn(…) is a comparison function with the same semantics as the comparison function for qsort(…).
    5. The newly allocated Node should be inserted into the linked list such that all Nodes before the new Node in the list contain objects that are smaller (or equal) to the new Node (according to cmp_fn(…)). All Node's after the new Node in the list contain objects that are bigger (or equal) to the new Node (according to cmp_fn(…)).
    6. *a_head should be updated if the new Node becomes the first item in the list.
    7. Hint: You will call malloc(…) exactly once. You will not call free(…) in this function.
    pq dequeue(Node✶✶ a head)
    return type: Node✶
    Detach and return the head.
    1. *a_head refers to the head (first node) of a valid linked list.
    2. If the list is empty, return NULL (since there is nothing to dequeue).
    3. Upon return, *a_head must be a valid linked list (possibly empty).
    4. Reminder: NULL is a valid linked list of size 0. Thus:
      1. *a_head will be NULL if the list is empty.
      2. Upon removing the last node, you should set *a_head to NULL.
    5. You must set the next field of the removed Node to NULL.
    6. The caller is responsible for freeing the detached node, and any memory it refers to.
      1. This function should not call free(…), directly or indirectly.
    7. You are effectively splitting a linked list into two lists.
    8. Hint: You should not call malloc(…) or free(…) in this function.
    print list(Node✶ head, void (✶print fn)(const void✶))
    return type: void
    1. head is the address of the first Node in the list.
    2. print_fn(…) is a function that prints your value *a_value in a useful format.
    3. Separate each element with a newline character.
    destroy list(Node✶✶ a head, void (✶destroy fn)(void✶))
    return type: void
    1. *a_head stores the address of the first Node in the list.
    2. destroy_fn(…) is a function that deallocates *a_value, as needed.
    3. Set the head to NULL in the caller's stack frame (i.e., *a_head = NULL;).
    binary tree functions (30%)
    bst insert(BSTNode✶✶ a root, void✶ a value, int(✶cmp fn)(const void✶, const void✶)))
    return type: void
    Add a new node with value a_value to a binary search tree, using function cmp_fn(…) to determine whether the value is stored on the left or right branch.
    1. *a_root refers to the root of the binary search tree.
    2. If *a_root is NULL, the tree is empty.
    3. a_value is the address of whatever value is associated with this node.
    4. cmp_fn(…) is a comparison function with the same semantics as the comparison function for qsort(…).
    5. The newly allocated iBSTNode should be inserted into the tree such that for any BSTNode in the tree, any BSTNode on the left contain a smaller (or equal) value and any BSTNode on the right contain a larger value (according to cmp_fn(…)).
    6. Hint: You will call malloc(…) exactly once. You will not call free(…) in this function.
    print tree(BSTNode✶ root, void (✶print fn)(const void✶))
    return type: void
    1. root stores the address of the uppermost BSTNode in the tree.
    2. print_fn(…) is a function that prints your value *a_value in a useful format.
    3. Elements should be printed in order from smallest to largest (based on the compare function). In other words, before printing the root, we print all values on the left. After printing the root, we print all the values on the right.
    4. Separate each element with a newline character.
    destroy tree(BSTNode✶✶ a root, void (✶destroy fn)(void✶))
    return type: void
    1. *a_root stores the address of the uppermost BSTNode in the tree.
    2. destroy_fn(…) is a function that deallocates *a_value, as needed.
    3. Set the root to NULL in the caller's stack frame (i.e., *a_root = NULL;).
    test_addresses.c functions (10%)
    main(int argc, char✶ argv[])
    return type: int
    Test all of the required functions in your addresses.c.
    • 100% code coverage is required.
    • Use miniunit.h.
  2. Note: The names of parameters in the function definition do not need to match the declaration. Don't worry if you see a parameter listed as a_head in one place and a_top in another.
  3. All helper functions should be declared as static unless directly tested in test_addresses.c.
  4. Only the following externally defined functions and constants are allowed in your .c files. (You may put the corresponding #include <…> statements in the .c file or in your addresses.h, at your option.)
    header functions/symbols allowed in…
    stdbool.h bool, true, false addresses.c, addresses.h, test_addresses.c
    assert.h assert addresses.c, test_addresses.c
    stdio.h printf, fprintf test_addresses.c
    stdout, FILE, fputc addresses.c, test_addresses.c
    stdlib.h malloc, free, NULL, size_t, EXIT_SUCCESS, EXIT_FAILURE addresses.c, test_addresses.c, addresses.h
    string.h strcmp, strlen, strcpy, strncpy test_addresses.c
    Feel free to suggest additional header files or functions that you would like to use.
  5. Submissions must meet the code quality standards and the course policies on homework and academic integrity.

Examples of cmp_fn(…)

We have provided examples of destroy_fn(…) functions. Here are some examples of cmp_fn(…) (for the pq_enqueue(…) function) and print_fn(…). If we store the addresses of int (i.e., int*) as the addresses of the generic type (void*) in .a_value, we can use the following comparison function:

static int int_cmp(const void* p1, const void* p2) {
   const int* a_n1 = p1;
   const int* a_n2 = p2;
   return *a_n1 - *a_n2;
}
Here, the addresses p1 and p2 have been typecast to const int* using an assignment statement, and dereferenced to obtain const int for comparison (with the difference being the result). If we store the addresses of char (i.e., char*) as the addresses of the generic type (void*) in .a_value, and each address of char is actually the address of the first character of string, we can use the following comparison function:
static int _string_cmp(const void* p1, const void* p2) {
   const char* str1 = p1;
   const char* str2 = p2;
   return strcmp(str1, str2);
}
Note there is no requirement that strings are always sorted alphabetically, the behavior is up to the caller. If we wished, we could instead sort them by length:
static int _string_length_cmp(const void* p1, const void* p2) {
   const char* str1 = p1;
   const char* str2 = p2;
   return strlen(str1) - strlen(str2);
}

Submit

To submit HW14 from within your hw14 directory, type

make submit

That should result in the following command: 264submit HW14 addresses.c test_addresses.c miniunit.h clog.h Makefile addresses.h warmup.c

Pre-tester

The pre-tester for HW14 has been released and is ready to use.

Q&A

  1. How much work is this?

    Assuming you successfully finished HW12, the main work is adapting your integer comparaisons to instead use the compare function, along with implementing the remove functions.

  2. What is the hardest part?

    The pq_enqueue(…) function or the bst_insert(…) is the most difficult.
  3. Can I make this easier?

    Yes! Use TDD to test and build in lockstep. Write a test. Then, write just enough implementation code to pass that test (and any prior tests).
    Do not try to code it all at once. Start by getting it working for the simplest possible test. This is probably adding a single node (i.e., adding a node to an empty list). Test that it works before you move on. Next, get it working and tested for a list of size two, where the items are assumed to be added in descending order (so that no reordering is necessary). Get it working before you move on.
    Once you get into the longer test cases that require placing the new node in the right position in the list, you should draw pictures. Walk through how the connections need to change on paper before attempting to do it in code.
  4. How do the compare, print, and destroy functions fit together with the rest of this code?

    Here's a code example.
    DO NOT COPY CODE FROM THIS FILE INTO YOUR OWN CODE. We are providing this only to clarify how the pieces are supposed to fit together. It would not be a good basis for your test_addresses.c. However, you may run this code in your account. It has been added to the starter code.
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "huffman.h"
    
    static int _cmp_strings_by_length(const void* a_lhs, const void* a_rhs) {
        size_t len_lhs = strlen(a_lhs); // No typecast needed!  Never use typedcast unless:
        size_t len_rhs = strlen(a_rhs); // (a) necessary, (b) safe, and (c) well-understood.
        return len_lhs - len_rhs; // shortest string comes forst
    }
    
    static void _destroy_string(void* string) {
        free(string);
    }
    
    static void _print_string(const void* a_value) {
        // not every funciton allows typecasts in the parameters
        // we can always declare a local variable to handle the cast
        const char* str = a_value;
        printf("%s", str);
    }
    
    static char* _copy_string(const char* src_string) {
        size_t num_chars = strlen(src_string);
        char* dst_string = malloc(sizeof(*dst_string) * (num_chars + 1));
        return strcpy(dst_string, src_string);
    }
    
    int main(int argc, char* argv[]) {
        // Create a priority queue of four strings, ordered by length (ascending).
        Node* head = NULL;  // size 0
        pq_enqueue(&head, _copy_string("Reginald"), _cmp_strings_by_length);  // size 1
        pq_enqueue(&head, _copy_string("was"),      _cmp_strings_by_length);  // size 2
        pq_enqueue(&head, _copy_string("a"),        _cmp_strings_by_length);  // size 3
        pq_enqueue(&head, _copy_string("hatter"),   _cmp_strings_by_length);  // size 4
    
        // Print contents of the list:  a was hatter Reginald
        print_list(head, _print_string);
    
        // Alternate way to print contents of the list, not as generalized
        for(Node* curr = head; curr != NULL; curr = curr -> next) {
            printf("%s\n", (char*)(curr -> a_value));
        }
    
        // Deallocate all memory used by the list, including each node's payload.
        destroy_list(&head, _destroy_string);
    
        return EXIT_SUCCESS;
    }
    
    /* ______
     * Output
     * ‾‾‾‾‾‾
     * a
     * was
     * hatter
     * Reginald
     * a
     * was
     * hatter
     * Reginald
     */
    
    /* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
    
  5. Why are we not sorting arrays? Isn't this assignment the sequel to HW12?

    While arrays are one way to represent sorted data, not all sorted data structures are arrays. Additionally, it is quite difficult to index into an array of void as we do not know the size of a single element. qsort(…) solves this problem by taking an additional size parameter for the size of a single element. We could do something similar, but it adds an extra piece of complexity that is not nessesscary for the goal of this assignment.

Updates

7/26/2022
  • Extended stack push description and added requirements table.
7/27/2022
  • Added fputc as allowed in addresses.c.
  • Removed mentions of return types for push and enqueue that were removed this semester.

Updates may be posted here.