Advanced C Programming

Autumn 2016 :: ECE 264 :: Purdue University

This is for Fall 2016 (last year)
Due 11/1

Binary search trees

Goals

The goals of this assignment are as follows:
  1. Learn how write code using binary search trees (a dynamic structure)
  2. Practice writing code that uses recursion

Overview

In this assignment, you will create a program to index files by the words that they contain, and then print all of the files containing a certain word. We provide all of the code you need for opening and reading the files.

Your job is to create a binary search tree (BST) of strings, where each node contains a word, and a linked list of the filenames it appeared in (and of course the left and right node addresses).

The basic structure is below. For example, the word “write” appears in the files a.txt and c.txt.

data structure

To be clear, you will not be writing any code to access files yourself. That is simply the premise of this assignment. We do provide some test code (indexer.c) that does access files, but you don't have to modify it in any way. indexer.c is described below.

Getting started

To get the starter files, type this: 264get hw11

Warm-up

This assignment includes a warm-up exercise to help you get ready. This accounts for 15% of your score for HW11. Scoring will be relatively light, but the usual base requirements apply.

  1. String compare function
    Write a program that takes two words on the command line and prints either "<", "=", or ">". For example, entering ./warmup wusc ham sandwich on the command line would print "<".
  2. BST of integers
    Write a program to do the following:
    1. create a binary search tree of integers: 9 7 0 3 1 6 6 4
    2. print all of the nodes in order, one number per line
    3. search the BST for the following numbers: 1 2 3 4 5. Print only the numbers that are found, one per line.
    4. free the BST
  3. BST of strings
    Write a program to do the following:
    1. create a binary search tree of words: jam ham egg oat nut tea
    2. print all of the nodes in lexigraphic order (i.e., same as strcmp(…)), one word per line
    3. search the BST for the following words: jam bun pie tea. Print only the strings that are found, one per line.
    4. free the BST

The structure of the warmup.c file is described in the Requirements table below. A starter warmup.c is provided, including suggestions for helper functions you may want to use, in addition to the required functions. You do not have to use those, and you may add other helper functions, if you wish.

Opt out.

In a hurry, and don't need the practice? This warm-up is here to help you learn what you need to succeed on the rest of this assignment—not to add additional work. Therefore, we give you an option. Those who feel that they do not need the practice may "opt out". by modifying warmup.c so that it does nothing but print the following message exactly and then exit:

I already know this and do not need to practice.

If you do that, then your score for HW11 will be based solely on the rest of this assignment. If you leave the warmup.c undone, if you do not turn in a warmup.c, or if the message it prints does not match perfectly, then you will receive 0 for the warmup portion of this assignment (15%).

Test code: indexer.c

Included in the starter files is indexer.c, a program that you can use to test your code. It reads one or more files in the current directory and uses your create_index() and put(…) to create an index of the words in each of the files. Then, it searches for a given word and tells you which files contain it.

The usage is as follows: ./indexer WORD FILE1 FILE2

For example, in the current homework directory, if you compile indexer.c with your code (gcc index.c indexer.c -o indexer) and then run ./indexer stdio index.c indexer.c warmup.c it should print the following (or something similar):

3 files contain "stdio"
- index.c
- indexer.c
- warmup.c

Requirements

Unlike previous assignments, you will declare all struct types using typedef so that you don't need to include “struct” before the name throughout your code. Your reference sheet contains examples (labelled as “Concise syntax” or “PREFERRED”), which you may use.
  1. Your submission must contain each of the following files, as specified:
  2. file contents
    index.c functions
    create index()
    return type: Index
    create an empty index.
    put(Index✶ index,char✶ word,char✶ filename)
    return type: void
    Add (word,filename) assocaition to the list.
    • If the word is already in the index then add the filename (if not already present) to the tail of the list of filenames containing that word.
    • Otherwise add a new BST node with this word and just one filename node
    • Do not add the same filename multiple times to the same word's node.
    get(Index✶ index,int✶ count,char✶ word)
    return type: char✶✶
    Get the filenames containing the given word.
    • Return an array of strings: every filename containing the word.
    • Return the number of filenames (i.e., size of the array) via count.
    • The initial value of count should not be used. It is used only to return a value.
    • The caller is responsibible for freeing the memory of the array (char**), but not the strings it refers to.
    • If the word is not found then return NULL
    • If you are doing bonus #2, you will use a different function signature shown in the bonus box.
    free index(Index✶ index)
    return type: void
    Free all heap memory that the Index refers to (i.e., the BST, words, and filenames)
    • You may assume this is run at the end of the caller's program.
    index.h type definitions
    Index
    struct type (declared using typedef) with ≥1 fields: root (IndexBSTNode*)
    • You may add other fields if you like, or this can be simply a struct containing a IndexBSTNode*
    IndexBSTNode
    struct type (declared using typedef) with 4 fields: word(string), filenames (StringListNode*), left (IndexBSTNode*), right (IndexBSTNode*)
    StringListNode
    struct type (declared using typedef) with 2 fields: filename (string), next (address of a StringListNode)
    forward declarations
    forward declarations of all functions in your index.c
    test_index.c functions
    main(int argc, char✶ argv[])
    return type: int
    Test all of the code in your index.c.
    expected.txt functions Expected output from running your test_index.c
    warmup.c functions
    main(int argc, char✶ argv[])
    Run one of the warm-up exercises, depending on the first command-line argument. This function is provided in the starter file, but the behavior is as follows:
    • Calling as ./code wusc s1 s2 should print “<”, “=”, or “>” depending on the string comparison of s1 with s2, as described in the warm-up titled String compare function.
    • Calling as ./code wubi should perform the steps for the warm-up titled BST of integers.
    • Calling as ./code wubs should perform the steps for the warm-up titled BST of strings.
    • Calling with any other arguments or the wrong number of arguments should print an error message, “Invalid arguments” and return EXIT_FAILURE from main(…)
    string compare(const char✶ string1, const char✶ string2)
    return type: char
    Compare two strings and return “<”, “=”, or “>”, depending on the result.
    • This should behave just like strcmp(…), except that this returns a character (“<”, “=”, or “>”).
    create bst of int(int✶ array of ints, size t count)
    return type: BSTofIntNode✶
    Create a BST containing the given integers and return the root.
    create bst of strings(char✶✶ array of strings, size t count)
    return type: BSTofStringsNode✶
    Create a BST containing the given strings and return the root.
    • All strings should be copied. You may use _strdup(…), if you like.
    print bst of int(BSTofIntNode✶ root)
    return type: void
    Print every value in numerical order (in order traversal).
    print bst of strings(BSTofStringsNode✶ root)
    return type: void
    Print every value in lexigraphic order (in order traversal).
    search bst of int(BSTofIntNode✶ root, int value)
    return type: bool
    Return true if value was found; otherwise return false
    search bst of strings(BSTofStringsNode✶ root, char✶ value)
    return type: bool
    Return true if value was found; otherwise return false
    destroy bst of int(BSTofIntNode✶ root)
    return type: void
    Free the heap memory occupied by this BST.
    destroy bst of strings(BSTofStringsNode✶ root)
    return type: void
    Free the heap memory occupied by this BST, including the copied strings.
    types
    BSTofIntNode
    • You define the fields. (Hint: An example is on your reference sheet.)
    BSTofStringsNode
    • You define the fields. (Hint: An example is on your reference sheet.)
  3. Only the following externally defined functions and constants are allowed in your .c files. (You may put the corresponding #include <…> statements in the .c file or in your index.h, at your option.)
    header functions/symbols allowed in…
    stdbool.h bool, true, false index.c, index.h, test_index.c, warmup.c
    stdio.h printf, fprintf, stdout, FILE test_index.c, warmup.c
    stdlib.h malloc, free, NULL, EXIT_SUCCESS, EXIT_FAILURE index.c, test_index.c, warmup.c
    string.h strlen, strcmp, strcpy index.c, test_index.c, warmup.c
    assert.h assert index.c, test_index.c, warmup.c
  4. All data referenced by your Index (directly or indirectly) should be on the heap. Your Index itself may be on the stack.
  5. Make no assumptions about the number of distinct words or filenames.
  6. IndexBSTNode and StringListNode may not have any additional fields, other than those shown above.
  7. get(…) may not have any side effects. In other words, it may not modify the state of the Index that was passed to it.
  8. Submissions must meet the code quality standards and the policies on homework and academic integrity.
As usual… Type declarations into your code manually. Do not copy-paste. (You learn more this way.)

Bonus #1: No redundant allocations for filenames (2 points)

Do everything above, but allocate space for each filename only once.

To indicate that you are attempting bonus #1, you must include the following (exactly) at the top of your index.h file.

#define HW11_BONUS_1

Bonus #2: Handle OR queries in your get(…) (2 points)

Do everything above, but your get(…) is a variadic function that can take multiple words. The result is the array of all filenames containing those words, with no duplicate filenames.

For bonus #2, you will use the following function signature

char** get(Index* index, int* count, char* word, ...)

The last argument must be NULL, just like in HW10.

To indicate that you are attempting bonus #2, you must include the following (exactly) at the top of your index.h file.

#define HW11_BONUS_2

For bonus #2, you may #include <stdarg.h> and use va_arg(…), va_copy, va_start(…), va_end(…), and va_list.

How much work is this?

Those who finished the linked list assignments (HW08, HW09, HW10) and understood them well will most likely find HW11 easier. As always, your mileage may vary. In particular, programming with dynamic memory usually entails some time debugging. If dynamic structures still feel difficult to you, don't worry. It usually takes time to get comfortable with it.

Aim to start and finish this assignment early. Then, you can do one or both of the bonuses, if you like.

Q&A

  1. Will put(…) be called for every word in every file?
    Yes.
  2. So what is the Index, really? … and what does create_index() do?
    In the simplest case, it can be a struct with one field, the root of a BST.

    Here is the simplest possible Index.
    typedef struct {
        IndexBSTNode* root;
    } Index;
    Here is the simplest possible create_index(…).
    Index create_index() {
        Index index = { .root = NULL };
        return index;
    }
    You may use/copy these, if you like.
  3. How does strcmp(…) work?
    From bash, type man strcmp for details. Type q to get out.
  4. Can we use helper functions?
    Yes. Just don't include them in your index.h. That is a general principle. Your .h file is like a guide for others who might want to use your code in their program. Since helper functions are, by definition, only for internal use within your code, they should not be declared in your .h file. Instead, you can (and probably should) declare them at the top of your index.c file.
  5. What if the caller calls put(…) with the same word and filename twice?
    From the requirements table: “Do not add the same filename multiple times to the same word's node.”

Updates

10/28/2016 Clarified that indexer.c uses both create_index() and put(…) to generate an index; modified title of Q2 to grab the attention of people wondering about create_index() (no change to answer); added Q5
10/29/2016 Corrected signature of search_bst_of_strings(…) and destroy_bst_of_strings(…)