Advanced C Programming

Autumn 2015 :: ECE 264 :: Purdue University

This is for Fall 2015 (9 years ago)
Due 11/6

Binary search trees

In this assignment, you will create a program to index files by the words that they contain, and then print all of the files containing a certain word. We will provide the code for opening and reading the files. That code will be provided sometime this weekend, but you can complete and test the entire assignment without that.

Your job is to create a BST of strings, where each node contains a word, and a linked list of the filenames it appeared in (and of course the left and right node addresses).

The basic structure is below. For example, the word “write” appears in the files a.txt and c.txt.

data structure

Goals

The goals of this assignment are as follows:
  1. Learn how write code using binary search trees (a dynamic structure)
  2. Practice writing code that uses recursion

Warm-up exercises

The following warm-up exercises will help you prepare for this assignment.

  1. String compare function
    Write a program that takes two words on the command line and prints either "<", "=", or ">". For example, entering ./wusc ham sandwich on the command line would print "<".
    Your program should consist of a single file called wusc.c containing at least the following functions: main(…),
  2. BST of integers
    Write a program to do the following:
    1. create a binary search tree of integers: 9 7 0 3 1 6 6 4
    2. print all of the nodes in order, one number per line
    3. search the BST for the following numbers: 1 2 3 4 5
    4. free the BST
    Your program should consist of a single file called wubi.c containing at least the following functions: create(…), print(…), search(…), destroy(…) main(…),
  3. BST of strings
    Write a program to do the following:
    1. create a binary search tree of words: jam ham egg oat nut tea
    2. print all of the nodes in lexigraphic order (i.e., same as strcmp(…)), one word per line
    3. search the BST for the following words: jam bun pie tea
    4. free the BST
    Your program should consist of a single file called wubs.c containing at least the following functions: create(…), print(…), search(…), destroy(…) main(…),

The warm-up exercises comprise 20% of your score on this assignment.

The warm-ups will be only lightly checked. They should print out the output specified. Other output will be ignored. However, they must not have memory faults, and all code must be your own.

Opt out

Those who feel that they do not need this practice may "opt out" of any or all of the warm-ups by turning in in a simple program with the same filename that prints the following (exactly) in a single printf(…) statement.

I already know this sufficiently.

Your score for the warm-up will be equal to the score for the rest of the assignment, excluding any of the warm-ups. You may opt out of any number of the warm-ups (i.e., 0, 1, …, all of them). However, you will need to include the file for each one. You can simply copy the same file to all three warm-up filenames (i.e., wusc.c, wubi.c, wubi.h)

Files

Here are the files you will create for this assignment.

Unlike previous assignments, we will declare all struct types using typedef so that you don't need to include “struct ” before the name throughout your code. Your reference sheet contains examples (labelled as “PREFERRED”), which you may use.
file contents
index.h type definitions
Index
struct type (declared using typedef) with ≥1 field(s): root (IndexBSTNode*, root of a binary search tree); you may add other fields if you like, or this can be simply a struct containing a IndexBSTNode*
IndexBSTNode
struct type (declared using typedef) with =4 fields: word (string), filenames (StringListNode*), left (IndexBSTNode*), right (IndexBSTNode*)
StringListNode
=2 fields: filename (string), next (address of a StringListNode)
function declarations
one for each require function in index.c; do not include helper functions (if any) here
index.c function definition
create_index()
return type: Index
create an empty index
put(Index✶Xindex,Xchar✶Xword,Xchar✶Xfilename)
return type: void
add (word, filename) association to the list; if the word is already in the index then add the filename to the tail of the list of filenames containing that word but only if that filename is not already in the given word's list of filenames; otherwise add a new BST node with this word and just one filename node; do not add the same filename multiple times to the same word's node
get(Index✶Xindex,Xint✶Xcount,Xchar✶Xword)
return type: char✶✶
return an array of the filenames containing the given word; return the number of filenames by way of count; count is only used to return a value; its initial value should not be used; the caller is responsibible for freeing the memory of the array (the char**) but not the strings it refers to; if the word is not found then return NULL; if you are doing bonus #2, you will use a different function signature shown in the bonus box
free_index(Index✶Xindex)
return type: void
free all heap memory that the Index refers to (i.e., the BST, words, and filenames); you may assume this is run at the end of the caller's program
test_index.c function definitions
main(intXargc,Xchar✶Xargv[])
return type: int
test your functions in index.c; must be comprehensive and original; must have a return statement
test_index.txt text
expected output from running your test_index.c
As usual… Type all declarations into your program manually. Do not copy-paste. (You learn more from typing code manually.)

Bonus #1: No redundant allocations for filenames (2 points)

Do everything above, but allocate space for each filename only once.

To indicate that you are attempting bonus #1, you must include the following at the top of your index.h file.

#define HW08_BONUS_1

Bonus #2: Handle OR queries in your get(…) (2 points)

Do everything above, but your get(…) is a variadic function that can take multiple words. The result is the array of all filenames containing those words, with no duplicate filenames.

For bonus #2, you will use the following function signature

char** get(Index* index, int* count, char* word, ...)

The last argument must be NULL, just like in hw07.

To indicate that you are attempting bonus #2, you must include the following at the top of your index.h file.

#define HW08_BONUS_2

More detail about the bonus offers may be posted later.

Requirements

  1. Your submission must contain all of the following files with the functions and type definitions indicated in the table above: index.h index.c test_index.c test_index.txt wusc.c wubi.c wubs.c
  2. All data referenced by your Index (directly or indirectly) should be on the heap. Your Index itself may be on the stack.
  3. Make no assumptions about the number of distinct words or filenames.
  4. IndexBSTNode and StringListNode may not have any additional fields, other than those shown above.
  5. get(…) may not have any side effects. In other words, it may not modify the state of the Index that was passed to it.
  6. Only the following external header files, functions, and symbols are allowed. (All others are prohibited.)
    header functions/symbols allowed in…
    assert.h assert(…) index.c, index.h, test_index.c
    stdarg.h va_arg, va_list, va_copy, va_start, va_end index.c, index.h, test_index.c
    stdbool.h true, false index.c, index.h, test_index.c
    stdio.h fprintf(…), stdout test_index.c
    stdlib.h malloc(…), free(…) index.c, test_index.c
    string.h strlen(…), strcmp(…), strcpy(…) index.c, index.h, test_index.c
  7. As usual…
    1. Do not use dynamic arrays (C99 feature, e.g., char s[n];)
    2. Code must successfully compile and run with your test_index.c and any other valid test file.
    3. Code must meet all requirements in the Course Policies and Code Quality Standards.
      Note: If you choose to use assert(…), be sure you understand the specific guidelines that relate to it.
    4. All code must be your own.
    5. Your test code must be comprehensive (and original).

How much work is this?

This assignment is designed to be about half as difficult as hw06 . (Do not depend on this estimate. Your mileage may vary in either direction.)

As usual… Programming with dynamic memory usually entails some time debugging. Allow time for that.

Q&A

  1. Why are the warm-ups optional?
    Practicing with the concepts and technology has never been optional. However, I (the instructor) recognize that learning is a personal endeavor, and some may need more or less practice in each area. Also, I don't feel that assigning a lot of “busy work” would reflect the respect that I hold for students in this class.

    Part of maturing as a student and an engineer is taking personal responsibility for your learning. Instructors can pull you along through a body of knowledge using required readings and “busy-work” assignments, but those will be assigned to the entire class at once. The “learner” is you. As such, you are the best person to decide how much practice you need in each area. If you don't have the opportunity to decide for yourself, it will be much harder to ever learn to do this. Warm-ups are optional because I believe the benefits of this opportunity will outweigh the risks (i.e., someone under-practicing and doing poorly on an assignment).

  2. Will put(…) called for every word in every file?
    Yes.
  3. So what is the Index, really?
    In the simplest case, it can be a struct with one field, the root of a BST.

    Here is the simplest possible Index.
    typedef struct {
        IndexBSTNode* root;
    } Index;
    Here is the simplest possible create_index(…).
    Index create_index() {
        Index index = { .root = NULL };
        return index;
    }
    You may use/copy these, if you like.
  4. How does strcmp(…) work?
    From bash, type man strcmp for details. Type q to get out.
  5. Can we use helper functions?
    Yes. Just don't include them in your index.h. That is a general principle. Your .h file is like a guide for others who might want to use your code in their program. Since helper functions are, by definition, only for internal use within your code, they should not be declared in your .h file. Instead, you can (and probably should) declare them at the top of your index.c file.

Updates

10/31: Q2 and Q3 were added to the Q&A; typo was fixed in the first warm-up ("ham" < "sandwich"); minor clarifications to warm-ups

11/1: Function signature of get(…) corrected to include the address of your Index and to account for the bonus #2 option; correction to Q3 in the Q&A; clarifications about how warm-ups will be checked

11/2: Removed erroneous * from the name of the IndexBSTNode and StringListNode types. for bonus #2 last argument must be NULL; corrected contents of the Index struct type; get(…) returns NULL if the word was not found; put(…) should not add duplicate filenames to the same word's node; the Index struct type's primary field should be called root

11/4: free_index(…) frees the heap memory referred to by the Index, but not the Index itself.