Binary search trees
Goals
- Learn how write code using binary search trees (a dynamic structure)
- Practice writing code that uses recursion
Overview
In this assignment, you will create a program to index files by the words that they contain, and then print all of the files containing a certain word. We provide all of the code you need for opening and reading the files.
Your job is to create a binary search tree (BST) of strings, where each node contains a word, and a linked list of the filenames it appeared in (and of course the left and right node addresses).
The basic structure is below. For example, the word “write” appears in the files a.txt and c.txt.

To be clear, you will not be writing any code to access files yourself. That is simply the premise of this assignment. We do provide some test code (indexer.c) that does access files, but you don't have to modify it in any way. indexer.c is described below.
Getting started
To get the starter files, type this: 264get hw11
Warm-up
This assignment includes a warm-up exercise to help you get ready. This accounts for 15% of your score for HW11. Scoring will be relatively light, but the usual base requirements apply.
-
String compare function
Write a program that takes two words on the command line and prints either "<", "=", or ">". For example, entering./warmup wusc ham sandwich
on the command line would print "<". -
BST of integers
Write a program to do the following:- create a binary search tree of integers: 9 7 0 3 1 6 6 4
- print all of the nodes in order, one number per line
- search the BST for the following numbers: 1 2 3 4 5. Print only the numbers that are found, one per line.
- free the BST
-
BST of strings
Write a program to do the following:- create a binary search tree of words: jam ham egg oat nut tea
- print all of the nodes in lexigraphic order (i.e., same as
strcmp(…)
), one word per line - search the BST for the following words: jam bun pie tea. Print only the strings that are found, one per line.
- free the BST
The structure of the warmup.c file is described in the Requirements table below. A starter warmup.c is provided, including suggestions for helper functions you may want to use, in addition to the required functions. You do not have to use those, and you may add other helper functions, if you wish.
Opt out.
In a hurry, and don't need the practice? This warm-up is here to help you learn what you need to succeed on the rest of this assignment—not to add additional work. Therefore, we give you an option. Those who feel that they do not need the practice may "opt out". by modifying warmup.c so that it does nothing but print the following message exactly and then exit:
I already know this and do not need to practice.
If you do that, then your score for HW11 will be based solely on the rest of this assignment. If you leave the warmup.c undone, if you do not turn in a warmup.c, or if the message it prints does not match perfectly, then you will receive 0 for the warmup portion of this assignment (15%).
Test code: indexer.c
Included in the starter files is indexer.c, a program that you can use to test your
code. It reads one or more files in the current directory and uses your
create_index()
and put(…)
to create an index of the words in each of the files. Then, it searches for a given word
and tells you which files contain it.
The usage is as follows: ./indexer WORD FILE1 FILE2 …
For example, in the current homework directory, if you compile indexer.c with
your code (gcc index.c indexer.c -o indexer
) and
then run ./indexer stdio index.c indexer.c warmup.c
it should print the
following (or something similar):
3 files contain "stdio" - index.c - indexer.c - warmup.c
Requirements
Unlike previous assignments, you will declare all struct types usingtypedef
so that you don't need to include “struct
” before the name throughout your code. Your reference sheet contains examples (labelled as “Concise syntax” or “PREFERRED”), which you may use.
- Your submission must contain each of the following files, as specified:
- If the word is already in the index then add the filename (if not already present) to the tail of the list of filenames containing that word.
- Otherwise add a new BST node with this word and just one filename node
- Do not add the same filename multiple times to the same word's node.
- Return an array of strings: every filename containing the word.
- Return the number of filenames (i.e., size of the array) via
count
. - The initial value of
count
should not be used. It is used only to return a value. - The caller is responsibible for freeing the memory of the array (
char**
), but not the strings it refers to. - If the word is not found then return NULL
- If you are doing bonus #2, you will use a different function signature shown in the bonus box.
- You may assume this is run at the end of the caller's program.
- You may add other fields if you like, or this can be simply a struct containing a IndexBSTNode*
- Calling as
./code wusc s1 s2
should print “<”, “=”, or “>” depending on the string comparison of s1 with s2, as described in the warm-up titled String compare function. - Calling as
./code wubi
should perform the steps for the warm-up titled BST of integers. - Calling as
./code wubs
should perform the steps for the warm-up titled BST of strings. - Calling with any other arguments or the wrong number of arguments should print an error message, “Invalid arguments” and return
EXIT_FAILURE
frommain(…)
- This should behave just like
strcmp(…)
, except that this returns a character (“<”, “=”, or “>”). - All strings should be copied. You may use
_strdup(…)
, if you like. - You define the fields. (Hint: An example is on your reference sheet.)
- You define the fields. (Hint: An example is on your reference sheet.)
-
Only the following externally defined functions and constants are allowed in your .c files. (You may put the corresponding #include <…> statements in the .c file or in your index.h, at your option.)
header functions/symbols allowed in… stdbool.h bool
,true
,false
index.c
,index.h
,test_index.c
,warmup.c
stdio.h printf
,fprintf
,stdout
,FILE
test_index.c
,warmup.c
stdlib.h malloc
,free
,NULL
,EXIT_SUCCESS
,EXIT_FAILURE
index.c
,test_index.c
,warmup.c
string.h strlen
,strcmp
,strcpy
index.c
,test_index.c
,warmup.c
assert.h assert
index.c
,test_index.c
,warmup.c
-
All data referenced by your
Index
(directly or indirectly) should be on the heap. YourIndex
itself may be on the stack. - Make no assumptions about the number of distinct words or filenames.
-
IndexBSTNode
andStringListNode
may not have any additional fields, other than those shown above. -
get(…)
may not have any side effects. In other words, it may not modify the state of theIndex
that was passed to it. - Submissions must meet the code quality standards and the policies on homework and academic integrity.
file | contents | |
---|---|---|
index.c | functions |
create index()
→ return type: Index
create an empty index.
|
put(Index✶ index,char✶ word,char✶ filename)
→ return type: void
Add (word ,filename ) assocaition to the list.
|
||
get(Index✶ index,int✶ count,char✶ word)
→ return type: char✶✶
Get the filenames containing the given word.
|
||
free index(Index✶ index)
→ return type: void
Free all heap memory that the Index refers to (i.e., the BST, words, and filenames) |
||
index.h | type definitions |
Index
struct type (declared using typedef) with ≥1 fields: root (IndexBSTNode*)
|
IndexBSTNode
struct type (declared using typedef) with 4 fields: word (string), filenames (StringListNode* ), left (IndexBSTNode* ), right (IndexBSTNode* )
|
||
StringListNode
struct type (declared using typedef) with 2 fields: filename (string), next (address of a StringListNode )
|
||
forward declarations |
forward declarations of all functions in your index.c
|
|
test_index.c | functions |
main(int argc, char✶ argv[])
→ return type: int
Test all of the code in your index.c.
|
expected.txt | functions | Expected output from running your test_index.c |
warmup.c | functions |
main(int argc, char✶ argv[])
|
string compare(const char✶ string1, const char✶ string2)
→ return type: char
Compare two strings and return “<”, “=”, or “>”, depending on the result.
|
||
create bst of int(int✶ array of ints, size t count)
→ return type: BSTofIntNode✶
Create a BST containing the given integers and return the root.
|
||
create bst of strings(char✶✶ array of strings, size t count)
→ return type: BSTofStringsNode✶
Create a BST containing the given strings and return the root.
|
||
print bst of int(BSTofIntNode✶ root)
→ return type: void
Print every value in numerical order (in order traversal).
|
||
print bst of strings(BSTofStringsNode✶ root)
→ return type: void
Print every value in lexigraphic order (in order traversal).
|
||
search bst of int(BSTofIntNode✶ root, int value)
→ return type: bool
Return true if value was found; otherwise return false
|
||
search bst of strings(BSTofStringsNode✶ root, char✶ value)
→ return type: bool
Return true if value was found; otherwise return false
|
||
destroy bst of int(BSTofIntNode✶ root)
→ return type: void
Free the heap memory occupied by this BST.
|
||
destroy bst of strings(BSTofStringsNode✶ root)
→ return type: void
Free the heap memory occupied by this BST, including the copied strings.
|
||
types |
BSTofIntNode
|
|
BSTofStringsNode
|
As usual… Type declarations into your code manually. Do not copy-paste. (You learn more this way.)
Bonus #1: No redundant allocations for filenames (2 points)
Do everything above, but allocate space for each filename only once.
To indicate that you are attempting bonus #1, you must include the following (exactly) at the top of your index.h file.
#define HW11_BONUS_1
Bonus #2: Handle OR queries in your
get(…)
(2 points)
Do everything above, but your get(…)
is
a variadic function that can take multiple words. The result is
the array of all filenames containing those words, with no duplicate filenames.
For bonus #2, you will use the following function signature
char** get(Index* index, int* count, char* word, ...)
The last argument must be NULL, just like in HW10.
To indicate that you are attempting bonus #2, you must include the following (exactly) at the top of your index.h file.
#define HW11_BONUS_2
For bonus #2, you may #include <stdarg.h>
and use va_arg(…)
,
va_copy
, va_start(…)
, va_end(…)
, and
va_list
.
How much work is this?
Those who finished the linked list assignments (HW08, HW09, HW10) and understood them well will most likely find HW11 easier. As always, your mileage may vary. In particular, programming with dynamic memory usually entails some time debugging. If dynamic structures still feel difficult to you, don't worry. It usually takes time to get comfortable with it.
Aim to start and finish this assignment early. Then, you can do one or both of the bonuses, if you like.
Q&A
-
Will
put(…)
be called for every word in every file?
Yes. -
So what is the
Index
, really? … and what doescreate_index()
do?
In the simplest case, it can be a struct with one field, the root of a BST.
Here is the simplest possible Index.typedef struct { IndexBSTNode* root; } Index;
Here is the simplest possiblecreate_index(…)
.Index create_index() { Index index = { .root = NULL }; return index; }
You may use/copy these, if you like. -
How does
strcmp(…)
work?
From bash, typeman strcmp
for details. Type q to get out. -
Can we use helper functions?
Yes. Just don't include them in your index.h. That is a general principle. Your .h file is like a guide for others who might want to use your code in their program. Since helper functions are, by definition, only for internal use within your code, they should not be declared in your .h file. Instead, you can (and probably should) declare them at the top of your index.c file. -
What if the caller calls
put(…)
with the same word and filename twice?
From the requirements table: “Do not add the same filename multiple times to the same word's node.”
Updates
10/28/2016 | Clarified that indexer.c uses both create_index() and put(…) to generate an index; modified title of Q2 to grab the attention of people wondering about create_index() (no change to answer); added Q5 |
10/29/2016 | Corrected signature of search_bst_of_strings(…) and
destroy_bst_of_strings(…) |