Split string
Learning goals
In this assignment, you will practice some of the same skills you learned in HW08. Specifically, you will:
- Use malloc(…) to allocate buffers on the heap.
- Write bug-resistant code that uses malloc(…).
- Debug common problems that occur when using dynamic memory.
In addition, you will learn how to:
- Define a struct type.
- Use struct objects on the stack and on the heap.
- Free memory referred to by a struct object.
- Free elements of an array.
- Use const to avoid inadvertently modifying data that was not supposed to change.
Overview
In HW08, you wrote code to join an array of strings with some delimiter between each string. This assignment is (mostly) the reverse operation, with a twist, which we will get to.
You will create a function split_string(…), which splits a string according to one or more delimiters. (That's the twist.)
Example: Splitting the string "ABC--DEF;GHI=JKL" by the delimiters "--", ";", and "=" would produce the array {"ABC", "--", "DEF", ";", "GHI", "=", "JKL"}.
Why would we split by multiple delimiters?
Originally, the idea for this assignment was to have you take a paragraph of text as input, split it into sentences (i.e., by '.', '?', and '!'), and then split each sentence into words. The first step of that would be to split the text by '.', '?', and '!'.
As we were writing the solution and description for that assignment, we started by writing up a support function, split_string(…), which would split the paragraph text by any delimiters you specify (i.e., '.', '?', and '!'). It was getting complicated when we got to how to deal with varying amounts of whitespace, as well as the possibility of multiple consecutive delimiters. Since split_string(…) is the logical reverse of join_strings(…), we decided to shorten the assignment to just this.
This isn't a crazy thing to do in any case. For example, Python allows you to split a string by a regular expression (pattern), which effectively allows you to split by multiple delimiters (e.g., re.split(r'[.!?]', 'No! But why? Because.')). It can be useful for… yeah, you guessed it!… splitting a paragraph into sentences (among many other things).
Getting Started
Get the starter code
you@eceprog ~/264/ $ 264get hw09
you@eceprog ~/264/ $ cd hw09
you@eceprog ~/264/hw09 $
Copy your log_macros.h, miniunit.h, and Makefile from previous assignments.
you@eceprog ~/264/hw09 $ cp ../hw08/log_macros.h ./
you@eceprog ~/264/hw09 $ cp ../hw08/miniunit.h ./
you@eceprog ~/264/hw09 $ cp ../hw08/Makefile ./
you@eceprog ~/264/hw09 $
Update your Makefile
you@eceprog ~/264/hw09 $ vim Makefile
Change ASG_NICKNAME to HW09 and BASE_NAME to split_string.
ASG_NICKNAME = HW09
BASE_NAME = split_string
…
Make sure split_string.h will be submitted when you type make submit.
Your SUBMIT_FILES variable might vary in how it uses the other variables. Just make sure that when all variables have been expanded, SUBMIT_FILES includes all of the files that must be submitted with this assignment.
If your Makefile looks like the one in the Q&A for HW08, you could probably just change
SUBMIT_FILES = $(ALL_C_FILES) $(MORE_H_FILES) Makefile
… to …
SUBMIT_FILES = $(ALL_C_FILES) $(ALL_H_FILES) Makefile
⚠ In previous assignments, submitting the main header file (e.g., split_string.h) was not required. This time, it is. Make sure your Makefile includes all required files when you submit (make submit), including split_string.h.
Requirements
- Complete the following development steps, in order, and submit after every step. You may submit more often, but be sure to have at least one submission for each step below. You cannot go back. There will be a penalty if you do not follow this.
  - Fill in the two struct type definitions in split_string.h. Check for errors by compiling and running try_struct_type_definitions.c. Submit.
  - Add a test for copy_substring(…) in test_split_string.c. Implement just enough in split_string.c to make that test pass. Submit.
  - Add a test for free_strings(…). It will need to create an array of strings on the heap. Feel free to reuse your code from HW08 and/or the try_struct_type_definitions.c in the HW09 starter code. Implement just enough in split_string.c to make your new test pass. Submit.
  - Add a test for find_substring(…) in test_split_string.c. Implement just enough in split_string.c to make that test pass. Hint: strstr(…). Google it or run man strstr. Submit.
  - Add a test for split_string(…) in test_split_string.c. Implement just enough in split_string.c to make that test pass. Submit.
- Your submission must contain each of the following files, as specified:
file: split_string.h
struct types:
- struct Strings: Define a struct type called struct Strings with two fields: .strings (char**) and .num_strings (size_t).
- struct FindResult: Define a struct type called struct FindResult with two fields: .needle (char const*) and .idx (size_t).
function declarations:
- The bottom of the file contains declarations for the functions that are required for split_string.c. Do not modify those declarations (unless directed in writing by the instructor).

file: split_string.c
function definitions:

find_substring(char const** a_pos, struct Strings needles)
→ return type: struct FindResult
Find the first occurrence of any of the strings in needles within the string beginning at *a_pos.
- a_pos is the address of a char* indicating where to begin searching. If we call the full string being searched the haystack, *a_pos could be the beginning of the haystack or some address in the middle, wherever we want to begin searching.
- If at least one of the needles is found within the string at *a_pos…
  - Return a struct FindResult object of which the .idx field indicates the index within the string at *a_pos where the first needle was found, and the .needle field is one of the strings within needles.
  - Set *a_pos to the address of the character immediately after the first needle found.
  - In case any of the needles overlap, the one with the earlier index in needles takes priority. For example, when searching "abcdefghi" for the needles {"cde", "cd", "hi"}, the .needle field of the returned object would be "cde" because it comes first in the array of needles.
  - find_substring(…) must not result in any calls to malloc(…).
- If none of the needles is found…
  - Return a struct FindResult object of which the .idx field is 0 and the .needle field is NULL.
  - Do not modify *a_pos.
- This is a support function. It is included in this assignment to guide you in implementing split_string(…), though it could certainly be useful on its own.
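The find_substring(…) rules above can be sketched roughly as follows. This is one possible reading of the specification, not the reference solution. The struct definitions follow the field names and types listed for split_string.h. The sketch assumes every needle is a non-empty string, and the local names best and found are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>   // size_t
#include <string.h>   // strstr, strlen

// Field names and types as listed for split_string.h.
struct Strings {
    char** strings;
    size_t num_strings;
};

struct FindResult {
    char const* needle;
    size_t idx;
};

// Sketch only: assumes each needle is non-empty.
struct FindResult find_substring(char const** a_pos, struct Strings needles) {
    assert(a_pos != NULL && *a_pos != NULL);
    struct FindResult result = { .needle = NULL, .idx = 0 };
    char const* best = NULL;  // address of the earliest match seen so far

    for (size_t i = 0; i < needles.num_strings; i++) {
        char const* found = strstr(*a_pos, needles.strings[i]);
        // Strict '<' means that when two needles match at the same address,
        // the one that comes first in the needles array is kept.
        if (found != NULL && (best == NULL || found < best)) {
            best = found;
            result.needle = needles.strings[i];
            result.idx = found - *a_pos;  // offset from the search start
        }
    }

    if (best != NULL) {
        // Advance the caller's position to just past the needle found.
        *a_pos = best + strlen(result.needle);
    }
    // If nothing was found, *a_pos is untouched and result is {NULL, 0}.
    return result;
}
```

With the needles {"cde", "cd", "hi"} and the haystack "abcdefghi", a first call would report "cde" at index 2 and leave *a_pos pointing at "fghi". No heap memory is allocated at any point.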
copy_substring(char const* src_string, size_t substring_len)
→ return type: char*
Create a string on the heap containing the first (up to) substring_len characters from src_string.
- Each call to copy_substring(…) should result in one call to malloc(…).
- Caller is responsible for freeing the heap memory buffer allocated by this function.
  - ⚠ Do not call free(…) in this function.
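As a hedged illustration of the copy_substring(…) description, the sketch below makes exactly one malloc(…) call and never calls free(…). Returning NULL when malloc(…) fails is an assumption; the specification here does not say how allocation failure should be handled.

```c
#include <assert.h>
#include <stdlib.h>   // malloc, size_t
#include <string.h>   // strlen, memcpy

char* copy_substring(char const* src_string, size_t substring_len) {
    assert(src_string != NULL);

    // "Up to" substring_len: never read past the end of src_string.
    size_t src_len = strlen(src_string);
    size_t copy_len = substring_len < src_len ? substring_len : src_len;

    // One malloc per call; +1 leaves room for the null terminator.
    // No typecast is needed on the result of malloc in C.
    char* copy = malloc(sizeof(*copy) * (copy_len + 1));
    if (copy != NULL) {  // assumption: return NULL if allocation fails
        memcpy(copy, src_string, copy_len);
        copy[copy_len] = '\0';
    }
    return copy;  // the caller is responsible for calling free(…)
}
```

For example, copy_substring("ABC--DEF", 3) would yield a new heap string "ABC", which the caller later frees.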
free_strings(struct Strings* a_strings)
Free the heap memory referred to by *a_strings.
- The purpose of free_strings(…) is to free the result that will be returned by split_string(…).
- This must free each of the strings (i.e., (*a_strings).strings[0], etc.) as well as the outer array (i.e., (*a_strings).strings).
- Calling free_strings(…) should not result in any calls to malloc(…).
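A minimal sketch of free_strings(…) might look like the following. The void return type is an assumption, since the return type is not shown in this description. Resetting the fields afterward is an optional defensive habit rather than a stated requirement.

```c
#include <assert.h>
#include <stdlib.h>   // free, malloc, size_t
#include <string.h>

// Field names and types as listed for split_string.h.
struct Strings {
    char** strings;
    size_t num_strings;
};

// Assumption: returns void (the return type is not shown in the spec here).
void free_strings(struct Strings* a_strings) {
    assert(a_strings != NULL);

    // Free each inner string first...
    for (size_t i = 0; i < (*a_strings).num_strings; i++) {
        free((*a_strings).strings[i]);
    }
    // ...then the outer array that held them.
    free((*a_strings).strings);

    // Optional: leave the struct in a harmless state for the caller.
    (*a_strings).strings = NULL;
    (*a_strings).num_strings = 0;
}
```

The order matters: freeing the outer array first would lose the only pointers to the inner strings.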
split_string(char const* text, struct Strings delimiters)
→ return type: struct Strings
Split the string text by each non-overlapping occurrence of any of the strings in delimiters.
- Return an array of strings (char*) on the heap, including the delimiters. Each string in this array (including the delimiters) should be newly created (malloc'd).
- Each call to split_string(…) should result in n + 1 calls to malloc, where n is the number of parts in the resulting array. The +1 is for the array of strings (i.e., the outer array).
- If a delimiter falls at the beginning or end of text, or delimiters occur consecutively, the returned array will contain one or more empty strings.
- Caller is responsible for freeing the heap memory buffer allocated by this function.
  - ⚠ Do not call free(…) in this function.
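To show how the pieces could fit together, here is a two-pass sketch of split_string(…): the first pass counts delimiter occurrences so the outer array can be allocated once, and the second pass copies each part. It is only one possible approach under stated assumptions. The helpers are condensed versions written for this example, delimiters are assumed non-empty, and allocation failures are handled with assert(…) purely to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Strings    { char** strings; size_t num_strings; };
struct FindResult { char const* needle; size_t idx; };

// Condensed helper for this sketch: earliest match of any needle.
static struct FindResult find_substring(char const** a_pos, struct Strings needles) {
    struct FindResult result = { .needle = NULL, .idx = 0 };
    char const* best = NULL;
    for (size_t i = 0; i < needles.num_strings; i++) {
        char const* found = strstr(*a_pos, needles.strings[i]);
        if (found != NULL && (best == NULL || found < best)) {
            best = found;
            result.needle = needles.strings[i];
            result.idx = found - *a_pos;
        }
    }
    if (best != NULL) {
        *a_pos = best + strlen(result.needle);
    }
    return result;
}

// Condensed helper: one malloc per call (len must not exceed strlen(src)).
static char* copy_substring(char const* src, size_t len) {
    char* copy = malloc(sizeof(*copy) * (len + 1));
    assert(copy != NULL);  // simplification for this sketch
    memcpy(copy, src, len);
    copy[len] = '\0';
    return copy;
}

struct Strings split_string(char const* text, struct Strings delimiters) {
    assert(text != NULL);

    // Pass 1: count delimiter occurrences (no heap allocation here).
    size_t num_found = 0;
    char const* pos = text;
    while (find_substring(&pos, delimiters).needle != NULL) {
        num_found++;
    }

    // k delimiters produce k + 1 text pieces, so n = 2k + 1 parts in total.
    struct Strings parts = { .strings = NULL, .num_strings = 2 * num_found + 1 };
    parts.strings = malloc(sizeof(*parts.strings) * parts.num_strings);
    assert(parts.strings != NULL);  // simplification for this sketch

    // Pass 2: copy the text before each delimiter, then the delimiter itself.
    pos = text;
    for (size_t i = 0; i < parts.num_strings; i += 2) {
        char const* start = pos;
        struct FindResult found = find_substring(&pos, delimiters);
        if (found.needle != NULL) {
            parts.strings[i] = copy_substring(start, found.idx);
            parts.strings[i + 1] = copy_substring(found.needle, strlen(found.needle));
        } else {
            parts.strings[i] = copy_substring(start, strlen(start));  // final piece
        }
    }
    return parts;
}
```

Counting first keeps the total at n + 1 calls to malloc(…): one for the outer array and one per part. A delimiter at the very start of text yields an empty first string, because the piece before it has length 0.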

file: test_split_string.c
functions:

main(int argc, char* argv[])
→ return type: int
Test the code in your split_string.c using your miniunit.h.
- 100% code coverage is required for split_string.c.

test_▒▒▒()
→ return type: int
- Use your mu_check_strings_equal(…) from HW06 to check that the code in your split_string.c is working correctly.

test_▒▒▒()
→ return type: int

test_▒▒▒()
→ return type: int

file: miniunit.h
macros: Same as HW06. Okay to modify, if you wish.

file: log_macros.h
macros: Same as HW05. Okay to modify, if you wish.

file: Makefile
macros: Same as HW07. Okay to modify, if you wish.
- ⚠ Do not use typecasts anywhere in HW09. The code quality standard says typecasts may not be used, except when they are truly necessary, safe, and you can articulate why. They are not necessary for anything in HW09.
-
Only the following external header files, functions, and symbols are allowed in split_string.c.
header | functions/symbols | allowed in…
stdbool.h | bool, true, false | *
stdio.h | * | test_split_string.c
assert.h | assert | *
stdlib.h | EXIT_SUCCESS | test_split_string.c
stdlib.h | malloc, free | *
string.h | strlen, strcpy, strncpy, strcmp, strstr, strncmp, memcpy, memmove | *
- ⚠ Do not use strtok(…) (function from C standard library).
- Submissions must meet the code quality standards and the policies on homework and academic integrity.
Submit
To submit HW09 from within your hw09 directory, type
make submit
That should result in the following command:
264submit HW09 split_string.c test_split_string.c split_string.h miniunit.h log_macros.h Makefile
Pre-tester
The pre-tester for HW09 has been released and is ready to use.
Q&A
Updates
There are no updates to HW09 so far.