Advanced C Programming

Fall 2022 ECE 264 :: Purdue University :: Section 3 (Quinn)

⚠ This is a PAST SEMESTER (Fall 2022).
Due 10/14

malloc(…): Join strings

Learning goals

You should learn how to:

  1. Use malloc(…) to allocate buffers on the heap.
  2. Write bug-resistant code that uses malloc(…).
  3. Debug common problems that occur when using dynamic memory.

Overview

In this assignment, you will create a few functions for manipulating strings. While these functions are operations that are genuinely useful in some contexts, the real purpose of this assignment is to give you some practice using dynamic memory, starting with something simple (copying a string to the heap) and moving up to something a little fancier (joining several strings with a separator string).

Dynamic memory (aka “heap memory”)

As we discussed in class, memory is organized in segments.

The stack segment is where local variables and arguments are stored. One limitation of the stack is that you have to know in advance how big any array, string, or object will be. If your code will take input from a file, a user, the network, or external code that you didn't write, then you can't predict how big things need to be. Another limitation is that when a function returns, all local variables and arguments are invalidated (i.e., become unavailable).

The heap segment lets you specify exactly how many bytes you need. You use the malloc(…) function to allocate (reserve) that space. You allocate a buffer (a bunch of bytes) into which you can store a string, array, or any data you like. Your buffer stays allocated until you explicitly deallocate it by calling the free(…) function.
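The following sketch shows that lifetime difference. The function make_squares(…) is hypothetical and not part of HW09. It returns the address of a heap buffer, and that buffer remains valid after the function returns. A local array would not. The caller is responsible for freeing it.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical example (not part of HW09).
// Returns a heap array holding 0, 1, 4, ..., (n - 1)^2, or NULL if malloc(...) fails.
// The buffer stays allocated after this function returns; the caller must free(...) it.
size_t* make_squares(size_t n) {
    size_t* squares = malloc(sizeof(*squares) * n);  // n elements, sized via the variable
    if(squares == NULL) {
        return NULL;
    }
    for(size_t i = 0; i < n; i++) {
        squares[i] = i * i;
    }
    return squares;
}
```

Suppose the function instead declared a local array (size_t squares[5];) and returned its address. The caller would then be holding an address that is invalid as soon as the function returns.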

Example: Make a string on the heap with the letters, "abc".

// Declare a variable abc_string, allocate a new buffer on the heap sufficient to
// store 4 char's, and initialize abc_string to the address of the newly allocated buffer.
char* abc_string = malloc(sizeof(*abc_string) * 4);

// Populate the new string (i.e., store letters in it).
abc_string[0] = 'a';
abc_string[1] = 'b';
abc_string[2] = 'c';
abc_string[3] = '\0';

// Print the string.
log_str( abc_string );  // prints:  abc_string == "abc"

// Deallocate (aka "free") the heap memory buffer at address abc_string.
free( abc_string );

The malloc(…) function takes a number of bytes as its parameter and returns the address of a newly allocated buffer on the heap segment. When calling malloc(…), we never pass a raw number. We always calculate the number of bytes dynamically using an expression that includes the sizeof(…) operator.

sizeof(*abc_string) evaluates to the number of bytes required to store a value of the same type as *abc_string. Since abc_string is a char*, *abc_string is a char. Therefore, sizeof(*abc_string) gives the number of bytes necessary to store a single char. Multiplying that by 4 (i.e., sizeof(*abc_string) * 4) gives the number of bytes necessary to store four char's.

Pitfalls

Do not use a plain integer constant with malloc(…).

Specify the number of bytes to allocate as an expression using sizeof(…).

char* abc_string = malloc(4);                          // BAD
char* abc_string = malloc(sizeof(*abc_string) * 4);    // GOOD

Do not use a type with sizeof(…)

The C language also allows another form of the sizeof(…) operator: sizeof(TYPE) (← BAD!). Do not use this form. Some people might write the above malloc call as malloc(sizeof(char) * 4) (← BAD!). That form is outdated because it is bug-prone. Don't use it. If you later change the type of the variable on the left, you may end up allocating the wrong amount of memory. That can cause serious bugs with security consequences. To avoid those problems, whenever you call malloc(…), compute the size with sizeof(…) applied to an expression (e.g., *abc_string), not a type.

char* abc_string = malloc(sizeof(int) * 4);            // BAD: sizeof(TYPE), and here the type is wrong
char* abc_string = malloc(sizeof(*abc_string) * 4);    // GOOD
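The sketch below shows why sizeof(TYPE) is fragile. It assumes a hypothetical change: abc_string's declared type is later switched from char* to wchar_t*, a wide-character type that is larger than 1 byte on common platforms. The expression form follows the variable automatically. The type form keeps asking for one-byte chars. The two functions are illustrative only and report how many bytes each form would request from malloc(…).

```c
#include <assert.h>
#include <stddef.h>  // size_t, wchar_t

// Hypothetical: abc_string's type was later changed from char* to wchar_t*.

// Recommended form: sizeof applied to an expression.
size_t bytes_requested_by_expression_form(void) {
    wchar_t* abc_string = NULL;       // sizeof does not evaluate *abc_string
    return sizeof(*abc_string) * 4;   // follows the variable: room for 4 wchar_t values
}

// Outdated form: sizeof applied to a type that was not updated with the variable.
size_t bytes_requested_by_type_form(void) {
    return sizeof(char) * 4;          // still 4 one-byte chars: too small for 4 wchar_t values
}
```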

Don't forget the asterisk ("*") in the sizeof(…) expression.

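If you forget the asterisk in your sizeof(…) expression, you may allocate the wrong amount of memory, leading to buffer overflow errors which can be challenging to find. Without the asterisk, sizeof(abc_string) is the size of an address (typically 8 bytes on a 64-bit machine), not the size of a char. Remember your asterisk!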

char* abc_string = malloc(sizeof(abc_string) * 4);     // BAD: size of an address, not of a char
char* abc_string = malloc(sizeof(*abc_string) * 4);    // GOOD
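With char, a missing asterisk happens to over-allocate. With a larger element type it under-allocates, and that is how buffer overflows arise. In the sketch below, the struct Point3 and both functions are hypothetical. They only report how many bytes each expression would request from malloc(…).

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical struct used only for this illustration.
struct Point3 {
    double x;
    double y;
    double z;
};

// Bytes requested by the correct form, sizeof(*points) * n.
// sizeof does not evaluate its operand, so *points never dereferences NULL.
size_t bytes_with_asterisk(size_t n) {
    struct Point3* points = NULL;
    return sizeof(*points) * n;     // size of one struct Point3, times n
}

// Bytes requested by the buggy form, sizeof(points) * n (asterisk forgotten).
size_t bytes_without_asterisk(size_t n) {
    struct Point3* points = NULL;
    return sizeof(points) * n;      // size of one ADDRESS, times n
}
```

On a typical 64-bit system, the buggy form requests 8 bytes per point instead of 24. Writing all three fields of even the first point would therefore overflow the buffer.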

Do not typecast the return value of malloc(…) in C code.

Some people who learned older C coding practices were taught to always typecast the result of malloc(…). For example, they would do the above call like this: char* abc_string = (char*)malloc(sizeof(*abc_string) * 4). That is not needed in C. malloc(…) returns a void* (generic memory address with no expectations about what type will be stored at that address). You are assigning that to a char*. That requires no typecast. The typecast only creates opportunities for bugs. Typecasts are not allowed in this class, except where you know a clear reason why the typecast is necessary and why it is safe.

char* abc_string = (char*)malloc(sizeof(*abc_string) * 4);  // BAD: unnecessary typecast
char* abc_string = malloc(sizeof(*abc_string) * 4);         // GOOD
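In C, void* converts implicitly to and from other object pointer types, so even generic code needs no typecasts. The helper duplicate_bytes(…) below is a hypothetical sketch and not part of HW09. Neither its malloc(…) call nor the caller's assignment of its result to an int* uses a cast.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of HW09).
// Returns a heap copy of the num_bytes bytes at source, or NULL if malloc(...) fails.
// Caller is responsible for freeing the returned buffer.
void* duplicate_bytes(void const* source, size_t num_bytes) {
    assert(source != NULL);
    void* copy = malloc(num_bytes);       // malloc(...) returns void*: no cast needed
    if(copy != NULL) {
        memcpy(copy, source, num_bytes);  // void* and void const* are accepted as-is
    }
    return copy;
}
```

A caller might write int* copy = duplicate_bytes(nums, sizeof(*nums) * 3); to copy three ints. The void* result is assigned directly to an int*.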

Do not use any typecasts anywhere in any file for HW09.

If you think you need a typecast, you are probably making a mistake.

Typecasts are EVIL!!! They tell the compiler, “I know what I'm doing, so don't warn me if I seem to be making a mistake with types.” Sometimes they are a necessary evil, but you should use them only when they are truly necessary and safe, and you can articulate why.

char* abc_string = (char*)malloc(sizeof(*abc_string) * 4);  // BAD: unnecessary typecast
char* abc_string = malloc(sizeof(*abc_string) * 4);         // GOOD

Getting Started

Get the starter code

you@ecegrid-thin1 ~/264/ $ 264get hw09
This will write the following files:
    1. hw09/join_strings.h
Ok? (y/n)y
1 files were written
you@ecegrid-thin1 ~/264/ $ cd hw09
you@ecegrid-thin1 ~/264/hw09 $

Copy your log_macros.h, miniunit.h, and Makefile from previous assignments.

you@ecegrid-thin1 ~/264/hw09 $ cp ../hw06/log_macros.h ./
you@ecegrid-thin1 ~/264/hw09 $ cp ../hw07/miniunit.h ./
you@ecegrid-thin1 ~/264/hw09 $ cp ../hw08/Makefile ./
you@ecegrid-thin1 ~/264/hw09 $

Update your Makefile

you@ecegrid-thin1 ~/264/hw09 $ vim Makefile

You should only need to modify three lines.

# VARIABLES
ASG_NICKNAME = HW09
BASE_NAME = join_strings
SUBMIT_FILES = $(SRC_C) $(TEST_C) miniunit.h log_macros.h Makefile

Your SUBMIT_FILES variable might vary in how it uses the other variables. Just make sure that when all variables have been expanded, SUBMIT_FILES includes all of the files that must be submitted with this assignment.

Create a test file (test_join_strings.c) with one minimal test for copy_string(…)

This test will test if copy_string(…) can correctly copy an empty string (i.e., "").

Do not skip this step. A common mistake on homework assignments like this is to mess up the '\0'. If you have a very simple test that takes care of that—and get it working and tested before doing the rest of this assignment—you will make sure you can solve this issue in isolation before adding any further complexity.

you@ecegrid-thin1 ~/264/hw09 $ vim test_join_strings.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include "join_strings.h"
#include "miniunit.h"

static int _test_copy_string_empty() {
    mu_start();
    //────────────────────
    char const* s_orig = "";
    char* s_copy = copy_string(s_orig);
    mu_check_strings_equal(s_copy, s_orig);
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    mu_run(_test_copy_string_empty);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

In the above example, s_orig is declared as char const* s_orig. The const ensures that you cannot accidentally write to *s_orig (i.e., you cannot write to the individual characters in s_orig).

Implement just enough of copy_string(…) in join_strings.c to pass that test.

Hint: You will call malloc(…) to allocate a buffer just big enough for one character (i.e., the '\0'). Then, write the '\0' to the buffer.

Get this test passing before you add any more tests.

Submit.

Add a second test for copy_string(…).

This will test if copy_string(…) can copy a non-empty string (i.e., "abc").

// …

static int _test_copy_string_abc() {
    mu_start();
    //────────────────────
    char const* s_orig = "abc";
    char* s_copy = copy_string(s_orig);
    mu_check_strings_equal(s_copy, s_orig);
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_copy_string_abc);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of copy_string(…) in join_strings.c to pass that test.

Hint: You can use strlen(…) to get the number of characters in string (not including the '\0').

Hint: When you call malloc(…), be sure to allocate enough memory for all of the characters in string, and the '\0'.

Hint: You can use strcpy(destination string, source string) to copy a string. For example, if s_copy is the new buffer on the heap that you will copy into, then strcpy(s_copy, string) will copy all of the characters in string and the '\0' to the buffer at s_copy.

strlen(…) and strcpy(…) are declared in the standard header file string.h. To use them, you must include that header file in the same way you would include stdio.h or stdlib.h.
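The following sketch applies the same strlen(…) + 1 sizing and strcpy(…) usage from these hints to a different function. repeat_string(…) is hypothetical and not part of HW09. It relies on the fact that strcpy(…) writes a '\0' after each copy, and the next copy then overwrites that '\0'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of HW09).
// Returns a new heap string containing string repeated times times,
// e.g., repeat_string("ab", 3) gives "ababab". Returns NULL if malloc(...) fails.
// Caller is responsible for freeing the returned buffer.
char* repeat_string(char const* string, size_t times) {
    assert(string != NULL);
    size_t len = strlen(string);                                 // excludes the '\0'
    char* result = malloc(sizeof(*result) * (len * times + 1));  // +1 for the '\0'
    if(result == NULL) {
        return NULL;
    }
    result[0] = '\0';                      // already correct if times == 0
    for(size_t i = 0; i < times; i++) {
        strcpy(result + i * len, string);  // copies string plus its '\0'
    }
    return result;
}
```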

Submit.

Add a minimal test for wrap_string(…).

This will test if wrap_string(…) can correctly copy an empty string, including the specified left and right delimiter characters. (For examples, see the Requirements table below.)

// …

static int _test_wrap_string_empty() {
    mu_start();
    //────────────────────
    char* s_copy = wrap_string("", '[', ']');
    mu_check_strings_equal(s_copy, "[]");
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_wrap_string_empty);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of wrap_string(…) in join_strings.c to pass that test.

Hint: For this stage, you should not need strlen(…), strcpy(…), memcpy(…), or any loops.

Submit.

Add a more substantial test for wrap_string(…).

This will test if wrap_string(…) can correctly copy a non-empty string, including the specified left and right delimiter characters.

// …

static int _test_wrap_string_abc() {
    mu_start();
    //────────────────────
    char* s_copy = wrap_string("abc", '[', ']');
    mu_check_strings_equal(s_copy, "[abc]");
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_wrap_string_abc);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of wrap_string(…) in join_strings.c to pass that test.

Hint: You can use memcpy(destination, source, num_bytes) to copy a specified number of bytes (num_bytes) from one buffer (source) to another (destination). Unlike strcpy(…), memcpy(…) neither stops at a '\0' nor adds one; it copies exactly num_bytes bytes. For non-overlapping buffers, memcpy(…) is equivalent to the following code:

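void* memcpy(void* destination, void const* source, size_t num_bytes) {
    unsigned char* dst = destination;    // void* converts to unsigned char* without a typecast
    unsigned char const* src = source;   // a void* cannot be indexed directly
    for(size_t i = 0; i < num_bytes; i++) {
        dst[i] = src[i];  // copy one byte from source to destination
    }
    return destination; // return destination (for convenience)
}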

Reminder: void* is a type that means "address of anything," and you can assign an address of any type (anything*) to a void*. Since wrap_string(…) will be dealing with a buffer of char, you can imagine the signature of memcpy(…) were char* memcpy(char* destination, char const* source, size_t num_bytes).

Reminder: size_t is an unsigned integer type that is guaranteed to be sufficient to store the number of bytes in any array on the current system. While it may be tempting to use int everywhere, since it is familiar, there could be arrays (including strings) with more than INT_MAX bytes.
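The following sketch shows that memcpy(…) copies only the bytes you ask for, so the '\0' must be written separately. The helper copy_prefix(…) is hypothetical and not part of HW09. It also uses size_t for every length and index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of HW09).
// Returns a new heap string holding the first n characters of string,
// or NULL if malloc(...) fails. Assumes n <= strlen(string).
char* copy_prefix(char const* string, size_t n) {
    assert(n <= strlen(string));
    char* prefix = malloc(sizeof(*prefix) * (n + 1));  // n chars + '\0'
    if(prefix == NULL) {
        return NULL;
    }
    memcpy(prefix, string, sizeof(*prefix) * n);       // copies exactly n bytes, no '\0'
    prefix[n] = '\0';                                  // so add the terminator explicitly
    return prefix;
}
```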

Add a minimal test for join_strings(…).

This will test if join_strings(…) can correctly copy an array containing a single string. Since the delimiter only comes between successive strings in the array, the delimiter will be ignored for this test case. (For more examples, see the Requirements table below.)

// …

static int _test_join_strings_one() {
    mu_start();
    //────────────────────
    char const* const strings[] = { "abc" };
    char* combined_str = join_strings(strings, 1, "-");
    mu_check_strings_equal(combined_str, "abc");
    free(combined_str);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_join_strings_one);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Note that strings is declared as char const* const strings[]. That means strings is an array of addresses of strings (each element is a char const* const). Thus, strings[0] is the address of the 'a' in "abc". The const qualifiers make it compatible with join_strings(…), which has some protections to prevent you from accidentally modifying the array of strings that is passed in, or any character within any of those strings.

join_strings(…) is declared in join_strings.h as follows:

char* join_strings(char const* const* strings, size_t num_strings, char const* separator)

The first const ensures that you cannot accidentally modify **strings (first character in the first string)—or any other character in any of the strings. In other words, if you try to do strings[▒][▒] = '▒';, gcc will give you an error message to prevent you from making that mistake.

The second const ensures that you cannot accidentally modify *strings (the first string)—or any string in the array. In other words, if you had some other string elsewhere in memory (e.g., other_string), you cannot do strings[▒] = other_string;. Gcc would give you an error message to save you.
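The following sketch shows what the two const qualifiers allow and forbid. The function total_length(…) is hypothetical and not part of HW09. It takes the same parameter type as join_strings(…). Reading through strings is fine, but gcc would reject the commented-out writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of HW09), same parameter type as join_strings(...).
// Returns the total number of characters in the strings, not counting any '\0'.
size_t total_length(char const* const* strings, size_t num_strings) {
    assert(strings != NULL || num_strings == 0);
    size_t total = 0;
    for(size_t i = 0; i < num_strings; i++) {
        total += strlen(strings[i]);   // reading through both levels is allowed
        // strings[i][0] = 'x';        // error: the characters are const (first const)
        // strings[i] = "other";       // error: the array elements are const (second const)
    }
    return total;
}
```

Neither const applies to the parameter variable strings itself. An assignment such as strings = NULL; inside the function would therefore still compile.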

Do not use typecasts anywhere in HW09.

Implement just enough of join_strings(…) in join_strings.c to pass that test.

Hint: For this stage, join_strings(…) should do nothing more than copy_string(…).

Do not skip this step and do not try to implement the whole thing at this stage. Just get it working for a single string. If it contains any more code than your copy_string(…), you are not following directions and should expect problems. Get join_strings(…) working for just this simple case first. There are some complexities with the type of the argument that you will want to iron out before you add anything further.

Submit.

Add a test for join_strings(…) with multiple strings but an empty delimiter.

// …

static int _test_join_strings_three_empty_delimiter() {
    mu_start();
    //────────────────────
    char const* const strings[] = { "abc", "def", "ghi" };
    char* combined_str = join_strings(strings, 3, "");
    mu_check_strings_equal(combined_str, "abcdefghi");
    free(combined_str);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_join_strings_three_empty_delimiter);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of join_strings(…) in join_strings.c to pass that test.

Reminder: Do not use typecasts anywhere in HW09.

Submit.

Add another test for join_strings(…) with multiple strings and a non-empty delimiter.

// …

static int _test_join_strings_three_nonempty_delimiter() {
    mu_start();
    //────────────────────
    char const* const strings[] = { "abc", "def", "ghi" };
    char* combined_str = join_strings(strings, 3, "-");
    mu_check_strings_equal(combined_str, "abc-def-ghi");
    free(combined_str);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_join_strings_three_nonempty_delimiter);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of join_strings(…) in join_strings.c to pass that test.

Submit.

Continue until finished

You are almost done. You still need to test join_strings(…) for an empty array (i.e., join_strings(NULL, 0, "-")). Also, make sure your tests are adequate.
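It may help to check the buffer-size arithmetic on its own. For n strings, there are n - 1 separators, plus one byte for the '\0'. For "abc", "def", "ghi" with "-", that is 9 + 2 + 1 = 12 chars. The helper joined_buffer_size(…) is a hypothetical sketch, not a required function, and it also covers the empty-array case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of HW09).
// Returns how many chars a buffer needs to hold the strings joined by separator,
// including the final '\0'. For num_strings == 0, strings must be NULL and the
// result is 1 (room for an empty string).
size_t joined_buffer_size(char const* const* strings, size_t num_strings,
                          char const* separator) {
    assert(num_strings != 0 || strings == NULL);
    size_t size = 1;                                     // the final '\0'
    for(size_t i = 0; i < num_strings; i++) {
        size += strlen(strings[i]);                      // characters of each string
    }
    if(num_strings > 1) {
        size += strlen(separator) * (num_strings - 1);   // one separator between each pair
    }
    return size;
}
```

The result would then be passed to malloc(…) in the usual form, e.g., malloc(sizeof(*joined) * joined_buffer_size(strings, num_strings, separator)).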

Requirements

  1. Your submission must contain each of the following files, as specified:
    file contents
    join_strings.c functions
    copy_string(char const* string)
    return type: char*
    • Make a copy of string on the heap.
    • Caller is responsible for freeing the heap memory buffer allocated by this function.
      • Do not call free(…) in this function.
    wrap_string(char const* string, char left_delimiter, char right_delimiter)
    return type: char*
    • Return a newly allocated string on the heap containing left_delimiter followed by every character in string followed by right_delimiter.
    • Examples:
      • wrap_string("flootix", '(', ')') should create a string on the heap containing (flootix).
      • wrap_string("moblish", '<', '>') should create a string on the heap containing <moblish>.
      • wrap_string("moblish", '$', '$') should create a string on the heap containing $moblish$.
      • wrap_string("", '[', ']') should create a string on the heap containing [].
    • Caller is responsible for freeing the heap memory buffer allocated by this function.
      • Do not call free(…) in this function.
    join_strings(char const* const* strings, size_t num_strings, char const* separator)
    return type: char*
    • strings is an array of strings.
    • num_strings is the number of strings in strings.
    • separator is a string; a copy of it is placed between each pair of adjacent strings in the output.
    • If num_strings == 1 (i.e., strings contains just one string), then join_strings(…) should simply return a copy of strings[0].
      • This is equivalent to copy_string(strings[0]).
    • If num_strings == 0, then strings must be NULL and join_strings(…) should return a newly allocated empty string.
      • This is equivalent to copy_string("").
    • Examples:
      • This code…
        char const* const strings[] = { "abc", "def" };
        char* joined_string = join_strings(strings, 2, "-");
        … should set joined_string to…
        abc-def
    • Caller is responsible for freeing the heap memory buffer allocated by this function.
      • Do not call free(…) in this function.
    test_join_strings.c functions
    main(int argc, char* argv[])
    return type: int
    • Test the code in your join_strings.c using your miniunit.h.
    • 100% code coverage is required for join_strings.c.
    _test_▒▒▒()
    return type: int
    • Use your mu_check_strings_equal(…) from HW07 to check that the code in your join_strings.c is working correctly.
    _test_▒▒▒()
    return type: int
    _test_▒▒▒()
    return type: int
    miniunit.h macros Same as HW07. Okay to modify, if you wish.
    log_macros.h macros Same as HW06. Okay to modify, if you wish.
    Makefile macros Same as HW08. Okay to modify, if you wish.
  2. Do not modify join_strings.h.
    • If your join_strings.h includes #define JOIN_STRINGS_H_VERSION=1, you may add two spaces to change that to #define JOIN_STRINGS_H_VERSION = 1.
  3. Do not use typecasts anywhere in HW09. The code quality standard says typecasts may not be used, except when they are truly necessary, safe, and you can articulate why. They are not necessary for anything in this assignment, so they should not be used.
  4. Only the following external header files, functions, and symbols are allowed, and only in the files listed ("*" means any).
    header      functions/symbols        allowed in…
    stdbool.h   bool, true, false        *
    stdio.h     *                        test_join_strings.c
    assert.h    assert                   *
    string.h    strlen, strcpy, memcpy   *
    stdlib.h    free, EXIT_SUCCESS       test_join_strings.c
    stdlib.h    malloc                   join_strings.c
    All others are prohibited unless approved by the instructor. Feel free to ask if there is something you would like to use.
  5. Submissions must meet the code quality standards and the policies on homework and academic integrity.

Submit

To submit HW09 from within your hw09 directory, type

make submit

That should result in the following command: 264submit HW09 join_strings.c test_join_strings.c miniunit.h log_macros.h Makefile

Pre-tester

The pre-tester for HW09 has been released and is ready to use.

10/10/2022
  • Getting Started was augmented with additional steps, including explanation.
  • Prominent warnings not to use typecasts were added throughout. This was already covered by the code quality standard, but it is a particular hazard for HW09, so explicit warnings were added.