Advanced C Programming

Spring 2024 ECE 26400 :: Purdue University

This is a PAST SEMESTER (Spring 2024).
Due 2/23

malloc(…): Join strings

Learning goals

You should learn how to:

  1. Use malloc(…) to allocate buffers on the heap.
  2. Write bug-resistant code that uses malloc(…).
  3. Debug common problems that occur when using dynamic memory.

Overview

In this assignment, you will create a few functions for manipulating strings. While these functions are operations that are genuinely useful in some contexts, the real purpose of this assignment is to give you some practice using dynamic memory, starting with something simple (copying a string to the heap) and moving up to something a little fancier (joining several strings with a separator string).

Dynamic memory (aka “heap memory”)

As we discussed in class, memory is organized in segments.

The stack segment is where local variables and arguments are stored. One limitation of the stack is that you have to know in advance how big any array, string, or object will be. If your code will take input from a file, a user, the network, or external code that you didn't write, then you can't predict how big things need to be. Another limitation is that when a function returns, all local variables and arguments are invalidated (i.e., become unavailable).

The text segment is where your compiled code is loaded in memory when you run your program.

The data segment is where most string literals are stored (e.g., for code like char* s = "xyz";). Those are stored in a read-only section of the data segment. (There is also a writable section, where most global and static variables are stored, but we don't talk much about that since you won't use any global or static variables in this class.)

The heap segment lets you specify exactly how many bytes you need. You use the malloc(…) function to allocate (reserve) that space. You allocate a buffer (a bunch of bytes) into which you can store a string, array, or any data you like. Your buffer stays allocated until you explicitly deallocate it by calling the free(…) function.

Example: "abc"

Let's start with an example. This code just prints abc.

More specifically, it allocates (i.e., reserves space for) a buffer on the heap sufficient for the letters abc and the null terminator, initializes the string with those letters, prints the string, and then deallocates (“frees”) the buffer so that space in memory can be used for something else.

#include <stdio.h>
#include <stdlib.h>
#include "log_macros.h"

char* get_abc_string() {
  // Allocate a buffer on the heap for the new string, and store its address in abc_string.
  char* abc_string = malloc(sizeof(*abc_string) * (3 + 1)); // +1 is for the '\0'

  // malloc(…) takes the number of bytes you want to allocate, allocates (reserves) that
  // space on the heap segment, and returns the address of the first byte.

  // Initialize the characters in the newly allocated buffer.
  abc_string[0] = 'a';
  abc_string[1] = 'b';
  abc_string[2] = 'c';
  abc_string[3] = '\0'; // null terminator (DON'T FORGET!!!)

  return abc_string;
  // Because our string is on the heap (not the stack), it will remain accessible by
  // the caller for as long as it is needed.
}

int main(int argc, char* argv[]) {
  char* abc_string = get_abc_string();

  // Print the string
  log_str(abc_string);  // output: abc_string == "abc"

  // Free the buffer (DON'T FORGET!!!)
  free(abc_string);

  // free(…) takes the address of a buffer that was previously allocated using malloc(…), and
  // deallocates (“frees”) it, making it available to be used for other purposes (i.e., other
  // parts of the system).

  return EXIT_SUCCESS;
}

In the sections below, we will break down exactly how each part of this code works.

How to allocate a buffer on the heap using malloc(…)

malloc(…) is a function that takes the number of bytes you want to allocate, allocates (reserves) that much space on the heap for you, and returns the address of the newly allocated space (often called a “buffer” or “block”).

For a very simple example, malloc(100) would allocate 100 bytes for you and return the address. Although your C compiler would allow that, no sane programmer would call malloc(…) in that way. Over time, programmers learned new methods of coding defensively to avoid bugs. Following modern best practices, we require that you use a very specific form when calling malloc(…).

To create an array of TYPE elements on the heap (e.g., array of ints or longs), use the following syntax:

TYPE* NAME = malloc(sizeof(*NAME) * # of elements);

The variable NAME will be initialized to the address of the first element in the array stored in your newly allocated buffer on the heap. You must then initialize each element of the array (e.g., NAME[0] = …;, etc.).
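As a small illustration of that pattern, here is a sketch that builds an array of int on the heap. The function name make_filled_array and its parameters are hypothetical (they are not part of this assignment). Consistent with the Q&A at the end of this page, it does not check whether malloc(…) returned NULL.

```c
#include <stdlib.h>

// Hypothetical helper (not part of HW08): returns a heap array of num_elements ints,
// each set to fill_value.  Caller is responsible for freeing the returned buffer.
int* make_filled_array(size_t num_elements, int fill_value) {
  // sizeof(*numbers) is the size of one element (here, one int).  If the element
  // type ever changes, this line stays correct without being edited.
  int* numbers = malloc(sizeof(*numbers) * num_elements);

  // malloc(…) does not initialize the buffer, so set every element explicitly.
  for(size_t i = 0; i < num_elements; i++) {
    numbers[i] = fill_value;
  }

  return numbers;
}
```

A caller might write int* sevens = make_filled_array(4, 7); and later free(sevens);. Writing sizeof(*numbers) instead of sizeof(int) means the element type is stated in only one place.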

Since a string is just an array of char elements, we use malloc(…) to allocate a buffer for a string using this syntax:

char* NAME = malloc(sizeof(*NAME) * # of characters (including '\0'));

Example:
char* abc_string = malloc(sizeof(*abc_string) * (3 + 1)); // +1 is for the null terminator ('\0')

The variable abc_string will be initialized to the address of the first character of the string in the newly allocated buffer on the heap. You must then initialize each character (e.g., abc_string[0] = 'a';, etc.). Don't forget the null terminator (ex: abc_string[3] = '\0';).

☛ Use descriptive variable names. NAME is just a placeholder.

Every buffer allocated by malloc(…) must be deallocated (i.e., “freed”) once (and only once) by calling free(…) with the address to the buffer. Keep reading to learn how.

How to free buffers on the heap using free(…)

To free any buffer that was allocated using malloc(…), call free(…) as follows:

free(NAME);

Memory leaks

If you forget to deallocate (free(…)) a buffer that you allocated with malloc(…), you will have a memory leak. In this class, you will receive a penalty for any memory leaks. (In the real world, memory leaks make programs run slowly and sometimes cause applications or systems to crash.) You can detect memory leaks using Valgrind (described below), but it is much easier to avoid them in the first place.

☛ For every call to malloc(…), you should have a corresponding call to free(…).
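The pairing often spans two functions: one allocates and returns the buffer, and its caller frees it. The sketch below shows one way that can look. The names make_dashes and print_rule are hypothetical and are not part of this assignment.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not part of HW08): returns a heap string of num_dashes '-'
// characters.  Caller is responsible for freeing the returned buffer.
char* make_dashes(size_t num_dashes) {
  char* dashes = malloc(sizeof(*dashes) * (num_dashes + 1)); // +1 is for the '\0'
  for(size_t i = 0; i < num_dashes; i++) {
    dashes[i] = '-';
  }
  dashes[num_dashes] = '\0';
  return dashes;  // responsibility for freeing passes to the caller
}

// Hypothetical caller: the one malloc(…) inside make_dashes(…) is matched by
// exactly one free(…) here.
void print_rule(size_t width) {
  char* dashes = make_dashes(width);
  printf("%s\n", dashes);
  free(dashes);  // freed once, after the last use
}
```

If print_rule(…) returned without calling free(…), the address in dashes would be lost and the buffer could never be freed. That is a memory leak.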

Getting Started

Get the starter code

you@eceprog ~/264/ $ 264get hw08
This will write the following files:
1. hw08/join_strings.h
Ok? (y/n)y
1 files were written
you@eceprog ~/264/ $ cd hw08
you@eceprog ~/264/hw08 $

Copy your log_macros.h, miniunit.h, and Makefile from previous assignments.

you@eceprog ~/264/hw08 $ cp ../hw05/log_macros.h ./
you@eceprog ~/264/hw08 $ cp ../hw06/miniunit.h ./
you@eceprog ~/264/hw08 $ cp ../hw07/Makefile ./
you@eceprog ~/264/hw08 $

Update your Makefile

you@eceprog ~/264/hw08 $ vim Makefile

You should only need to modify three lines.

# VARIABLES
ASG_NICKNAME = HW08
BASE_NAME = join_strings
SUBMIT_FILES = $(SRC_C) $(TEST_C) miniunit.h log_macros.h Makefile

Your SUBMIT_FILES variable might vary in how it uses the other variables. Just make sure that when all variables have been expanded, SUBMIT_FILES includes all of the files that must be submitted with this assignment.

Create a test file (test_join_strings.c) with one minimal test for copy_string(…)

This test will test if copy_string(…) can correctly copy an empty string (i.e., "").

Do not skip this step. A common mistake on homework assignments like this is to mess up the '\0'. If you have a very simple test that takes care of that—and get it working and tested before doing the rest of this assignment—you will make sure you can solve this issue in isolation before adding any further complexity.

you@eceprog ~/264/hw08 $ vim test_join_strings.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include "join_strings.h"
#include "miniunit.h"

int _test_copy_string_empty() {
    mu_start();
    //────────────────────
    char const* s_orig = "";
    char* s_copy = copy_string(s_orig);
    mu_check_strings_equal(s_copy, s_orig);
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    mu_run(_test_copy_string_empty);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

In the above example, s_orig is declared as char const* s_orig. The const ensures that you cannot accidentally write to *s_orig (i.e., you cannot write to the individual characters in s_orig).
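To see what that protection looks like in practice, consider the sketch below. The function count_char is hypothetical (not part of this assignment). It only reads from its string parameter, and the commented-out line shows the kind of write that gcc would reject.

```c
#include <stddef.h>

// Hypothetical helper (not part of HW08): counts how many times target occurs in string.
size_t count_char(char const* string, char target) {
  size_t count = 0;
  for(size_t i = 0; string[i] != '\0'; i++) {
    if(string[i] == target) {  // reading through a char const* is allowed
      count++;
    }
  }
  // string[0] = 'X';  // would not compile: string[0] is read-only
  return count;
}
```

For example, count_char("banana", 'a') returns 3.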

Implement just enough of copy_string(…) in join_strings.c to pass that test.

Hint: You will call malloc(…) to allocate a buffer just big enough to contain one character (i.e., the '\0'). Then, write the '\0' to the buffer.

Get this test passing before you add any more tests.

Submit

Add a second test for copy_string(…).

This will test if copy_string(…) can copy a non-empty string (i.e., "abc").

// …

int _test_copy_string_abc() {
    mu_start();
    //────────────────────
    char const* s_orig = "abc";
    char* s_copy = copy_string(s_orig);
    mu_check_strings_equal(s_copy, s_orig);
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_copy_string_abc);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of copy_string(…) in join_strings.c to pass that test.

Hint: You can use strlen(…) to get the number of characters in string (not including the '\0').

Hint: When you call malloc(…), be sure to allocate enough memory for all of the characters in string, and the '\0'.

Hint: You can use strcpy(destination string, source string) to copy a string. For example, if s_copy is the new buffer on the heap that you will copy into, then strcpy(s_copy, string) will copy all of the characters in string and the '\0' to the buffer at s_copy.

strlen(…) and strcpy(…) are declared in the standard header file string.h. To use them, you must include that header file in the same way you would include stdio.h or stdlib.h.
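The sketch below shows strlen(…), malloc(…), and strcpy(…) working together on a different task: appending a '!' to a word. The function make_exclaimed is hypothetical and is not one of the functions required for this assignment.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of HW08): returns a heap copy of word with '!' appended.
// Caller is responsible for freeing the returned buffer.
char* make_exclaimed(char const* word) {
  size_t num_chars = strlen(word);  // does not count the '\0'

  // +1 is for the '!', and another +1 is for the '\0'
  char* exclaimed = malloc(sizeof(*exclaimed) * (num_chars + 1 + 1));

  strcpy(exclaimed, word);          // copies the characters of word and its '\0'
  exclaimed[num_chars] = '!';       // overwrite that '\0' with the '!'
  exclaimed[num_chars + 1] = '\0';  // terminate the new, longer string
  return exclaimed;
}
```

For example, make_exclaimed("hi") returns the address of a heap buffer containing hi!. Notice that the buffer size is computed from strlen(…) before anything is copied into it.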

Submit

Add a minimal test for wrap_string(…).

This will test if wrap_string(…) can correctly copy an empty string, including the specified left and right delimiter characters. (For examples, see the Requirements table below.)

// …

int _test_wrap_string_empty() {
    mu_start();
    //────────────────────
    char* s_copy = wrap_string("", '[', ']');
    mu_check_strings_equal(s_copy, "[]");
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_wrap_string_empty);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of wrap_string(…) in join_strings.c to pass that test.

Hint: For this stage, you should not need strlen(…), strcpy(…), memcpy(…), or any loops.

Submit

Add a more substantial test for wrap_string(…).

This will test if wrap_string(…) can correctly copy a non-empty string, including the specified left and right delimiter characters.

// …

int _test_wrap_string_abc() {
    mu_start();
    //────────────────────
    char* s_copy = wrap_string("abc", '[', ']');
    mu_check_strings_equal(s_copy, "[abc]");
    free(s_copy);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_wrap_string_abc);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of wrap_string(…) in join_strings.c to pass that test.

Hint: You can use memcpy(destination, source, num_bytes) to copy a specified number of bytes (num_bytes) from one buffer (source) to another (destination). Unlike strcpy(…), memcpy(…) does not copy the '\0'. It copies a specified number of bytes. memcpy(…) is equivalent to the following code:

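void* memcpy(void* destination, void const* source, size_t num_bytes) {
    // You cannot index a void*, so view both buffers as bytes.
    // A void* converts to any other object pointer type without a typecast.
    char* destination_bytes = destination;
    char const* source_bytes = source;
    for(size_t i = 0; i < num_bytes; i++) {
        destination_bytes[i] = source_bytes[i];  // copy one byte from source to destination
    }
    return destination; // return destination (for convenience)
}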

Reminder: void* is a type that means address of anything, and you can assign an anything* to a void*. Since wrap_string(…) will be dealing with a buffer of char, you can imagine the signature of memcpy(…) was char* memcpy(char* destination, char const* source, size_t num_bytes).

Reminder: size_t is an unsigned integer type that is guaranteed to be sufficient to store the number of bytes in any array on the current system. While it may be tempting to use int everywhere, since it is familiar, there could be arrays (including strings) with more than INT_MAX bytes.
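Because memcpy(…) copies only the bytes you ask for, any code that builds a string with it has to add the '\0' itself. Here is a minimal sketch. The function copy_prefix is hypothetical (not part of this assignment), and it assumes the caller passes num_chars no greater than strlen(string).

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of HW08): returns a heap string containing the first
// num_chars characters of string.  Assumes num_chars <= strlen(string).
// Caller is responsible for freeing the returned buffer.
char* copy_prefix(char const* string, size_t num_chars) {
  char* prefix = malloc(sizeof(*prefix) * (num_chars + 1)); // +1 is for the '\0'
  memcpy(prefix, string, num_chars);  // copies exactly num_chars bytes, and no '\0'
  prefix[num_chars] = '\0';           // so the terminator must be written separately
  return prefix;
}
```

For example, copy_prefix("abcdef", 3) returns the address of a heap buffer containing abc. The count and the index are both size_t, so no conversion to or from int is needed.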

Add a minimal test for join_strings(…).

This will test if join_strings(…) can correctly copy an array containing a single string. Since the delimiter only comes between successive strings in the array, the delimiter will be ignored for this test case. (For more examples, see the Requirements table below.)

// …

int _test_join_strings_one() {
    mu_start();
    //────────────────────
    char const* const strings[] = { "abc" };
    char* combined_str = join_strings(strings, 1, "-");
    mu_check_strings_equal(combined_str, "abc");
    free(combined_str);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_join_strings_one);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Note that strings is declared as char const* const strings[]. That means strings is an array of char*. Thus, strings[0] is the address of the 'a' in "abc". The const makes it work with join_strings(…), which has some protections to prevent you from accidentally modifying the array of strings that is passed in, or any character within any of those strings.

join_strings(…) is declared in join_strings.h as follows:

char* join_strings(char const* const* strings, size_t num_strings, char const* separator)

The first const ensures that you cannot accidentally modify **strings (first character in the first string)—or any other character in any of the strings. In other words, if you try to do strings[▒][▒] = '▒';, gcc will give you an error message to prevent you from making that mistake.

The second const ensures that you cannot accidentally modify *strings (the first string), or any other string in the array. In other words, if you had some other string elsewhere in memory (e.g., other_string), you cannot do strings[▒] = other_string;. gcc would give you an error message to save you.
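The sketch below uses the same parameter type in a small read-only function, with the two forbidden kinds of assignment left as comments. The function count_nonempty is hypothetical and is not part of this assignment.

```c
#include <stddef.h>

// Hypothetical helper (not part of HW08): counts the non-empty strings in an array.
size_t count_nonempty(char const* const* strings, size_t num_strings) {
  size_t num_nonempty = 0;
  for(size_t i = 0; i < num_strings; i++) {
    if(strings[i][0] != '\0') {  // reading a character is allowed
      num_nonempty++;
    }
  }
  // strings[0][0] = 'X';   // rejected because of the first const (characters are read-only)
  // strings[0] = "other";  // rejected because of the second const (array elements are read-only)
  return num_nonempty;
}
```

An array declared as char const* const strings[] = { "abc", "", "de" }; can be passed directly, with no typecast, and count_nonempty(strings, 3) returns 2.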

Do not use typecasts anywhere in HW08.

Implement just enough of join_strings(…) in join_strings.c to pass that test.

Hint: For this stage, join_strings(…) should do nothing more than copy_string(…).

Do not skip this step and do not try to implement the whole thing at this stage. Just get it working for a single string. If it contains any more code than your copy_string(…), you are not following directions and should expect problems. Get join_strings(…) working for just this simple case first. There are some complexities with the type of the argument that you will want to iron out before you add anything further.

Submit

Add another test for join_strings(…) with multiple strings but an empty delimiter.

// …

int _test_join_strings_three_empty_delimiter() {
    mu_start();
    //────────────────────
    char const* const strings[] = { "abc", "def", "ghi" };
    char* combined_str = join_strings(strings, 3, "");
    mu_check_strings_equal(combined_str, "abcdefghi");
    free(combined_str);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_join_strings_three_empty_delimiter);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of join_strings(…) in join_strings.c to pass that test.

Reminder: Do not use typecasts anywhere in HW08.

Submit

Add another test for join_strings(…) with multiple strings and a non-empty delimiter.

// …

int _test_join_strings_three_nonempty_delimiter() {
    mu_start();
    //────────────────────
    char const* const strings[] = { "abc", "def", "ghi" };
    char* combined_str = join_strings(strings, 3, "-");
    mu_check_strings_equal(combined_str, "abc-def-ghi");
    free(combined_str);
    //────────────────────
    mu_end();
}

int main(int argc, char* argv[]) {
    // …
    mu_run(_test_join_strings_three_nonempty_delimiter);
    return EXIT_SUCCESS;
}
// Okay to copy/adapt code from this example.

Implement just enough of join_strings(…) in join_strings.c to pass that test.

Submit

Continue until finished

You are almost done. You still need to test join_strings(…) for an empty array (i.e., join_strings(NULL, 0, "-")). Also, make sure your tests are adequate.
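The empty array is worth extra care because size_t is unsigned: if num_strings is 0, an expression like num_strings - 1 does not give -1. It wraps around to a huge number. The sketch below shows one way to count the bytes a joined result would need without ever subtracting. The function get_joined_size is hypothetical. It is not required by this assignment, and your own approach may differ.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of HW08): number of bytes needed to hold the joined
// result, including the '\0'.
size_t get_joined_size(char const* const* strings, size_t num_strings, char const* separator) {
  size_t num_bytes = 1;  // the '\0'
  for(size_t i = 0; i < num_strings; i++) {
    if(i > 0) {
      num_bytes += strlen(separator);  // separator goes only between strings
    }
    num_bytes += strlen(strings[i]);
  }
  return num_bytes;
}
```

When num_strings is 0, the loop body never runs, so strings (which is NULL in that case) is never dereferenced and the result is 1, just enough for an empty string. For { "abc", "def", "ghi" } joined with "-", the result is 12 (the 11 characters of abc-def-ghi, plus the '\0').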

Requirements

  1. Your submission must contain each of the following files, as specified:
    file contents
    join_strings.c functions
    copy_string(char const* string)
    return type: char*
    Make a copy of string on the heap.
    • Caller is responsible for freeing the heap memory for the copy.
    • Caller is responsible for freeing the heap memory buffer allocated by this function.
      • Do not call free(…) in this function.
    wrap_string(char const* string, char left_delimiter, char right_delimiter)
    return type: char*
    • Return a newly allocated string on the heap containing left_delimiter followed by every character in string followed by right_delimiter.
    • Examples:
      • wrap_string("flootix", '(', ')') should create a string on the heap containing (flootix).
      • wrap_string("moblish", '<', '>') should create a string on the heap containing <moblish>.
      • wrap_string("moblish", '$', '$') should create a string on the heap containing $moblish$.
      • wrap_string("", '[', ']') should create a string on the heap containing [].
    • Caller is responsible for freeing the heap memory buffer allocated by this function.
      • Do not call free(…) in this function.
    join_strings(char const* const* strings, size_t num_strings, char const* separator)
    return type: char*
    • strings is an array of strings.
    • num_strings is the number of strings in strings.
    • separator is a string. In the output, a copy of separator is placed between each pair of adjacent strings from strings.
    • If num_strings == 1 (i.e., strings contains just one string), then join_strings(…) should simply return a copy of strings[0].
      • This is equivalent to copy_string(strings[0]).
    • If num_strings == 0, then strings must be NULL and join_strings(…) should return a newly allocated empty string.
      • This is equivalent to copy_string("").
    • Examples:
      • This code…
        char const* const strings[] = { "abc", "def" };
        char* joined_string = join_strings(strings, 2, "-");
        … should set joined_string to…
        abc-def
    • Caller is responsible for freeing the heap memory buffer allocated by this function.
      • Do not call free(…) in this function.
    test_join_strings.c functions
    main(int argc, char* argv[])
    return type: int
    Test the code in your join_strings.c using your miniunit.h.
    • 100% code coverage is required for join_strings.c.
    _test_▒▒▒()
    return type: int
    • Use your mu_check_strings_equal(…) from HW06 to check that the code in your join_strings.c is working correctly.
    _test_▒▒▒()
    return type: int
    _test_▒▒▒()
    return type: int
    miniunit.h macros Same as HW06. Okay to modify, if you wish.
    log_macros.h macros Same as HW05. Okay to modify, if you wish.
    Makefile macros Same as HW07. Okay to modify, if you wish.
  2. Do not modify join_strings.h.
  3. Do not use typecasts anywhere in HW08. The code quality standard says typecasts may not be used, except when they are truly necessary, safe, and you can articulate why. They are not necessary for anything in HW08.
  4. Only the following external header files, functions, and symbols are allowed in join_strings.c.
    header     functions/symbols        allowed in…
    stdbool.h  bool, true, false        *
    stdio.h    *                        test_join_strings.c
    assert.h   assert                   *
    string.h   strlen, strcpy, memcpy   *
    stdlib.h   free, EXIT_SUCCESS       test_join_strings.c
    stdlib.h   malloc                   join_strings.c
    All others are prohibited unless approved by the instructor. Feel free to ask if there is something you would like to use.
  5. Submissions must meet the code quality standards and the policies on homework and academic integrity.

Submit

To submit HW08 from within your hw08 directory, type

make submit

That should result in the following command: 264submit HW08 join_strings.c test_join_strings.c miniunit.h log_macros.h Makefile

Pre-tester

The pre-tester for HW08 has been released and is ready to use.

Q&A

  1. Do we need to check if malloc(…) returned NULL?

    No. That was not taught in this class (this section). It is commonly taught in other C courses, but we don't teach it here because (1) the course does not cover good methods of handling run-time errors by this point in the semester, and (2) we are not aware of any reasonable way for you to test that functionality.
  2. How do I set up my Makefile?

    Here is an example. 👌 Okay to copy/adapt for Spring 2024 (Quinn) only. Makefile

Updates

There are no updates to HW08 so far.