malloc(…): Join strings
Learning goals
You should learn how to:
- Use malloc(…) to allocate buffers on the heap.
- Write bug-resistant code that uses malloc(…).
- Debug common problems that occur when using dynamic memory.
Overview
In this assignment, you will create a few functions for manipulating strings. While these functions are operations that are genuinely useful in some contexts, the real purpose of this assignment is to give you some practice using dynamic memory, starting with something simple (copying a string to the heap) and moving up to something a little fancier (joining several strings with a separator string).
Dynamic memory (aka “heap memory”)
As we discussed in class, memory is organized in segments.
The stack segment is where local variables and arguments are stored. One limitation of the stack is that you have to know in advance how big any array, string, or object will be. If your code will take input from a file, a user, the network, or external code that you didn't write, then you can't predict how big things need to be. Another limitation is that when a function returns, all local variables and arguments are invalidated (i.e., become unavailable).
The text segment is where your compiled code is loaded in memory when you run your program.
The data segment is where most string literals are stored (e.g., for code like char* s = "xyz";). Those are stored in a read-only section of the data segment. (There is also a writable section, where most global and static variables are stored, but we don't talk much about that since you won't use any global or static variables in this class.)
The heap segment lets you specify exactly how many bytes you need. You use the malloc(…) function to allocate (reserve) that space. You allocate a buffer (a bunch of bytes) into which you can store a string, array, or any data you like. Your buffer stays allocated until you explicitly deallocate it by calling the free(…) function.
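To make the point about sizes that are known only at run time concrete, here is a minimal sketch. The helper name make_dashes is hypothetical (it is not part of the assignment), and the NULL check is a defensive habit assumed here rather than something stated above.

```c
#include <stdlib.h>

// Hypothetical helper (not part of the assignment): returns a heap-allocated
// string made of `count` dashes. The size is chosen by the caller at run time,
// which a fixed-size array on the stack could not accommodate.
char* make_dashes(size_t count) {
    // Reserve one byte per dash, plus one for the null terminator.
    char* dashes = malloc(sizeof(*dashes) * (count + 1));
    if (dashes == NULL) {
        return NULL;  // malloc(…) returns NULL if it could not reserve the space
    }

    for (size_t i = 0; i < count; i++) {
        dashes[i] = '-';
    }
    dashes[count] = '\0';  // null terminator

    // The buffer stays allocated after this function returns.
    // The caller is responsible for calling free(…) on it.
    return dashes;
}
```

A caller would write something like char* line = make_dashes(n); use the string, and then call free(line); once it is no longer needed.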
Example: "abc"
Let's walk through a complete example. This code just prints abc.
More specifically, it allocates (i.e., reserves space for) a buffer on the heap sufficient for the letters abc and the null terminator, initializes the string with those letters, prints the string, and then deallocates (“frees”) the buffer so that space in memory can be used for something else.
#include <stdio.h>
#include <stdlib.h>
#include "log_macros.h"

char* get_abc_string() {
    // Allocate a buffer on the heap for the new string, and store its address in abc_string.
    char* abc_string = malloc(sizeof(*abc_string) * (3 + 1)); // +1 is for the '\0'
    // malloc(…) takes the number of bytes you want to allocate, allocates (reserves) that
    // space on the heap segment, and returns the address of the first byte.

    // Initialize the characters in the newly allocated buffer.
    abc_string[0] = 'a';
    abc_string[1] = 'b';
    abc_string[2] = 'c';
    abc_string[3] = '\0'; // null terminator (DON'T FORGET!!!)

    return abc_string;
    // Because our string is on the heap (not the stack), it will remain accessible by
    // the caller for as long as it is needed.
}

int main(int argc, char* argv[]) {
    char* abc_string = get_abc_string();

    // Print the string
    log_str(abc_string); // output: abc_string == "abc"

    // Free the buffer (DON'T FORGET!!!)
    free(abc_string);
    // free(…) takes the address of a buffer that was previously allocated using malloc(…), and
    // deallocates (“frees”) it, making it available to be used for other purposes (i.e., other
    // parts of the system).

    return EXIT_SUCCESS;
}
In the sections below, we will break down exactly how each part of this code works.
How to allocate a buffer on the heap using malloc(…)
malloc(…) is a function that takes the number of bytes you want to allocate, allocates (reserves) that much space on the heap for you, and returns the address of the newly allocated space (often called a “buffer” or “block”).
For a very simple example, malloc(100) would allocate 100 bytes for you and return the address. Although your C compiler would allow that, careful programmers avoid calling malloc(…) that way, since a bare byte count says nothing about the type or number of elements the buffer is meant to hold.
Over time, programmers learned new methods of coding defensively to avoid bugs. Following modern best practices, we require that you use a very specific form when calling malloc(…).
To create an array of TYPE elements on the heap (e.g., array of ints or longs), use the following syntax:
TYPE* NAME = malloc(sizeof(*NAME) * # of elements);
The variable NAME will be initialized to the address of the first element in the array stored in your newly allocated buffer on the heap. You must then initialize each element of the array (e.g., NAME[0] = …;, etc.).
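As a sketch of that form applied to an array of int, consider the following. The function name make_squares is hypothetical, and the NULL check is an added assumption about defensive style. Because the size is written as sizeof(*squares), changing the element type from int to long would require editing only the declaration, not the size calculation.

```c
#include <stdlib.h>

// Hypothetical helper (not part of the assignment): returns a heap-allocated
// array of `count` ints, where element i holds i * i.
// Assumes count > 0; the caller must free(…) the result.
int* make_squares(size_t count) {
    // sizeof(*squares) is the size of one element (an int, here).
    int* squares = malloc(sizeof(*squares) * count);
    if (squares == NULL) {
        return NULL;  // allocation failed
    }

    // malloc(…) does not initialize the buffer, so set every element ourselves.
    for (size_t i = 0; i < count; i++) {
        squares[i] = (int)(i * i);
    }
    return squares;
}
```

For example, make_squares(4) would return the address of a buffer holding 0, 1, 4, 9, which the caller later releases with free(…).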
Since a string is just an array of char elements, we use malloc(…) to allocate a buffer for a string using this syntax:
char* NAME = malloc(sizeof(*NAME) * # of characters (including '\0'));
char* abc_string = malloc(sizeof(*abc_string) * (3 + 1)); // +1 is for the null terminator ('