Due 11/12

Merge sort

In this assignment, you will implement the merge sort algorithm on singly‑linked lists. In addition, you will practice the test-driven development method of software development and turn in every stage of your TDD progression as different filenames.

Goals

The goals of this assignment are as follows:

Practice writing code using linked lists (a dynamic structure)
Practice writing code using recursion
Practice using the test-driven development method of software development

About Merge Sort

// Credit: Wikipedia user 'Swfung8', License: CC BY-SA

Recursion is an elegant method for solving many problems. At first it can seem unnatural; however, it is actually very simple once you are used to it. The key thing to understand is that recursive programming has two steps:

A base case which is trivial.
A recursive case where you write the solution for the larger case in terms of the solution to smaller cases.

What makes (2) easy is that you simply assume that you have the solution of a smaller case. Life does not generally let you assume you have a solution that you have not yet found; however, if you think about it, it should be obvious (by induction) that recursion works this way.

In this assignment, we will use (1) and (2) above to sort linked-lists of strings using the "merge-sort" algorithm. Merge-sort is a beautiful algorithm that sorts extremely quickly with large inputs. It is also the best possible algorithm to use for sorting linked-lists.

Merge sort works as follows:

Base case:
You never need to sort a list of length 0 or 1; it is already "sorted", so-to-speak. To implement the base case, simply return the input list if its length is 0 or 1. (In general, base cases are trivial like this.)
Recursive case:
Lets wave our hand, and assume that we know how to sort smaller lists already. (See, recursion is easy.) For the recursive case, we do the following:
1. Divide the list into two smaller lists of equal size, +/- one node.
2. Recursively sort each of the two smaller lists. (Wow: easy.)
3. You now have two small lists that are sorted. To produce the final large sorted list, you "merge" the two smaller lists together.
  1. Create a brand new empty list, which will be the result-list.
  2. While both small lists are non-empty, look at the head node of each. Take the smaller head node off of the front of its list, and append it to the result-list.
  3. Eventually one (or perhaps both) of the smaller lists will be empty. At this stage, append the non-empty list (if there is one) onto the end of the result-list.

// Credit: Aaron Michaux, Prof. Yung-Hsiang Lu … This description adapted from a previous ECE 264 assignment

Test-driven development

The test-driven development (TDD) method of software development has been an explicit part of many assignments in the course so far. TDD is a powerful tool for finishing programming assignments in a shorters-and more predictable-amount of time. Until now, it was strongly recommended, but not explicitly required. In this assignment, you will practice TDD and demonstrate your progress by submitting several stages of your development process as separate files.

Stage 0 will consist of llms00.c, test_llms00.c, and test_llms00.txt, and will represent an empty implementation. The test output file (test_llms00.txt) will be empty and the main function in test_llms00.c will have nothing but a return statement. Your llms00.c will be empty or nearly empty.

Stage 1 will consist of llms01.c, test_llms01.c, and test_llms01.txt, and will add the smallest piece of functionality possible. Before writing any code in your llms01.c, start by writing a small test case in your test_llms01.c and the expected output in test_llms01.txt. Then-and only then-will you add the necessary code to your llms01.c to make that test case work.

After every step of this process, you must test your code completely to make sure the output of running the test code matches the expected output and ensure that there are no memory faults. The method for testing was covered in the hw02 instructions. Use the diff command.

In each successive stage, you will add a very small amount of new functionality. Your tests in earlier stages must fit the overall requirements, and must continue to pass even in the later stages. For more information and/or examples of development progressions, see the instructions of hw02, hw03, hw06, and hw07.

In general, stage n will consist of llmsn.c, test_llmsn.c, and test_llmsn where n is zero-padded to two digits (e.g., 00, 01, …, 09, 10, 11, …). After completing and testing a stage, simply copy those three files to the new names with the new number. For example, after completing and testing stage 1, you would type the following commands to start stage 2:

cp llms01.c llms02.c
cp test_llms01.c test_llms02.c
cp test_llms01.txt test_llms02.txt

To demonstrate that you are following this method, you will adhere to a few constraints:

At least 12 stages (you will probably want more).
Test file for stage n+1 includes all tests from stage n.
Tests for all stages from 0 to n pass with the implementation for stage n.
Implementation code in stage n+1 is different from code in stage n.
Implementation code in stage n+1 adds required functionality that was not in stage n.
Implementation code in stage n+2 has more sloc* than implementation code for stage n.
* sloc = “source lines of code” (excluding comments and blank lines)

While these requirements may initially sound complex, they are not. If you are following the spirit of the TDD method, all of this will almost certainly happen naturally.

Definitions: For purposes of these instructions, “implementation code” means your llms??.c; “test code” means your test_llms??.c; “test output” means your test_llms??.txt; and “test” means a small section of your test_llms??.c (most likely in the main(…)) and the corresponding snippet in test_llms??.txt, which together can be used to verify that some aspect of your program works according to the specification.

Files

Here are the files you will create for this assignment.

file	contents
llms.h	type definitions	List struct type (declared using typedef) with =3 field(s): `head` (`Node✶`), `tail` (`Node✶`), and `size` (`int`, the number of nodes currently in the list); whenever `size==0`, `head` and `tail` must be `NULL`.
	type definitions	Node struct type (declared using typedef) with =2 field(s): `value` (`int`) and `next` (`Node✶`)
	function declarations	one for each require function in your llms*.c; do not include helpers (if any) here
llms00.c … llms11.c …	function definition	`create＿list(int✶Xarray,XintXsize)` → return type: List create a new `List`; if size==0 then return a list with the head and tail set to NULL; when size==0 array must be NULL and vice versa; otherwise you may assume that `array` contains `size` elements
		`free＿list(List✶Xlist)` → return type: void free all nodes in the list; set the `head` and `tail` to NULL
		`insert(List✶Xlist,XintXposition,XintXvalue)` → return type: void insert a new value in the list at the specified position; position is 0-based; a negative position indicates the position relative to the tail; call with position=0 to insert at the beginning of the list, with position=1 to insert just after the head, with position=-1 to insert just after the tail, position=-2 to insert just before the tail, and so on; if position ≥ `list->size` insert just after the tail; if position < `-list->size` insert just before the head
		`pop(List✶Xlist,XintXposition)` → return type: int find the node at the specified position; remove the node and return the value; semantics of position are just like with `insert(…)`; position is 0-based; a negative position indicates the position relative to the tail; call with position=0 to pop the node at the head, with position=1 to pop the node just after the head, with position=-1 to pop the tail, position=-2 to pop the node just before the tail, and so on; if position ≥ `list->size` pop the tail; if position ≤ `0-list->size` pop the head
		`sort(List✶Xlist)` → return type: void Sort the list using the merge sort algorithm without any additional heap allocation (i.e., calls to `malloc(…)`).
		any helper functions you wish to use (if any); names must be prefixed with "_"
test_llms00.c … test_llms11.c …	function definitions	`main(intXargc,Xchar✶Xargv[])` → return type: int test your functions in llms.c; as usual… must be comprehensive and original; must have a return statement
test_llms00.txt … test_llms11.txt …	text	expected output from running your respective tests; compiling and running test_llms#.c with your llms.c should yield the text found in test_llms#.txt

As usual… Type declarations manually. Do not copy-paste. (You learn more from typing code manually.)

Bonus #1: Reverse list recursively (1 point)

Add the following function, which will reverse the list recursively and without any additional heap allocation (i.e., calls to malloc(…)).

voidXreverse(List✶Xlist)

To indicate that you are attempting bonus #1, you must include the following in your llms.h:

#define HW10_BONUS_1

This is a classic coding interview problem. It is not hard to find solutions on the web, but please do the problem without peeking at any resources. Obviously, we have no way of verifying this, but you will get more benefit if you follow it. A reasonable solution can be done in about 10 sloc.

(Okay, actually there are ways to catch people googling for homework solutions, but I'm sure you value your learning more than 1 bonus point.)

More?

Additional bonus opportunities might be added later if there is enough interest.

Requirements

Adhere to the constraints given in the Test-Driven Development section above, namely:
1. At least 12 stages (you will probably want more).
2. Test file for stage n+1 includes all tests from stage n.
3. Tests for all stages from 0 to n pass with the implementation for stage n.
4. Implementation code in stage n+1 is different from code in stage n.
5. Implementation code in stage n+1 adds required functionality that was not in stage n.
6. Implementation code in stage n+2 has more sloc* than implementation code for stage n.
  * sloc = “source lines of code” (excluding comments and blank lines)
Test cases in test_llms??.c must be labelled with comments like // Stage 00, // Stage 01, …, // Stage 11, etc.
Do not add any additional fields to the required struct types (Node and List).
No heap memory may be allocated directly or indirectly by sort(…), pop(…), or free_list(…).

Only the following external header files, functions, and symbols are allowed. (All others are prohibited.)

header	functions/symbols	allowed in…
assert.h	`assert(…)`	any file
stdbool.h	`true`, `false`	any file
stdio.h	`printf(…)`,	only in your test file
stdlib.h	`malloc(…)`, `free(…)`	not in test or header files
stdlib.h	`NULL`	any file

Feel free to suggest additional header files or functions that you would like to use.

As usual…
1. Your submission must contain all of the files, functions, and type definitions as specified in the table above.
2. Only the files in the most recent submission will be scored.
3. Do not use dynamic arrays (C99 feature, e.g., char s[n];)
4. Code must successfully compile and run with your test_llms??.c and any other valid test files.
5. Code must meet all requirements in the Course Policies and Code Quality Standards.
  Note: If you choose to use assert(…), be sure you understand the specific guidelines that relate to it.
6. All code must be your own.
7. Your test code must be comprehensive (and original).

How much work is this?

This is intended to be a similar amount of effort to hw08, assuming you understood hw07. (The insert(…) and pop(…) functions are similar to ones you wrote for those assignments. You probably don't want to try to copy code directly, but having done it once will make it a lot easier to do again.)

Many students find that understanding merge sort is more challenging than implementing it.

As usual… Programming with dynamic memory usually entails some time debugging. Allow time for that.

Note: You will be turning in at least 37 files (1 x llms.h + ≥12 x (llms??.c, test_llms??.c, test_llms??.txt). You will probably want more than 12 stages, so this could end up being a lot more than 37 files. To submit all files in the current directory, simply type:
264submit hw10 *

Q&A

Can I reuse code from hw07?
Yes, but you probably don't want to. This is sufficiently different that you will most likely be better off writing insert(…) and pop(…) from scratch. However, the experience gained in hw07 is expected to be helpful in this assignment.
Do my TDD stages have to be the same as the ones I wrote for hw09?
No. While that would be a natural way to do it, it's fine if you change your mind.
In TDD, arent you supposed to check that the test fails before implementing the new functionality?
Yes. For simplicity, we skipped that part, but it is a good idea because it ensures that your test won't accidentally pass. In other words, it's a rough way of testing your test.
Isn't refactoring an important part of TDD?
Yes. For simplicity, we skipped that part, too.
Can you help me understand the position parameter to insert(…) and pop(…)?

Updates

11/6: In create_list, if size==0, then return a List with the head and tail set to NULL; added “when size==0 array must be NULL and vice versa; otherwise you may assume that array contains size elements”; “if position < list -> size insert just before the head; added Q5

11/12: You can use NULL in any file

Advanced C Programming

Autumn 2015 :: ECE 264 :: Purdue University