Data Structures

Fall 2023 ECE 36800 :: Purdue University

Code quality standards

This course will teach you how to write clean code. Clean code makes bugs less frequent, and easier to detect when they do occur. It also makes it easier for others to help you with your code. In industry, writing clean code will ensure that anyone on the team and understand and maintain it. This increases the value of your code—and you—to the company. Follow these guidelines all the time—from the beginning.

dead code

Unused functions

never
Rule:  Functions that are never called from within a file and are not part of the public interface (i.e., required for homework) must be removed or commented out.
Reason:  Unused functions (including old versions) cause confusion. You may think you're debugging the one that gets called and the realize you're debugging something that never gets called at all.

Unreachable code

never
Rule:  Remove or comment out any statements that can never be reached.

Unused variables

never
Rule:  Remove or comment out variables that are never read (and thus have no effect on program execution).

Useless assignments

never
Rule:  Remove assignment statements that can have no effect on program execution.

variable scope

Loop index

limit scope to loop
Rule:  Declare and initialize for loop counter in the for statement.
Exception:  Occassionally, you may need to use the counter in code below the for loop. In that case, declare and initialize outside the for loop (like Bad #1 above) and explain why in a comment.

Global variables

constants only
Rule:  Do not use global variables, except for constants, unless there is a clear and compelling reason.
Note:  In ECE 26400, we will tell you if there is a clear and compelling reason. Otherwise, do not use global variables, except for constants (declared with const).

variable initialization

Declare where you initialize

usually
This section was revised to clarify on 2/6/2020.
Rule:  Declare local variables on the line where they are first assigned to, or as close as possible.
Exception:  In rare occassions, there may be no meaningful value to initialize a variable to. Don't initialize to a meaningless value just to comply with this rule, since that could prevent Valgrind and/or GCC from warning you if you try to access an uninitialized value.
Exception:  You cannot declare a variable inside a switch/case statement.
Note:  This is consistent with Google's C++ Style Guide under Local Variables.

Array initializers

use
Rule:  Use array initializers where possible and appropriate
Reason:  Array initializers are clearer to read, and avoid certain types of bugs (e.g., numbers[3] = 4 for an array of size 3.).
Note:  This is consistent with Google's C++ Style Guide under Local Variables.

Struct initializers

use
Rule:  Use named initializers where possible and appropriate.
Reason:  Struct initializers are clearer to read. They also prevent mistakes where programmers sometimes forget to initialize one field, or perhaps edit the code later and inadvertently delete part of the initialization. Bad #2 is actually fine in earlier versions of c (e.g. c89), but the new syntax shown as Good prevents errors where you get the order wrong.

Struct object on the heap

use
Rule:  Use compound initializers to initialize struct objects on the heap.
Reason:  Initializing an object is essentially a single action. Keeping it in a single statement reduces the chance that you forget one part of it, or that you move part of it but not the rest.

names – format conventions

Global constants

ALL_CAPS
Rule:  Use all caps with words separated by underscores
Reason:  This makes code more readable by making it clear which variables might change.

Multi-word variable and function names

Rule:  Pick one of the following and use it consistently for all of your code in a given assignment.
Note:  You do not have to match the style of any names required by us as part of the assignment.

naming – numbers and strings

1- and 2-letter names

only approved short names
Rule:  Do not use 1- and 2-character names, except as allowed below if it is clear.
Allowed:  i, j, k as counters for a 0-based for loop with a short body (≤10 lines).
Allowed:  x, y, z as coordinates in an image or physical space.
Allowed:  x1, x2, y1, y2, z1, z2 as bounds of a rectangle in an image or physical space.
Allowed:  w, h, d as dimensions of an image or physical object/space; width, height, and depth, respectively.
Allowed:  r, g, b, a as color components red, greed, blue, and alpha, respectively.
Allowed:  n as the primary operand to a generic calculation (e.g., sqrt(…) in small scopes (≤10 lines).
Allowed:  n as a number of something in small scopes (≤10 lines).
Allowed:  ch or c as a char in a small scope (≤10 lines) where there are no other variables of type char
Allowed:  s as a char* in a small scope (≤10 lines) where there are no other variables of type char*.
Allowed:  any short, meaningless variable name in a demonstration or explanation of programming language features in which the content of the variable is irrelevant
Reason:  The above exceptions are widely understood. Using very short names that are not understood forces the programmer/reader to search for the definition or else assume what they think it means.
Note:  Many in industry and open source, many would find these too loose (i.e., prefer less use of single-letter variables). For the class, we are erring on the permissive side to avoid being too draconian.
Note:  This is consistent with Google's C++ Style Guide under General Naming Rules.

loop index (int)

▒▒▒_idx, or ▒▒▒_num for 1-based
Rule:  Loop index variables should be named “▒▒▒_idx” for 0-based for loops, or “▒▒▒_num” for 1-based loops, where ▒▒▒ is a singular noun. In small scopes (≤10 lines), “i”, “j”, or “k” are also acceptable for 0-based for loops.
Examples:  person_idx, line_num

dimensions, distances, measurements (int, float, etc.)

Ex: size_bytes, height_cm, …
Rule:  Variables and struct members that represent a dimension, distance, other measurable quantity should be named with a singular noun or noun phrase their meaning and/or units (if applicable).
Examples:  file_size_bytes, height_cm, distance_mi, outside_temp_c
Exception:  Units may be omitted if the context leaves no room for ambiguity (e.g., x not x_px in image processing).

quantities (int)

Ex: num_digits, person_ct, …
Rule:  Variables and struct members that represent a quantity must be named “num_plural noun or noun phrase” or singular noun or noun phrase_ct
Examples:  num_digits, person_ct

sequence length (int)

Ex: num_names, list_size, …
Rule:  Variables and struct members that represent the number of items in an array, string, or list may be named according to the rules for quantities (above) −OR− “▒▒▒_size”, “▒▒▒_length”, or “▒▒▒_len” (where ▒▒▒ is the name of the array/string/list variable).
Examples:  num_recipients, node_ct, list_size, filenames_len

bool

Ex: is_empty, did_finish, should_delete, …
Rule:  Variables, functions, and struct members of type bool must be named “verb_▒▒▒”, where verb is in the third-person simple present tense.
Examples:  is_empty, has_name, did_finish, needs_data, should_delete, will_reboot, must_replace, can_reach_server

array

Ex: nodes, person_array, …
Rule:  Variables and struct members representing an array must be named with a plural noun or noun phrase, or noun or noun phrase_array.
Examples:  orders, list_nodes, people, person_array

string (char*)

Ex: name, date_str, …
Rule:  Variables and struct members of type char* that represent a string should be named with a singular noun that describes the contents or role of the string (e.g., person_name) or “▒▒▒_str”. In very small scopes (≤10 lines), s is allowed.
Examples:  name, time_str

address

Ex: a_new_value, a_divisor, name, node, …
Rule:  Variables and struct members that contain a memory address must begin with “a_” (or “a” if you are using lower-camelcase), with a few exceptions described below.
Exception:  Do not use the “a_” prefix for strings, list nodes, and other objects primarily referred to by their address. For those types, add the “a_” prefix only when passing by address (e.g., char**, Node**, etc.).
Note:  Outside of this class, you will see “p” or “ptr”. For pedagogical reasons, we use the word “address”.
Examples:  int* a_new_value = ▒;, Person* a_person = { ▒ };, char* name = ▒;, Node* new_node = ▒;

naming – functions

function

imperative verb phrase
Rule:  Function names must start with an imperative verb (e.g., “create_▒▒▒”, “calculate_▒▒▒”, “destroy_▒▒▒”, etc.), except main(…).

internal helper function

Rule:  Begin with an underscore and declare as static.
Reason:  The underscore warns readers that this code may be hard to understand. It also lets them know that it is safe to change the function without risk of breaking any code outside this file. The static keyword causes the compiler to make the function visible only to code in this file.
Note:  Here's why this matters. If you declare a function as static but do not call it anywhere in the current file, gcc will complain. Usually, this is a good thing, since it can alert you to the presence of dead code. For example, if you have a helper function (e.g., _print_as_currency(int num_cents)) and then make another version of it (e.g., _print_as_currency_v2(…)), in the course of debugging or changing, but don't comment out the old version, you end up with two functions that both appear to do the same thing. When you come back to the code later, you might waste time looking at the old one. If it is declared as static, then gcc will tell you that the old function is defined (as static) but never used. That reminds you to comment out (or remove) the old version (i.e., _print_as_currency(…)) so you or someone else doesn't waste time trying to figure out why its code doesn't match the program's behavior. (Before this rule was added, the instructor often wasted time in office hours trying to help students debug, but looking at an old version of the function that was misbehaving.)
Note:  "Internal" means things that should not be called from outside the current file or context. This rule is more important for larger projects, and is pointless for projects that have only a single file. However, real projects tend to grow, so it is a good habit to build on.
Exception:  In your test code (or any .c file containing a main(…) function, prefixing with underscore and/or declaring as static is optional.
Reason:  Omitting the static function declaration is allowed for test functions (e.g., test_simple()) because sometimes, you want to temporarily comment out other test functions (e.g., test_hard()) to focus on just one test (e.g., test_simple()).
Reason:  Omitting the '_' prefix is allowed for test functions because they are not like typical internal helper functions. The 'test_' prefix already tells the reader what they need to know.

internal global constants

Rule:  Begin with an underscore and declare as static const.
Reason:  This warns readers of your code (including you) that the meaning may be hard to understand. It also lets them know that it may change, so it's best not to refer to it from elsewhere.
Note:  "Internal" means things that should not be called from outside the current file or context. This rule is more important for larger projects, and is pointless for projects that have only a single file. However, real projects tend to grow, so it is a good habit to build on.

function returning bool

same as bool variables
Rule:  (same as bool variables)
Examples:  is_empty(…), has_name(…), did_finish(…), needs_data(…), should_delete(…), will_reboot(…), must_replace(…), can_reach_server(…)

naming – struct types, objects

Struct type

UpperCamelCase
Rule:  Use camel case with the first letter capitalized.
Reason:  This makes types look very different from instances of those types.
Note:  This is consistent with Google's C++ Style Guide under Type Names.

Struct object

singular noun or noun phrase
Rule:  Variables, functions, and struct members representing/returning a struct object must be named with a singular noun or noun phrase.

Linked list node type

▒▒▒Node
Rule:  Linked list node type should be called “Node” or “▒▒▒Node”.

Linked list node object

▒▒▒node, next, prev, curr, head, tail
Rule:  Variables and struct members representing a linked list node (e.g., Node) or address thereof (e.g., Node*) should be should be named “node”, “▒▒▒_node”, “next”, “prev”, “curr”, “head”, or “tail”. The head may also be called “▒▒▒_list”.

Tree node type

▒▒▒TreeNode, ▒▒▒BSTNode, …
Rule:  Tree node type should be called “TreeNode”, “▒▒▒TreeNode”, “BSTNode”, or “▒▒▒BSTNode” (as appropriate).

Tree node object

node, root, curr, head, leaf, right, left
Rule:  Variables and struct members representing/returning such a tree node (e.g., TreeNode) or address of a tree node (e.g., TreeNode*) should be should be named “node”, “▒▒▒_node”, “root”, “curr”, “head”, “right”, “left”, or “leaf”.

cast operator

casting operator

avoid
This section was revised to clarify on 2/6/2020.
Rule:  Do not use the casting operator in any context—including malloc(…)—unless absolutely necessary and safe for reasons that you completely understand well. Do not add a casting operator "just in case," or to silence warnings that you don't understand.
Update 9/13/2022: I thought the typecast was necessary here. Evidently, it is not.

structure

braces for if, for, while, switch, do, …

always required
Rule:  Always required. (Do not use inside each case of a switch, but use for the switch itself.)
Reason:  Even though C will let you omit the braces if the body is just one line, that is bug-prone. If you later add one more line to the body of your for loop, but don't add curly braces, your code may behave differently from how it appears.
Note:  Apple had a major security bug due to someone omitting the braces (article).

multiple statements on one line

no
Rule:  Not allowed
Reason:  It makes code unnecessarily hard to read.

Braces for if blocks

pick one… and be consistent
Rule:  Pick one of these styles and be consistent

Braces for functions

pick one… and be consistent
Rule:  Pick one of these styles and be consistent

For vs. While

Prefer for, unless while is clearer
Rule:  Use for loops for simple, non-destructive iteration over arrays and linked lists, unless a while loop would be clearer.
Exception:  If a while loop is clearer, use a while loop. This is not a blank ban on while loops.
Exception:  If the loop counter is useful after the loop ends, choose whichever you think is clearest.

whitespace

indent block

always
Rule:  Body of a function definition or if/for/while/do/switch statement must be indented, relative to the lines that contain the enclosing curly braces.
Exception:  In rare cases, you might want a loop or function with an empty body. In those cases, you may have both braces on the same line as the beginning of the loop. Also, note that this rule does not apply to struct or array initializers.

indent type

tab OR spaces… consistently
Rule:  You can use either tabs or spaces to indent your code, but you must be consistent.
Note:  Tabs are recommended because the starter files will usually use tabs. If you prefer spaces, edit your .vimrc to change set noexpandtab to set expandtab. In addition, you will need to call :retab! on any starter file to convert tabs to spaces. Your vim modeline should include 'et' instead of 'noet'. You are welcome to choose a different tab size (i.e., instead of 4), but the aforementioned caveats apply there, as well.
Reason:  If you mix tabs and spaces, your file layout may be hard to read for others with a different tabstop setting.

Spacing between functions

1 OR 2 blank lines… consistently
Rule:  Either one or two spaces between functions—and be consistent

explanatory comments

Comments

as needed for clarity
Rule:  Comment code that is complex, or where the purpose of a block of code would be unclear at first glance. Do not include comments for things that are extremely obvious.
Note:  This is consistent with Google's C++ Style Guide under Implementation Comments.

Vim modeline

always
Rule:  Required
Reason:  This ensures that when course staff in ECE 264 view your code, they will have the same indent type that you had. It's also a good habit when working on projects where others might be using Vim (outside of this class).
Good:  (for those who prefer tabs)
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 noexpandtab: */
Good:  (for those who prefer spaces)
/* vim: set tabstop=4 shiftwidth=4 fileencoding=utf-8 expandtab: */

test code

printf(…)

only for required output
Rule:  Use printf(…) only for required output. For debugging, use #define to create a log macro that you can turn off with a flag to the compiler (i.e., conditional compilation).

assert(…)

use to catch bugs in your code
Rule:  Use assert(…) to catch bugs in your code. Do not use assert to catch anything that the user could control, external issues with the system, or anything else outside of your code.

constant expressions

constants

const not #define
Rule:  Declare with const (not #define), where possible.
Reason:  const enforces types. #define just does a simple text substitution.
Exception:  You may use #define for constants if you have a very specific reason that is documented in a comment.
Exception:  You cannot initialize a const variable using another const variable.
Note:  A const variable defined in a header file should be declared as static (e.g., static const char* APP_NAME = "▒▒▒▒▒";).
Note:  This is consistent with Google's C++ Style Guide under Use of const.

NULL vs. 0 vs. '\0'

0 only for quantity
Rule:  Use NULL for addresses (i.e., null pointers), 0 for numeric values, and '\0' for characters (e.g., string terminator).
Reason:  This makes code more readable and avoids bugs.
Note:  Internally, NULL, 0, and '\0' are all stored as 0.
Note:  This is consistent with Google's C++ Style Guide under 0 and nullptr/NULL.

Boolean values

bool/true/false not int/1/0
Rule:  Do not use 1 and 0 for true and false. Instead, use the true and false values and the bool type from the stdbool.h library. We are using the C99 version of the C language standard in this class, which includes stdbool.
Note:  We are using the C99 version of the C language standard in this class, which includes stdbool.
Reason:  The constants true and false make your code more readable because express what you mean. Since stdbool.h is part of the standard library, they will be readable by others and interoperable with other code.

sizeof(…)

use whenever possible
Rule:  Do not make assumptions about the size of any data type. Use the sizeof(…) operator instead.

sizeof argument

expression not type
Rule:  Argument must be a variable name when used in an initialization or malloc(..)
Reason:  This reduces duplication of information. If you change the type of the variable later, this will automatically adjust.
Note:  This is consistent with Google's C++ Style Guide under sizeof.

character literals

'A' not 65
Rule:  Use the character literal (e.g., 'A') and not the integer literal (e.g., 65) if the data represents a character.

Related resources

Psychology of code readability (5/15/2018) by Egon Elbre

The Art of Giving and Receiving Code Reviews (Gracefully) (6/25/2018)

Other Code Quality Standards

Updates

2/6/2020: Clarified use of casting operators for truncating doubles to ints, and non-use when fetching string parameter using va_arg(…). Renamed “Declarations initialize” to “Declare where you initialize.” Clarified exceptions to “Declare where you initialize.”

Clarified that the a_ prefix should be used for a char** (address to a char* referring to a string) or Node** (address to a Node* referring to a list node). Added: ❝For those types, add the “a_” prefix only when passing by address (e.g., char**, Node**, etc.).❞