Step 7: Liveness Analysis and Register Allocation - Due Date: Monday Dec. 2nd, 11:59 pm (Free extension until Wednesday, Dec. 4th)

In this step, we will implement liveness analysis and use it in the register allocation algorithm to avoid unnecessarily spilling variables that are dead.

You may find this set of hints from a couple of years ago useful.

Liveness Analysis

Liveness analysis is a data flow analysis that finds which variables are live after each statement. It is an any-path, backward flow analysis. We will perform intra-procedural liveness analysis (i.e., we will compute liveness at the function granularity, but not across functions) at the IR Node level (i.e., we will compute liveness based on IR instructions, not Tiny instructions).

Control flow graphs

The first step in computing liveness is to build a control flow graph for each function in your program. To represent your control flow graph, each IR Node should know its successors (IR instructions that could possibly execute immediately after it) and predecessors (IR instructions that could possibly execute immediately before it). Conditional jumps have two successors: the explicit target of the jump, and the implicit (fall-through) target of the jump. Unconditional jumps only have one successor. Function calls should be treated as straight-line IR nodes (i.e., they are not treated as branches; their successor is the instruction immediately after the call). Return nodes do not have any successors.
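The successor and predecessor rules above can be sketched as follows. This is only an illustration, not the required design: it assumes the IR for one function is a list of tuples whose first element is the opcode, that conditional branches carry their target label as the last operand, and that the opcode names (LABEL, JUMP, RET, GE, and so on) match your own IR. `build_cfg` is a hypothetical helper name.

```python
# Assumed opcode names for conditional branches; adjust to your own IR.
COND_BRANCHES = {"GT", "GE", "LT", "LE", "NE", "EQ"}

def build_cfg(instrs):
    """Return (succ, pred): one set of instruction indices per IR node."""
    # Map each label name to the index of its LABEL instruction.
    label_at = {ins[1]: i for i, ins in enumerate(instrs) if ins[0] == "LABEL"}
    succ = [set() for _ in instrs]
    pred = [set() for _ in instrs]
    for i, ins in enumerate(instrs):
        op = ins[0]
        if op == "RET":
            targets = []                            # no successors
        elif op == "JUMP":
            targets = [label_at[ins[1]]]            # explicit target only
        elif op in COND_BRANCHES:
            targets = [label_at[ins[-1]], i + 1]    # target and fall-through
        else:
            targets = [i + 1]                       # straight-line, including calls
        for t in targets:
            if t < len(instrs):
                succ[i].add(t)
                pred[t].add(i)
    return succ, pred

example = [
    ("LABEL", "main"),          # 0
    ("STOREI", "1", "$T1"),     # 1
    ("GE", "a", "$T1", "L1"),   # 2
    ("WRITEI", "a"),            # 3
    ("JUMP", "L2"),             # 4
    ("LABEL", "L1"),            # 5
    ("WRITEI", "b"),            # 6
    ("LABEL", "L2"),            # 7
    ("RET",),                   # 8
]
succ, pred = build_cfg(example)
```

In this example the conditional branch at index 2 gets two successors (indices 3 and 5), the unconditional jump at index 4 gets one (index 7), and the RET at index 8 gets none.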

Liveness across basic blocks

For each IR node in a function, you should define two sets: GEN and KILL. GEN represents all the temporaries and variables that are used in an instruction, and KILL represents all the temporaries and variables that are defined in an instruction. For most instructions, this should be pretty straightforward. A few tricky cases:

PUSH instructions use the variable/temporary being pushed.
POP instructions define the variable/temporary being popped.
WRITE instructions use their variables.
READ instructions define their variables.
CALL instructions require special care. Because we do not analyze liveness across functions, we must make conservative assumptions about what happens during function calls. In particular, we GEN any variables that may be used, and KILL any variables that must be defined. The GEN set for any CALL instruction therefore contains all global variables, while the KILL set is empty.
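One way to express these GEN/KILL rules is sketched below. It assumes the same tuple-based IR as a working hypothesis: operands are strings, the destination of an ordinary instruction is its last operand, numeric literals are recognizable by a simple pattern, and calls use the opcode JSR. The opcode names and the helper `gen_kill` are illustrative, so map them onto whatever your IR actually uses.

```python
import re

_LITERAL = re.compile(r"-?\d+(\.\d+)?")          # assumed literal syntax
COND_BRANCHES = {"GT", "GE", "LT", "LE", "NE", "EQ"}
NO_OPERANDS = {"LABEL", "JUMP", "LINK", "RET"}   # neither use nor define names

def is_name(operand):
    """True for variables and temporaries, False for numeric literals."""
    return _LITERAL.fullmatch(operand) is None

def gen_kill(ins, global_vars):
    """Return (GEN, KILL) for one IR instruction."""
    op, *args = ins
    if op == "JSR":
        # Conservative: the callee may use any global and must define none.
        return set(global_vars), set()
    if op in NO_OPERANDS:
        return set(), set()
    if op in ("PUSH", "WRITEI", "WRITEF"):
        uses, defs = args, []
    elif op in ("POP", "READI", "READF"):
        uses, defs = [], args
    elif op in COND_BRANCHES:
        uses, defs = args[:-1], []               # last operand is the label
    else:
        uses, defs = args[:-1], args[-1:]        # last operand is the destination
    return {a for a in uses if is_name(a)}, set(defs)
```

For instance, under these assumptions `gen_kill(("ADDI", "a", "$T1", "$T2"), {"g"})` yields GEN = {a, $T1} and KILL = {$T2}.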

Once you know the GEN and KILL sets for each IR node, you can compute liveness. To do this, define IN (live-in) and OUT (live-out) sets for each IR Node. Initialize the OUT sets for RETURN IR nodes to all global variables (because global variables may be used after the function returns), and initialize all other sets to empty. Then use the worklist algorithm we discussed in class, initialized with the OUT sets for RETURN nodes, to compute the live-in and live-out sets for every IR node in the function. Remember that liveness runs backward: an instruction's OUT set is the union of the IN sets of its successors, and its IN set is its GEN set together with everything in its OUT set that is not in its KILL set. If an instruction's IN set changes, all of the instruction's predecessors need to be added to the worklist.
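A minimal sketch of such a worklist computation follows. It assumes the CFG and the GEN/KILL sets are already available as parallel lists indexed by instruction number, and the function name and parameters are hypothetical. Note one deliberate variation: the sketch seeds the worklist with every node rather than only the RETURN nodes, so that propagation still starts when a RETURN node's IN set happens to be empty (for example, in a program with no globals). The version presented in class may differ.

```python
from collections import deque

def liveness(succ, pred, gen, kill, is_ret, global_vars):
    """Return (live_in, live_out), one set per IR node."""
    n = len(succ)
    live_in = [set() for _ in range(n)]
    # OUT of RETURN nodes starts as all globals; everything else is empty.
    live_out = [set(global_vars) if is_ret[i] else set() for i in range(n)]
    # Seed with every node, last instruction first (variation, see lead-in).
    work = deque(reversed(range(n)))
    while work:
        i = work.popleft()
        if not is_ret[i]:
            # OUT = union of the IN sets of all successors.
            live_out[i] = set().union(*(live_in[s] for s in succ[i]))
        # IN = GEN plus whatever is live afterwards and not redefined here.
        new_in = gen[i] | (live_out[i] - kill[i])
        if new_in != live_in[i]:
            live_in[i] = new_in
            work.extend(pred[i])   # predecessors must be recomputed
    return live_in, live_out

# Example IR:  0: READI a   1: ADDI a 1 $T1   2: STOREI $T1 g   3: RET
succ = [{1}, {2}, {3}, set()]
pred = [set(), {0}, {1}, {2}]
gen = [set(), {"a"}, {"$T1"}, set()]
kill = [{"a"}, {"$T1"}, {"g"}, set()]
is_ret = [False, False, False, True]
live_in, live_out = liveness(succ, pred, gen, kill, is_ret, {"g"})
```

In this example, `g` is live out of the RET because it is global, `$T1` is live only between instructions 1 and 2, and nothing is live into instruction 0.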

Register Allocation Algorithm

Use the bottom-up register allocation algorithm discussed in class. For each statement, you must ensure that the source operands are in registers, and that there is a register for the destination operand. Use the liveness information you computed (i.e., the live-out set for the instruction) to determine when it is safe to free registers, and when a dirty register needs to be stored back to memory (only when the variable in the register is live).

Bottom-up register allocation works at the basic-block level: any register allocation decisions you make apply to the current basic block only. This means that when you get to the end of a basic block, you must reset your register allocation. Any registers that (a) hold local/global variables and (b) are dirty should be written back to the stack/global variable.
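The two freeing rules (freeing a register mid-block using liveness, and resetting everything at the end of a block) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Reg` record, the convention that temporaries start with `$T`, and the emitted strings, which only stand in for whatever store instruction your code generator produces.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reg:
    name: str                   # e.g. "r0"
    var: Optional[str] = None   # variable/temporary currently held, if any
    dirty: bool = False         # True if the register differs from memory

def is_temporary(var):
    return var.startswith("$T")  # assumed naming convention for temporaries

def free_reg(reg, live_out, code):
    """Free a register mid-block; store only if dirty and still live."""
    if reg.var is not None and reg.dirty and reg.var in live_out:
        code.append(f"move {reg.name} {reg.var}")   # illustrative store
    reg.var, reg.dirty = None, False

def end_of_block(regs, code):
    """Reset the allocation; write back dirty local/global variables."""
    for reg in regs:
        if reg.var is not None and reg.dirty and not is_temporary(reg.var):
            code.append(f"move {reg.name} {reg.var}")   # illustrative store
        reg.var, reg.dirty = None, False
```

A dead variable is dropped by `free_reg` without a store, which is where the liveness information saves spills.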

Note also that because a CALL instruction jumps into another method, any global variables that are in registers when the CALL is performed should be freed immediately prior to the CALL instruction, ensuring that the correct value for the global is in memory. This is different from saving the registers on the stack prior to a function call. The latter is done so that the caller method doesn't get its registers overwritten; the values of the registers are stored where only the caller can see them. The former is done so that the callee method sees the right values for global variables; the values need to be stored back to globals so that everyone can see them, and freed from the registers so that the caller will reload them after the callee returns.
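As a rough illustration of the global write-back before a CALL, the sketch below frees every register holding a global, storing it first if it is dirty. The `Reg` record, the helper name, and the emitted string are hypothetical placeholders, defined here only so the example stands alone; saving the caller's registers on the stack is a separate step and is not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reg:
    name: str                   # e.g. "r0"
    var: Optional[str] = None   # variable/temporary currently held, if any
    dirty: bool = False         # True if the register differs from memory

def flush_globals_before_call(regs, global_vars, code):
    """Write back and free registers that hold global variables."""
    for reg in regs:
        if reg.var in global_vars:
            if reg.dirty:
                # The callee must see the up-to-date value in memory.
                code.append(f"move {reg.name} {reg.var}")   # illustrative store
            # Free it so the caller reloads the global after the call.
            reg.var, reg.dirty = None, False
```

Registers holding locals or temporaries are left untouched by this helper, since the callee cannot observe them.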

Testcases

We will test your compiler against all of the test cases from steps 4, 5, and 6. The functional behavior of the programs will remain the same, but your generated Tiny code should use only 4 registers. You can find some examples of 4-register outputs here. (The outputs correspond to inputs of the same name from previous steps.)

Tiny Simulator

Please use the version of the tiny simulator supporting only 4 registers [C++ source code] for this step.