In this lab series you will design a level 1 cache hierarchy consisting of a direct-mapped instruction cache and a two-way associative data cache. The caches provide fast access to addresses that exhibit locality. Each cache will have a data store (the cache) and a control component that operates the data store.

Design

lab9__1.png
Figure 1. Cache

The cache diagram shows the general operation of a cache. The address (addr) is used to index into the data array, and if an item is found whose tag matches, a hit is registered. If no item in the data store matches the tag, the control component has to retrieve the item from RAM. While the item is being retrieved, the cache tells the processor that it does not yet have the requested item by not registering a hit. Once the processor sends the halt signal, the cache should flush all dirty values back to memory.

Tip
The cache size refers to the amount of data stored. It does not include tag bits or state bits.

Please refer to the lecture slides for the cache structure and terminology. A brief summary is provided here:

  • Block : The lowest granularity of data to be operated on between Cache and Memory. A block may contain multiple words or a single word. Each time you update the cache from memory or write back to it, you have to do it for the whole block.

  • Frame : A frame contains a block, tag, a valid bit and possibly a dirty bit.

  • Set : Refers to the cache entries corresponding to a particular index. A set could have many frames. This depends on the set associativity of the cache.

Tip
Please read the cpu_types_pkg file and use the frame structure we have provided.

The frames in a set share only the same index; everything else is independent. This means that you have to compare the tag from the input address against the tag fields of all frames in that set.
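The tag comparison across a set can be sketched in software as follows. This is only a behavioral illustration in Python, not HDL: the `Frame` class and `find_hit` function are hypothetical names invented here and are not the frame structure from cpu_types_pkg, which is the one your design must use. In hardware the comparisons for all ways would happen in parallel rather than in a loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical software stand-in for one cache frame."""
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: tuple = (0, 0)  # the block (two words in this sketch)


def find_hit(frames: List[Frame], tag: int) -> Optional[int]:
    """Return the way whose frame is valid and holds `tag`, or None on a miss."""
    for way, frame in enumerate(frames):
        # A frame only counts as a match if it is valid AND its tag is equal.
        if frame.valid and frame.tag == tag:
            return way
    return None


# One set of a two-way cache: way 0 is valid, way 1 holds stale (invalid) data.
cache_set = [Frame(valid=True, tag=0x12), Frame(valid=False, tag=0x34)]
print(find_hit(cache_set, 0x12))  # 0
print(find_hit(cache_set, 0x34))  # None (tag matches but the frame is invalid)
```

Note that the valid bit takes part in the comparison: a matching tag in an invalid frame is still a miss.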

The value of the address is used to access the cache, the format can be seen in the following diagram.

lab9__2.png
Figure 2. Address fields

The lower two bits are the byte offset, which lets you access specific bytes within a word. Since the lowest granularity of data and instructions that our processor operates on is a word, the lower two bits will always be 00 and can be ignored.

Moving from right to left, the block offset selects which word of the cache block to use. The index selects the correct set of the cache. Finally, the tag is compared against the tags stored in that set to register a hit or a miss.
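As a worked example, the field widths can be derived from the cache parameters given in the Cache Specifications section of this handout. The sketch below is Python rather than HDL and rests on assumptions that are not stated in this handout: 32-bit addresses, 32-bit words, "1Kbits" meaning 1024 bits, and power-of-two sizes. The function names `field_widths` and `split_address` are made up for illustration.

```python
ADDR_BITS = 32        # assumption: 32-bit addresses
WORD_BITS = 32        # assumption: 32-bit words
BYTE_OFFSET_BITS = 2  # from the text: the lower two bits are the byte offset


def field_widths(size_bits, ways, words_per_block):
    """Return (tag, index, block_offset) widths; size_bits counts data only."""
    blocks = size_bits // (WORD_BITS * words_per_block)
    sets = blocks // ways
    index_bits = (sets - 1).bit_length()             # log2 for powers of two
    block_bits = (words_per_block - 1).bit_length()  # 0 when one word per block
    tag_bits = ADDR_BITS - index_bits - block_bits - BYTE_OFFSET_BITS
    return tag_bits, index_bits, block_bits


def split_address(addr, index_bits, block_bits):
    """Split a byte address into (tag, index, block_offset), right to left."""
    word_addr = addr >> BYTE_OFFSET_BITS                  # drop the byte offset
    block_offset = word_addr & ((1 << block_bits) - 1)    # word within the block
    index = (word_addr >> block_bits) & ((1 << index_bits) - 1)  # which set
    tag = word_addr >> (block_bits + index_bits)          # remaining upper bits
    return tag, index, block_offset


print(field_widths(512, 1, 1))      # icache: (26, 4, 0)
print(field_widths(1024, 2, 2))     # dcache: (26, 3, 1)
print(split_address(0x1234, 3, 1))  # dcache fields: (72, 6, 1)
```

Under these assumptions both caches end up with a 26-bit tag; the instruction cache uses 4 index bits and no block offset, while the data cache uses 3 index bits and 1 block offset bit. Check the result against the types in cpu_types_pkg before relying on it.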

You have to ensure that your pipeline does not expect any ordering of hits (instruction vs. data). With the addition of caches, ihits and dhits can register at the same time, and an ihit can register before a dhit.

Design Specification

The dcache and the icache will require state machines. The dhit and ihit signals should be asynchronous, though. This means that a hit in your design should not take more than one cycle.

You are required to use the following interfaces and packages:

Packages
  • CPU types: This contains data types for your processor design.

Interfaces
  • CPU Ram: Connects your cpu to ram.

  • Datapath Cache: Connects your datapath to the caches.

  • Cache Control: Connects your caches to memory_control.

  • System: Connects the system to the testbench and fpga wrapper.

The use of these packages and interfaces is required in your design. They cannot be modified or changed in any way by you, the student.

Important
Only the course staff may make changes to the interfaces and provided types. Should changes be necessary, you will be instructed to pull from the git repository to merge these changes.

There are a few policies that control the operation of a cache. You will implement the following:

Policies
  • The write policy will be ‘write-back’.

  • The allocation policy will be ‘allocate on miss’.

  • The replacement policy will be ‘least recently used’.
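To make the three policies concrete, here is a minimal behavioral sketch in Python. It is not the required HDL and not a reference implementation. Assumptions made for illustration: the class name `TwoWayCache`, a dict standing in for RAM, word addressing (the byte offset is already dropped), eight sets and two words per block, and a single LRU bit per set.

```python
class TwoWayCache:
    """Sketch: two-way associative, write-back, allocate on miss, LRU."""

    def __init__(self, memory, num_sets=8, words_per_block=2):
        self.memory = memory  # hypothetical RAM model: word address -> value
        self.num_sets = num_sets
        self.wpb = words_per_block
        self.sets = [[self._empty(), self._empty()] for _ in range(num_sets)]
        self.lru = [0] * num_sets  # per set: the way to replace next

    def _empty(self):
        return {"valid": False, "dirty": False, "tag": 0, "data": [0] * self.wpb}

    def _base(self, tag, index):
        # First word address of the block identified by (tag, index).
        return (tag * self.num_sets + index) * self.wpb

    def _access(self, word_addr):
        """Return (frame, offset, hit); a miss allocates a frame first."""
        block, offset = divmod(word_addr, self.wpb)
        tag, index = divmod(block, self.num_sets)
        for way, frame in enumerate(self.sets[index]):
            if frame["valid"] and frame["tag"] == tag:
                self.lru[index] = 1 - way  # the other way is now least recent
                return frame, offset, True
        way = self.lru[index]              # replace the least recently used way
        victim = self.sets[index][way]
        if victim["valid"] and victim["dirty"]:
            # Write-back: memory is only updated when a dirty block is evicted.
            old = self._base(victim["tag"], index)
            for i, word in enumerate(victim["data"]):
                self.memory[old + i] = word
        new = self._base(tag, index)
        victim.update(valid=True, dirty=False, tag=tag,
                      data=[self.memory.get(new + i, 0) for i in range(self.wpb)])
        self.lru[index] = 1 - way
        return victim, offset, False

    def read(self, word_addr):
        frame, offset, hit = self._access(word_addr)
        return frame["data"][offset], hit

    def write(self, word_addr, value):
        frame, offset, hit = self._access(word_addr)  # allocate on miss
        frame["data"][offset] = value
        frame["dirty"] = True                         # memory is not touched yet
        return hit


memory = {}
cache = TwoWayCache(memory)
# Word addresses 0, 16 and 32 all map to set 0 with different tags.
print(cache.write(0, 111))  # False: miss, block allocated and marked dirty
print(cache.read(16))       # (0, False): miss, fills the second way
print(cache.read(0))        # (111, True): hit, address 16 becomes LRU
print(cache.read(32))       # (0, False): evicts the clean block for address 16
print(0 in memory)          # False: the dirty block has not been written back
print(cache.read(16))       # (0, False): evicts the dirty block for address 0
print(memory[0])            # 111: written back on eviction
```

The usage lines show the policies interacting: the written value reaches memory only when its dirty block is evicted, and the block that gets evicted is always the one touched least recently in its set.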

Cache Specifications
Instruction Cache
  • 512 bits in size.

  • Direct mapped.

  • One word per block.

Data Cache
  • 1 Kbit in size.

  • Two way associative.

  • Two words per block.

  • Invalidate cache blocks on halt.

  • 32-bit hit counter (address 0x3100) to validate against the simulator. Your design will not pass without writing the hit counter.

    • Only initial hits should be counted. Misses that turn into hits are considered misses.

    • Do not count the same hit multiple times.

    • This count should be written to address 0x3100.
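The hit counter rules above can be illustrated with a small Python sketch. The trace format is an assumption made for this example: one `(req_id, hit)` sample per clock cycle, where `req_id` is a hypothetical label for the outstanding data request and `None` means no request. In hardware you would track this with your state machine instead. When exactly the counter is written is not specified here, so the last lines only show the value being stored at address 0x3100.

```python
HIT_COUNTER_ADDR = 0x3100  # from the specification above


def count_hits(cycles):
    """Count requests that hit on their very first cycle, once each."""
    count = 0
    seen = set()  # requests that have already been observed
    for req_id, hit in cycles:
        if req_id is None:
            continue           # idle cycle, nothing to count
        if req_id in seen:
            continue           # repeated sample: never count twice, and a
                               # miss that later becomes a hit stays a miss
        seen.add(req_id)
        if hit:
            count += 1         # initial hit
    return count


trace = [
    ("A", True),    # initial hit: counted
    ("A", True),    # same hit held for another cycle: not counted again
    ("B", False),   # miss
    ("B", False),   # still waiting on memory
    ("B", True),    # miss that turned into a hit: not counted
    (None, False),  # idle
    ("C", True),    # initial hit: counted
]
hits = count_hits(trace)
print(hits)  # 2

memory = {}  # hypothetical RAM model
memory[HIT_COUNTER_ADDR] = hits & 0xFFFFFFFF  # the counter is 32 bits wide
```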

Setup

For this design you will branch from your pipelined processor.

To do this, issue the following commands:

git checkout pipeline

git checkout -b caches

Note
There is no cache branch to pull from on the course repo.

You should now have your processor files for use.

Files

The following files contain the package and interfaces that are required in this design.

  • packages: cpu_types_pkg.vh

  • interfaces: cpu_ram_if.vh, datapath_cache_if.vh, cache_control_if.vh, system_if.vh

You should also have the following component files. These files are templates to guide you in the design of your processor. You will need to fill in the caches file to ensure you integrate your icache and dcache designs.

Processor Components
  • caches.sv

  • memory_control.sv

Testing

Use sim -c to simulate the core with caches. This will generate the correct memsim.hex file to compare against. Use -t too for the trace.

For testasm, use testasm -c for source and testasm -c -s for mapped.

Deliverables

For the first installment, you must have the block diagram for both caches and the HDL implementation as well as a testbench with test cases documented. You can find the evaluation sheet here for lab 8.

Your team needs to decide who writes the HDL code for the dcache and the icache, respectively. If you are chosen to write the dcache HDL, you have to design and write the icache testbench; similarly, if you write the icache HDL, you have to design and write the dcache testbench. The testbenches need to check at least all specifications defined in the Design Specification section.

Some (not all) testbench requirements:

  • Dcache

    • Test Associativity

    • Test Flushing

    • Test Reads and writes to same tag, different blocks

    • Test Capacity misses

    • Test Conflict misses

    • Test Compulsory misses

    • Test Writeback specification

    • Toggle Coverage on Dcache table

  • Icache

    • Test Capacity misses

    • Test Conflict misses

    • Test Compulsory misses

    • Toggle Coverage on Icache table

In addition to the block diagrams, HDL code, and testbenches, you need to write 3 assembly files:

  • Program that checks associativity of your Dcache.

  • Program that allows dhits and ihits to be triggered in the same cycle. (Loop for icache, and same address for dcache)

  • Program that lets ihits be triggered before dhits in the same cycle. (Loop for icache, and new address in dcache)

The second installment requires you to integrate the caches into your pipelined processor design. You can find the evaluation sheet here for lab 9.

The deliverables for the cache labs:

  • Block diagram of your caches.

    • Electronically generated with diagramming software.

    • All signals and detail present for your design.

  • HDL code for both instruction and data caches.

  • Testbench for both caches.

    • Document test cases in testbench.

    • Comprehensive test cases for design usage.

  • Completed evaluation sheets for the respective labs.

  • Electronic submission of your design.

OPTIONAL BONUS

If you want some bonus points in the lab, you can complete the following bonus. It is quite involved and will require changes to files we provided you.

To ensure the validity of your electronic submission, you need to create a new branch:

git checkout -b caches_bonus

If you do the bonus on your main caches branch, the electronic submission script may reject your design when grading it.

For the bonus you need to create cache bypassing for the FPGA. Essentially, you need to use the switches as inputs and the LEDs as outputs for a program that runs on your processor. You will need to specify an MMIO address range and new signals (io_addr, io_read, io_write, io_data) that come out of your caches file and go all the way through to the system_fpga file. You will need to edit the system_fpga file so that your switches are read from and your LEDs are written to. You need to transform the input in some way.

ABET Objective

Failure to satisfy the ABET Objective for this lab (Lab Objective 4) via at least one of the following methods will result in failing this objective.

  • Completion of the appropriate lab 9 sign-offs (on-time)

  • Remediation of the appropriate lab 9 sign-offs by the end of week 12