### Fall 2022 ECE 264 :: Purdue University :: Section 3 (Quinn)

⚠ This is a PAST SEMESTER (Fall 2022).
Due 11/29

# Huffman 5: write bits to a file

## Learning goals

You should learn:

1. Practice using linked lists and trees.
2. Combining linked lists with trees
3. Learning objectives: structures and dynamic structures

## Overview

This is part 5 in an 8-part series in which you are implementing the Huffman Coding algorithm to compress files.

1. huffman_1: Implement priority queue for future use
2. HW16: Calculate character frequencies from a file
3. HW17: Create a priority queue of trees from the character frequencies in a file.
4. HW18: Build a single Huffman tree.
5. HW19: Implement functions to write arbitrary bits to a file, for future use. ◀◀◀ this homework
6. HW20: Write the Huffman Tree to the disk.
7. HW21: Write the encoded file contents to the disk.
8. huffman_8: Create a command-line interface for compressing files.

In this homework, you will create a reusable module for writing sequences of bits to binary files. In HW20, you will use this to write your Huffman tree to a file. Then, in HW21, you will use it to write the encoded data to a file.

It is important that you understand all six bitwise operators. They are in your reference sheet. If you did not understand after watching the relevant lecture(s), the warmup will guide your thinking. If you feel you understand well enough already, feel free to opt out. (We know time is short.)

### Example

Let's understand how this works through an example.

The following code writes three bytes to a file, but does so in smaller chunks of a few bits at a time. When `write_bits(…)` is called, a byte may or may not be written directly to a file. We hold some bits in memory and keep them until either ⓐ we have a full byte to write, or ⓑ the file is being closed and we need to flush the remaining bytes.

```int main(int argc, char* argv[]) {
BitWriter writer = open_bit_writer("new_file.bits");
write_bits(&writer, 0x05, 3);  // 0x05 ↔ 00000101₂ ⋯ writes just 101₂
write_bits(&writer, 0xf3, 3);  // 0xf3 ↔ 11110011₂ ⋯ writes just 011₂
write_bits(&writer, 0x01, 2);  // 0x01 ↔ 00000001₂ ⋯ writes just 01₂
write_bits(&writer, 0x20, 6);  // 0x20 ↔ 00100000₂ ⋯ writes just 100000₂
write_bits(&writer, 0x13, 5);  // 0x13 ↔ 00010011₂ ⋯ writes just 10011₂
close_bit_writer(&writer);
return EXIT_SUCCESS;
}
```
1. First, we create the `BitWriter` object by calling `open_bit_writer(…)`.
```BitWriter writer = open_bit_writer("new_file.bits");
```
2. We “write” 3 bits: `101₂`.
```write_bits(&writer, 0x05, 3);  // 0x05 ↔ 00000101₂ ⋯ writes just 101₂
```
This doesn't actually write anything to the file. Those bits are written to the `.current_byte` field, so that when additional bits are “written”, they can be combined before that byte is written to the file.
3. We “write” 3 more bits: `011₂`.
```write_bits(&writer, 0xf3, 3);  // 0xf3 ↔ 11110011₂ ⋯ writes just 011₂
```
Notice that although we passed `11110011₂` (0xf3) to `write_bits(…)`, only the least significant (rightmost) 3 bits had any effect. The most significant (leftmost) 5 bits were not zero, but they were nonetheless ignored.
4. We “write” 2 more bits: `01₂`.
```write_bits(&writer, 0x01, 2);  // 0x01 ↔ 00000001₂ ⋯ writes just 01₂
```
Initially, those two bits are added to `.current_byte` and we subtract 2 from `.num_bits_left` leaving the state like this: Before `write_bits(…)` returns, it checks if `.current_byte` is full (i.e., `num_bits_left==0`). Since it is indeed full, that byte is written to the file, and `.current_byte` and `.num_bits_left` are reset.

Finally, we have actually “written” something to the file.

5. We write 6 bits: `100000`
```write_bits(&writer, 0x20, 6);  // 0x20 ↔ 00100000₂ ⋯ writes just 100000₂
```
6. We write 5 bits: `10011`
```write_bits(&writer, 0x13, 5);  // 0x13 ↔ 00010011₂ ⋯ writes just 10011₂
```
There isn't enough room in `.current_byte` for all five bytes so they will have to straddle the current byte and the next byte. Since `.num_bits_left` is 2, we write the first two bits (i.e., `10₂`) to `.current_byte`. It is now full: Since it is full, we write `.current_byte` to the file and reset `.current_byte` to 0 and `.num_bits_left` to 8, just like before. We write the remaining 3 bits of 0x13 (`00010011₂` to `.current_byte`. We set `.num_bits_left` to 5 (=8−3).
7. We are finished and ready to close the file.
```close_bit_writer(&writer);
```
This will “flush” the remaining bits in `.current_byte` to the file. Any unused bits are filled with zeros (0). The `close_bit_writer(…)` function resets `.current_byte` to 0, `.num_bits_left` to 8, and sets `.file` to `NULL`, as a signal that this object is closed. That also prevents the caller from trying to use it.

## Getting Started

This homework will not use or interact with any of the files from the earlier Huffman homeworks.

1. Run `264get HW19` to fetch the starter code.

We have provided a skeleton file (bit_writer.c) and a header file (bit_writer.h) to save you time.

You will also find a warmup.c and two other files (print_bits.o and print_bits.h). See the Q&A for more details.

2. Change into your newly created HW19 directory.
```you@ecegrid-thin1 ~ \$ ````cd hw19`
3. Copy in your Makefile, log_macros.h, and miniunit.h from HW18 (or any other homework).
```you@ecegrid-thin1 ~/hw19 \$ ````cp ../hw18/{Makefile,log_macros.h,miniunit.h} -t ./`
```you@ecegrid-thin1 ~/hw19 \$ ````vim Makefile`
• Set `ASG_NICKNAME` to `HW19`.
• Set `SRC_C` to `bit_writer.c` and `TEST_C` to `test_bit_writer.c`.
• Set `SRC_H` to `bit_writer.h miniunit.h log_macros.h`.
• Ensure that your `submit` rule includes warmup.c. Altogether, it should have the following: `\$(SRC_C) \$(SRC_H) \$(TEST_C) Makefile warmup.c`
5. Edit bit_writer.c.
```you@ecegrid-thin1 ~/hw19 \$ ````vim bit_writer.c`
Add assertions (`assert(…)`) at the top/bottom of `write_bits(…)` to verify and document that:
1. On entry (top), `num_bits_to_write` ∈ [0, 8] and `.num_bits_left` ∈ [1, 8].
2. On exit (bottom), `.num_bits_left` ∈ [1, 8].

Those conditions should always be true. Otherwise, you most certainly have a bug somewhere.

If any of these conditions is not met, your program will exit and give you an error message telling you exactly where the problem is. That's a lot better than spending hours trying to trace why the output wasn't correct!

An inappropriate value for `num_bits_to_write` could be due to a bug in your test code. Normally, we don't use `assert(…)` to check inputs, but in this case, the inputs are coming from your code—and this will help you avoid a lot of headaches that might happen without it.

These assertions are included in screenshot in the Q&A. Feel free to copy/adapt from that screenshot—but only if you understand.

## Warm-up exercises

This assignment includes a warm-up exercise to help you get ready. This accounts for 10% of your score for HW19. Scoring will be relatively light, but the usual base requirements apply.

In the starter files, you will find a file called warmup.c. All details for the warmup are in that file.

### Opt out.

In a hurry, and don't need the practice? This warm-up is here to help you learn what you need to succeed on the rest of this assignment—not to add additional work. Therefore, we give you an option. Those who feel that they do not need the practice may "opt out". by modifying warmup.c so that it does nothing but print the following message exactly and then exit:

`I already know this and do not need to practice.`

If you do that, then your score for HW19 will be based solely on the rest of this assignment. If you leave the warmup.c undone, if you do not turn in a warmup.c, or if the message it prints does not match perfectly, then you will receive 0 for the warmup portion of this assignment (10%).

## Requirements

1. Your submission must contain each of the following files, as specified:
file contents
bit_writer.c functions
`open bit writer(const char✶ path)`
return type: BitWriter
Create a `BitWriter` object.
• Open a file for writing at `path` and assign it to the `.file` field.
• Initialize the `.current_byte` field to 0.
• Initialize the `.num_bits_left` field to 8.
• Use a compound literal.
• The body of this function should be only 1 sloc.
• Seriously… this should be nothing more than a single return statement. Don't make it hard.
• Do not call `malloc(…)`.
• Return the newly created `BitWriter` object.
`write bits(BitWriter✶ a writer, uint8 t bits, uint8 t num bits to write)`
return type: void
“Write” the least significant (right) `num_bits_to_write` bits from `bits`.
• `a_writer` is the address of a `BitWriter` object, the `.file` field of which is a file already opened for writing.
• `.num_bits_left` must be ∈ [1, 8] when entering and exiting `write_bits(…)`.
• `num_bits_to_write` must be ∈ [0, 8].
• If `.current_byte` has enough room (i.e., `num_bits_to_write` is ≤ `.num_bits_left`), then write them to `.current_byte`, starting with the MSB (left) side.
• If there is not enough room in `.current_byte`, then write as many as there is room for. Then, write the `.current_byte` to the file, reset `.current_byte` to zero. Finally, write the remaining bits from `bits` to `.current_byte`, starting with the MSB (left) side.
• Ignore all but the `num_bits_to_write` least significant bits in `bits`. In other words, you should not assume the rest of the byte (`bits`) will be all zeros.
• Update `.num_bits_left` with the number of unused bits remaining in `.current_byte`.
• Do not use any loops, arrays, switch statements, `malloc(…)`, or addresses (except `a_writer`).
• If you understand the bitwise operators, then you shouldn't need or want any of that. If not, make sure you have watched the relevant lecture (attentively) and/or do the warm-up. Prof. Quinn's lecture from 4/17/2020 (publicly accessible via the Schedule page) covered a very similar example (but writing to a buffer in memory instead of to a file).
`flush bit writer(BitWriter✶ a writer)`
return type: void
Write the current byte (`a_writer -> current_byte`) to the file.
• If there are no bits waiting to be written to the file (i.e., if `.num_bits_left == 8`) then do not write anything to the file.
• Reset the `.current_byte` field to 0.
• Reset the `.num_bits_left` field to 8.
• Any unused bits should be filled with 0's on the least significant (right) side.
• This is the same thing that happens when you call `close_bit_writer(…)`, except that the file (`.file`) is not closed.
`close bit writer(BitWriter✶ a writer)`
return type: void
Close the given `BitWriter` and reset its fields.
• Call `flush_bit_writer(…)` to write the current byte to the file.
• Close the file.
• Set `.file` to `NULL`.
• This is illustated in the example (part 7) .
test_bit_writer.c functions
`main(int argc, char✶ argv[])`
return type: int
• 100% code coverage is required for bit_writer.c.
bit_writer.h Do not modify.
warmup.c Functions are specified in the comments, in warmup.c.
miniunit.h
as usual
log_macros.h
as usual
Makefile
as usual
2. When the final byte is written, any unused bits should be filled with 0's on the least significant (right) side.
3. Do not call any file operations anywhere within your bit_writer.c, except as specified below. These restrictions are for your protection.
4. The following external header files, functions, and symbols are allowed.
stdbool.h `bool`, `true`, `false` `*`
stdio.h `printf`, `fprintf`, `fputs`, `fputc`, `stdout`, `fflush` `bit_writer.c`, `test_bit_writer.c`, `warmup.c`
stdio.h `fopen`, `fclose`, `fputc` `bit_writer.c`
stdio.h `fopen`, `fclose`, `fputc`, `fseek`, `ftell`, `fgetc`, `fread`, `fwrite` `test_bit_writer.c`
assert.h `assert` `*`
stdlib.h `EXIT_SUCCESS`, `size_t` `bit_writer.c`, `test_bit_writer.c`
stdint.h `uint8_t` `bit_writer.c`, `test_bit_writer.c`
string.h `*` `warmup.c`, `test_bit_writer.c`
miniunit.h `*` `test_bit_writer.c`
log_macros.h `*` `bit_writer.c`, `test_bit_writer.c`, `warmup.c`
“*” means anywhere (i.e., any file) or anything (i.e., any function, constant, type, etc.). All others are prohibited unless approved by the instructor. Feel free to ask.
• `fopen(…)` may be called once in `open_bit_writer(…)`, and nowhere else.
• `fclose(…)` may be called once in `close_bit_writer(…)`, and nowhere else.
• `fputc(…)` may be called once in `write_bits(…)` or `flush_bit_writer(…)`—but not both—and nowhere else.
These constraints are here to keep you on the right track.
5. Do not use any loops anywhere in bit_writer.c.
6. Submissions must meet the code quality standards and other course policies on homework.

## Submit

To submit HW19 from within your hw19 directory, type

`make submit`

That should result in the following command: `264submit HW19 bit_writer.c bit_writer.h test_bit_writer.c warmup.c miniunit.h log_macros.h Makefile`

## Pre-tester ●

The pre-tester for HW19 has been released and is ready to use.

# Q&A

1. ### How much work is this?

The instructor's solution comprises 30 sloc. However, 10 lines of that is given to you in the form of the skeleton file. Thus, you need to write ≥20 sloc.

If you joined or viewed the relevant lecture(s), and followed them then you can probably skip the warm-up. In that case, I would estimate this to be ≈2 hours of work. However, if you skipped them, skimmed, or didn't pay proper attention, then you will almost certainly need to go back and watch them, and/or do the warm-up. Searching for posts on Stackoverflow isn't going to help you much. (That said, if you do find any great resources, please share them on Piazza!)

For those who are still rusty, the warm-up walks you through each of the 6 bitwise operators: `^ << >> & | ~`. (They are also summarized on the reference sheet, but you need to internalize how they are used.

The warmup will require writing exactly 14 sloc. There are 11 functions that must be implemented in exactly one statement. One more function must be implemented by adding exactly three statements.

In light of the short timeframe, we are also giving you a few “extras”, to keep you on the right track so you don't burn too much time on this.

• The warm-up was not originally planned. You can opt out if you don't need it.
• The skeleton code and header file help you avoid some typing, and also avoid snafus related to mistyping.
• We are providing an annotated solution screenshot below, to give you an outline of one particular solution. See the next Q&A item.
2. ### I think I understand bitwise, but how do I start?

First, do the Getting Started above. Don't skip the part about the assertions.

Next, you may want to read through the annotated screenshot below. You don't have to follow this, but you may if you wish.

Keep in mind: There are a number of alternative approaches to some small parts of this that would be equally good. However, this may help you think about the problem.

👌 You may copy/adapt anything in this screenshot. (That basically means the assertions.)

3. ### How am I supposed to convert all of these binary bits to hexadecimal in my head?

It's not as hard as it looks at first. Each 4 bits is a “nibble”, and is represented by one hex digit. There are only 16 hex digits.
decimal binary hex
0 0000₂ 0
1 0001₂ 1
2 0010₂ 2
3 0011₂ 3
4 0100₂ 4
5 0101₂ 5
6 0110₂ 6
7 0111₂ 7
8 1000₂ 8
9 1001₂ 9
10 1010₂ a
11 1011₂ b
12 1100₂ c
13 1101₂ d
14 1110₂ e

So if you see `00111001₂`, break it into two nibbles: `0011₂` and `1001₂`. Even if you don't remember the table above, it's not hard to interpret a 4-digit binary number in your head. Just remember the places are 1, 2, 4, and 8 (right-to-left). For the first nibble, you have 1 + 2 = 3. In hex, that is 0x3 (easy!). For the second nibble, you have 1 + 8. In hex that is 0x9 (also easy!). Putting those two nibbles together, you get 0x39.

If you get `1111₂`, that's a good one to remember; it's `f`.

From there, `1110₂` is just one less, so it's `e`.

That leaves `1010₂` (`a`), `1011₂` (`b`), `1100₂` (`c`), and `1101₂` (`d`). For those ones, you might need to count the letters on your fingers or something, but with a little practice, this will quickly become second nature.

4. ### Is there a way to write individual bits to a file?

No. All of the functions for writing to files deal with whole bytes at a time.
5. ### What are the files print_bits.c and print_bits.h?

They contain some functions for printing bits to the terminal that help you see if your warmup is correct.
6. ### Why does print_bits.o end with .o?

It is an object file. (This was covered in HW07) and the related lecture, but you can be forgiven if you forgot. An object file contains compiled functions, but is not executable on its own. When compiling another program, you can link in the object file to gain access to its functions. This is much simpler than it sounds. When you call gcc, simply add print_bits.o to the command. For example, when you compile warmup.c, you use this command:
`gcc -o warmup warmup.c print_bits.o`
7. ### Why didn't you just give us the .c file?

One of the functions in print_bits.c/print_bits.o is essentially the same as the one you need to write at the end of the warmup. We didn't want to give it away. However, having that function makes the earlier ones easier since you can see what you're doing.
8. ### May I use the `print_bits_color(…)` and/or `print_bits_plain(…)` functions in my test_bit_writer.c?

Yes. Simply add print_bits.o to the end of each of your `gcc` commands in your Makefile. Treat it as if it were a .c file.
9. ### How can I see what's inside an object (.o) file?

The `objdump` command can be useful. Google it.
10. ### That sounds complicated. How can I see what functions are available and how to use them?

See the header file (print_bits.h). Header files often (but not always) serve as a sort of documentation.
11. ### Why bother setting `.current_byte` to 0 and `.num_bits_left` to 8 in `close_bit_writer(…)`?

It just seemed like a good way to tidy up the object. This way, when debugging or testing, it will be crystal clear which one is closed.
12. ### What do I fill in for any bits that haven't been written?

Fill in 0's on the LSB (right) side.
13. ### How can I check the contents of a file in my tests?

You may wish to create a helper function. You may copy/adapt the declaration below, if you like.
```  static bool _check_file_contents(char const* path, size_t num_bytes, uint8_t const* bytes) {
▒▒▒
}

static int _test_check_▒▒▒() {
mu_start();
//────────────────────

▒▒▒
_check_file_contents(path, 5, (uint8_t[]) { 0x5d, 0xc3, 0x00, 0x19 });
▒▒▒

//────────────────────
mu_end();
}
```
You will need to add more details to the starter above. Use only if you understand.