Due 11/24

Image files

Goals

The goals of this assignment are as follows:

Learn how to program with binary files
Practice using structures

// Credit: Prof. Yung-Hsiang Lu and Prof. Cheng-Kok Koh created earlier assignments upon which this one is based. Text from that assignment has been copied with permission.

Overview

In this exercise, you will write code to read, write, and crop BMP image files.

HW12 will require your working code from HW11. Writing perfect, clean code on this assignment and testing it very well will make your life a lot easier on that assignment, which will be somewhat more challenging than this one.

Requirements might be amended slightly up to 11/19/2015. Changes (if any) will be announced and noted at the bottom of this page.

The BMP file format

A BMP file has the following format:

Header	54 bytes
Palette (optional)	0 bytes (for 24-bit RGB images)
Image Data	file size - 54 (for 24-bit RGB images)

The header has 54 bytes, which are divided into the following fields. Note that the #pragma directive in bmp.h uses 1-byte alignment to ensure that the header structure is exactly 54 bytes long.

typedef struct {             // Total: 54 bytes
  uint16_t  type;             // Magic identifier: 0x4d42
  uint32_t  size;             // File size in bytes
  uint16_t  reserved1;        // Not used
  uint16_t  reserved2;        // Not used
  uint32_t  offset;           // Offset to image data in bytes from beginning of file (54 bytes)
  uint32_t  dib_header_size;  // DIB Header size in bytes (40 bytes)
  int32_t   width_px;         // Width of the image
  int32_t   height_px;        // Height of image
  uint16_t  num_planes;       // Number of color planes
  uint16_t  bits_per_pixel;   // Bits per pixel
  uint32_t  compression;      // Compression type
  uint32_t  image_size_bytes; // Image size in bytes
  int32_t   x_resolution_ppm; // Pixels per meter
  int32_t   y_resolution_ppm; // Pixels per meter
  uint32_t  num_colors;       // Number of colors  
  uint32_t  important_colors; // Important colors 
} BMPHeader;

Note that the number of bytes each field occupies can be obtained by dividing the number 16 or 32 in its type name by 8. For example, the field "type" is a uint16_t, so it occupies 2 bytes. These fields are all integers. A "uint" type is unsigned, and an "int" type is signed. For example, the fields "width" and "height" are signed integers. However, for simplicity, all the BMP files we have will contain only positive integers. You may assume that in your code. Also, we are dealing with the uncompressed BMP format (compression field is 0).

Because of the packing specified in the bmp.h file, you should be able to use fread to read in the first 54 bytes of a BMP file and store 54 bytes in a BMPHeader structure.
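As a minimal sketch of that idea, the snippet below reads one header with a single fread(…) call. To keep it self-contained, it redeclares the structure with `#pragma pack(push, 1)`; that exact spelling is an assumption, since the directive in the provided bmp.h may differ. The helper name `_read_header` is hypothetical and is not part of the required interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// Assumption: bmp.h requests 1-byte alignment with a directive like this one.
#pragma pack(push, 1)
typedef struct {
  uint16_t  type;
  uint32_t  size;
  uint16_t  reserved1;
  uint16_t  reserved2;
  uint32_t  offset;
  uint32_t  dib_header_size;
  int32_t   width_px;
  int32_t   height_px;
  uint16_t  num_planes;
  uint16_t  bits_per_pixel;
  uint32_t  compression;
  uint32_t  image_size_bytes;
  int32_t   x_resolution_ppm;
  int32_t   y_resolution_ppm;
  uint32_t  num_colors;
  uint32_t  important_colors;
} BMPHeader;
#pragma pack(pop)

// Hypothetical helper: read one header from the current position of fp.
// Returns true only if all sizeof(BMPHeader) bytes were read.
bool _read_header(FILE* fp, BMPHeader* header) {
  return fread(header, sizeof(*header), 1, fp) == 1;
}
```

Asking fread(…) for one item of sizeof(BMPHeader) bytes means a short or failed read returns 0, so a truncated file is reported as a failure rather than leaving a partially filled header.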

Among all these fields in the BMPHeader structure, you have to pay attention to the following fields:

bits (bits_per_pixel)	number of bits per pixel
width (width_px)	number of pixels per row
height (height_px)	number of rows
size	file size
imagesize (image_size_bytes)	the size of the image data (file size - size of header, which is 54)

We will further explain bits, width, height, and imagesize later. You should use the following structure to store a BMP file: header holds the first 54 bytes of a given BMP file, and data should point to a memory block that is big enough (imagesize bytes) to store the image data (the color information of each pixel).

typedef struct {
    BMPHeader header;
    unsigned char* data; 
} BMPImage;

Effectively, the BMPImage structure stores the entire BMP file.

Now, let's examine the fields bits, width, height, and imagesize in greater detail. The bits field records the number of bits used to represent a pixel. For this exercise (and the next exercise and assignment), we are dealing with BMP files with only 24 bits per pixel or 16 bits per pixel. In the 24-bit representation, 8 bits (1 byte) are used for RED, 8 bits for GREEN, and 8 bits for BLUE. In the 16-bit representation, each color is represented using 5 bits (the most significant bit is not used). For this exercise, we will use only 24-bit BMP files to test your functions. However, your code should be able to handle the 16-bit format as well. (Note that the header format is actually more complicated for the 16-bit format. However, for this exercise and the next exercise and assignment, we will use the same header format for both 24-bit and 16-bit BMP files for simplicity. So yes, we are abusing the format!)
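To make the 16-bit case concrete, here is a hedged sketch that splits one 16-bit pixel into its three 5-bit channels. It assumes the common layout in which blue occupies bits 0-4, green bits 5-9, and red bits 10-14, with bit 15 unused; this page does not spell out the channel order, so treat that as an assumption. The type `Color5` and the function `unpack_16bit_pixel` are hypothetical names.

```c
#include <stdint.h>

// Hypothetical type: each channel holds a 5-bit value in the range 0..31.
typedef struct { uint8_t r, g, b; } Color5;

// Assumed layout: bit 15 unused, bits 10-14 red, bits 5-9 green, bits 0-4 blue.
Color5 unpack_16bit_pixel(uint16_t pixel) {
  Color5 c;
  c.r = (pixel >> 10) & 0x1f;
  c.g = (pixel >> 5) & 0x1f;
  c.b = pixel & 0x1f;
  return c;
}
```

Masking with 0x1f keeps exactly five bits, so the unused top bit never leaks into the red channel.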

The width field gives you the number of pixels per row. Therefore, the total number of bytes required to represent a row of pixels in the 24-bit representation is width * 3. However, the BMP format requires each row to be padded at the end so that each row occupies a multiple of 4 bytes. For example, if there is only one pixel in each row (3 bytes), we need 1 additional byte to pad a row. If there are two pixels per row (6 bytes), we need 2 additional bytes. If there are three pixels per row (9 bytes), 3 additional bytes. If there are four pixels per row (12 bytes), we don't have to perform padding. We require you to assign the value 0 to each padding byte.

The height field gives you the number of rows. Row 0 is the bottom of the image. The file is organized such that the bottom row follows the header, and the top row is at the end of the file. Within each row, the leftmost pixel has the lowest index. Therefore, the first byte in data, i.e., data[0], belongs to the bottom-left pixel.

The imagesize field is height * amount of data per row. Note that the amount of data per row includes the padding at the end of each row.
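The padding rule and the imagesize formula above can be sketched as two small functions. The names `row_size_bytes` and `image_size_bytes_for` are hypothetical, and the functions take plain integers rather than a BMPHeader* so that the snippet stands on its own. They assume positive dimensions and 16 or 24 bits per pixel, as stated earlier.

```c
#include <stdint.h>

// Bytes occupied by one row of pixel data, rounded up to a multiple of 4.
uint32_t row_size_bytes(int32_t width_px, uint16_t bits_per_pixel) {
  uint32_t unpadded = (uint32_t)width_px * (bits_per_pixel / 8);
  return (unpadded + 3) / 4 * 4;
}

// Total bytes of image data: one padded row per row of the image.
uint32_t image_size_bytes_for(int32_t width_px, int32_t height_px,
                              uint16_t bits_per_pixel) {
  return row_size_bytes(width_px, bits_per_pixel) * (uint32_t)height_px;
}
```

For a 6x6 image at 24 bits per pixel, each row holds 18 bytes of color plus 2 bytes of padding, so a row is 20 bytes and the image data is 120 bytes. Adding the 54-byte header gives a file size of 174 bytes.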

You can visualize the one-dimensional data as a three-dimensional array, which is organized as rows of pixels, with each pixel represented by 3 bytes of colors (24-bit representation) or 2 bytes of colors (16-bit representation). However, because of padding, you cannot easily typecast the one-dimensional data as a 3-dimensional array. Instead, you can first typecast it as a two dimensional array, rows of pixels. For each row of data, you can typecast it as a two-dimensional array, where the first dimension captures pixels from left to right, the second dimension is the color of each pixel (3 bytes or 2 bytes).
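An alternative to typecasting is to compute the byte offset of a pixel directly. The sketch below does that for a pixel identified by its row as stored in the file (0 is the bottom row) and its column (0 is the leftmost pixel). The function name `pixel_offset` is hypothetical, and it assumes 16 or 24 bits per pixel.

```c
#include <stddef.h>
#include <stdint.h>

// Byte offset into data of the first byte of the pixel at (row, col),
// where row 0 is the bottom row of the image and col 0 is the leftmost pixel.
size_t pixel_offset(int32_t width_px, uint16_t bits_per_pixel,
                    int32_t row, int32_t col) {
  size_t bytes_per_pixel = bits_per_pixel / 8;
  // Each row is padded to a multiple of 4 bytes.
  size_t row_size = ((size_t)width_px * bytes_per_pixel + 3) / 4 * 4;
  return (size_t)row * row_size + (size_t)col * bytes_per_pixel;
}
```

In a 6-pixel-wide, 24-bit image, the second row from the bottom starts at offset 20 rather than 18, because the 2 padding bytes of the first row must be skipped.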

The Wikipedia article on the BMP file format has a nice diagram and more complete details about this format.

Do not copy any amount of code from Prof. Koh's starter files (or anywhere else).

As mentioned elsewhere, a version of this assignment was given in Prof. Koh's section. That assignment provided a partial implementation of check_bmp_header(…) as starter code. You are welcome to look at those files (264get pe09), but do not copy anything into your submission. Yes, some similarity will be inevitable, but your code should look no more similar than that of someone who used only this spec and never saw that code in the first place. Actually, the spec for check_bmp_header(…) incorporates snippets from that code, so you won't gain much, if anything, from looking at it.

Although check_bmp_header(…) is a relatively easy function to implement, you should use it as an opportunity to think consciously about the individual fields in the BMP header.

Handling run-time errors

In this assignment, you will need to handle run-time errors. These are not the same as bugs in your code. These are problems (or special conditions) that might arise due to the inputs from the caller of a function that you write. For example, a file may be inaccessible or corrupt, or malloc(…) may fail and return NULL.

For purposes of this assignment, the error-handling strategy will be two-pronged:

Return a special value if the operation failed. For functions that return a FILE*, you will return NULL if the operation failed.
Return an error message via pass-by-address. Normally, the caller will pass the address of a char*. If the operation is successful, the callee will do nothing with it. However, if there is a failure, the callee will set it to a newly heap-allocated string. It is the caller's responsibility to free it.

You will do this for all functions in this assignment that take a parameter called error.

Here is a sketch of the basic pattern we are describing:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>  // strlen(…), strcpy(…)
#include <assert.h>

bool do_something(int a, int b, char** error) {
  bool success = false;
  // ... (do the real work; set success = true if it worked)

  if(success == false) {
    if(error != NULL) {  // write a message only if the caller gave us somewhere to put it
      const char* message = "do_something(..) failed because success == false";
      *error = malloc((strlen(message) + 1) * sizeof(**error));
      strcpy(*error, message);
    }
    return false;
  }

  // ...

  return true;
}
int main(int argc, char* argv[]) {
  char* error = NULL;
  bool do_something_succeeded = do_something(10, 11, &error);
  if(! do_something_succeeded) {
    assert(error != NULL);
    fprintf(stderr, "%s\n", error);  // pass the message as an argument, never as the format string
    free(error);
    return EXIT_FAILURE;
  }
  else {
    assert(error == NULL);
  }
  return EXIT_SUCCESS;
}

EXIT_SUCCESS and EXIT_FAILURE are constants defined as 0 and 1, respectively, in stdlib.h. Although we haven't been using these so far in ECE 264, they are actually better than simply returning 0 and 1.

Warm-up exercises

The following warm-up exercises will help you prepare for this assignment. These will account for 20% of the points for this assignment, unless you opt out (see below). Handle run-time errors using the method described above.

Read a text file.
Create a function char* read_file(const char* path, char** error) that reads the contents of a file and returns it as a string on the heap. The caller is responsible for freeing that memory. Use fopen(…), fread(…), and fclose(…).
Write a text file.
Create a function void write_file(const char* path, const char* contents, char** error) that writes the given content to a file at the specified path. Use fopen(…), fwrite(…), and fclose(…).
Write a Point.
Write a function void write_point(char* path, Point p, char** error) that writes a single Point to a file at the specified path. Use fopen(…), fwrite(…), and fclose(…). For the Point type, please copy-paste the following struct type into your file:
```
typedef struct { int x; int y; } Point;
```
Read a Point.
Create a function Point read_point(const char* path, char** error) that reads a Point from the file at the specified path into a Point on the stack. No malloc(…) or free(…) are necessary for this one. Use fopen(…), fread(…), and fclose(…).

These should be in a single file, called file_warmups.c. A main(…) is optional. If included, it will not be tested, but must have a return 0 at the end (to avoid problems with our tester). Warm-ups will be tested only very lightly. These are for your benefit.

Opt out

Those who feel that they do not need this practice may "opt out" of all of the warm-ups by turning in a simple program with the same filename that prints the following (exactly) in a single printf(…) statement.

I already know this sufficiently.

If you choose to opt out, your score for the warm-up will be proportional to your score for the rest of the assignment. You may not opt-out retroactively. You must opt out of either all or none of the warm-ups.

Test-driven development

Use test-driven development to do this assignment incrementally. At least 12 stages must be explicitly marked with comments in your main(…) in test_bmp.c. Use the same format as in hw10. It will look similar to this:

int main(…) {
  // Stage 00:  (description of stage)
  // (your code here)

  // Stage 01:  (description of stage)
  // (your code here)

  // Stage 02:  (description of stage)
  // (your code here)

  …

  // Stage 11:  (description of stage)
  // (your code here)

  …

  return 0; 
}

Unlike hw10, you do not need to turn in multiple versions of your files or a plan.txt. This aspect of the assignment will be checked, but only to the degree that we can do so efficiently (i.e., possibly very lightly). Nevertheless, you should follow it in earnest for your own benefit.

Remember: Your code should never be broken for more than about 10-15 minutes at a time.

bmp.h and test files

We have provided a bmp.h file and some test image files. You must use our bmp.h. To obtain these, run 264get hw11. You may use other BMP image files of your choice*, but not all BMP files will work with this code (e.g., grayscale, other color depths, etc.), so you may wish to stick with the supplied files to test.

Any image files that you turn in must be G-rated and not violate any copyrights (i.e., your own images or else freely licensed). Provide credit in your bmp.c file in the form of a comment like:

// Credit:  blahblah.bmp, Fetty Wap, http://fettywap.com/free_photos/blahblah.bmp

Files

Here are the files you will create for this assignment.

In all functions that accept FILE* fp, you may assume that it is valid and open in the correct mode. However, you may not assume anything about the amount of data. Also, you may not assume that every call to fread(…) or fwrite(…) will succeed.

In all functions that accept char** error, you must handle run-time errors as described in the section above (Handling run-time errors).

file	contents
bmp.c	functions	`read_bmp(FILE* fp, char** error)` → return type: BMPImage* Read a BMP image and return its contents. Use your `check_bmp_header(…)` to check the integrity of the file. Handle all kinds of run-time errors using the method described above.
		`check_bmp_header(BMPHeader* bmp_hdr, FILE* fp)` → return type: bool Return true if and only if the given `BMPHeader` is valid. A header is valid if ① its magic number is 0x4d42, ② image data begins immediately after the header data (`header->offset == BMP_HEADER_SIZE`), ③ the DIB header is the correct size (`DIB_HEADER_SIZE`), ④ there is only one image plane, ⑤ there is no compression (`header->compression == 0`), ⑥ `num_colors` and `important_colors` are both 0, ⑦ the image has either 16 or 24 bits per pixel, ⑧ the `size` and `imagesize` fields are correct in relation to the `bits`, `width`, and `height` fields or the file size.
		`write_bmp(FILE* fp, BMPImage* image, char** error)` → return type: bool Write the given image to the given open file. Return true if and only if the operation succeeded. Handle run-time errors using the method described above.
		`free_bmp(BMPImage* image)` → return type: void Free all memory referred to by the given `BMPImage`.
		`crop_bmp(BMPImage* image, int x, int y, int w, int h, char** error)` → return type: BMPImage* Create a new image containing the cropped portion of the given image as specified by x (start index, from the left), y (start index, from the top), w (width of the new image), and h (height of the new image). Handle run-time errors using the method described above.
test_bmp.c	function	`main(…)` → return type: int See the section in this assignment titled test-driven development.

You may optionally include other BMP files (G-rated and non-copyright-infringing) that your test code uses.

Unlike previous assignments, you do not need a test_bmp.txt.

Requirements

Your submission must contain all of the following files with the functions and type definitions indicated in the table above, including bmp.c and test_bmp.c.
Make no assumptions about the maximum image size.
No function in bmp.c except for free_bmp(…) may modify its arguments or any memory referred to directly or indirectly by its arguments.
Do not modify the bmp.h file.
All files must be in one directory. Do not put images in subdirectories.

Only the following external header files, functions, and symbols are allowed. All others are prohibited unless approved by the instructor. (Feel free to ask if there is something you would like to use.)

header	functions/symbols	allowed in…
assert.h	`assert(…)`	`bmp.c`, `test_bmp.c`
string.h	`strcat(…)`, `strlen(…)`, `strcpy(…)`, `strcmp(…)`, `strerror(…)`	`bmp.c`, `test_bmp.c`
stdbool.h	`true`, `false`	`bmp.c`, `test_bmp.c`
stdlib.h	`malloc(…)`, `free(…)`, `NULL`, `EXIT_SUCCESS`, `EXIT_FAILURE`	`bmp.c`, `test_bmp.c`
stdio.h	`clearerr(…)`, `feof(…)`, `ferror(…)`, `fgetpos(…)`, `FILE`, `fread(…)`, `fseek(…)`, `ftell(…)`, `ftello(…)`, `fwrite(…)`, `EOF`, `SEEK_CUR`, `SEEK_SET`, `SEEK_END`	`bmp.c`, `test_bmp.c`
stdio.h	`fopen(…)`, `fclose(…)`, `printf(…)`, `fprintf(…)`, `stdout`	`test_bmp.c`

As usual…
1. Do not use variable-length arrays (C99 feature, e.g., char s[n];)
2. Code must successfully compile and run with your test_bmp.c and any other valid test file.
3. Code must meet all requirements in the Course Policies and Code Quality Standards.
  Note: If you choose to use assert(…), be sure you understand the specific guidelines that relate to it.
4. All code must be your own.
5. Your test code must be comprehensive (and original).

How much work is this?

This assignment is designed to be of similar difficulty to hw10. As usual… Do not depend on this estimate. Your mileage may vary in either direction.

Q&A

Is this the same as the assignment used in Prof. Koh's section?
Yes and no… It is adapted from that assignment but there are differences. You must meet the requirements for this section.
How can I view the contents of a BMP directly?
The best way to inspect binary data is with a hex dump. From bash, you can type xxd myimage.bmp. Since it will probably be long, you will want to view in vim. One way to do that is type xxd myimage.bmp | vim - from bash. Another way is to open the file in vim and then type :%!xxd. (Do not save!)

Suppose you have a tiny 6x6 BMP image named 6x6_24bit.bmp. (Yes, it really is only 6 pixels by 6 pixels. Don't worry. A larger version is included in one of the diagrams below.)

To get a hex dump right on the command line, you could type this at bash:
```
$ xxd 6x6_24bit.bmp
```
It will be more convenient to view in vim, so we type this from bash instead. (Don't forget the "-" at the end!)
```
$ xxd 6x6_24bit.bmp | vim -
```
Here is the hex dump, as you will see it. Don't worry if this looks cryptic. Read on and you will understand it completely.

You can break this apart using the information about the BMP file format above. Here is the same hex dump, this time with some annotations.

For this and other binary file formats, you can understand what value goes where by simply looking at the specification and a hex dump of the binary file.
Why is the file size (174) represented as “ae00 0000” in memory (instead of 0000 00ae)?
The BMP format is a little endian format. In short, that means the number 0x12345678 (305419896 in decimal notation) will be stored in memory as 7856 3412.

Remember that two hex digits are one byte. For example, 0x12345678 consists of four bytes: 0x12, 0x34, 0x56, and 0x78. When we store it using little endian, we store the least significant byte (LSB) first in the file (or memory). For that reason, little endian can also be called LSB first.

This may seem counter-intuitive because our own writing system in the physical world is the opposite: big endian. Thus, if you write, “I have 57 pints of ice cream in my freezer,” 5 is the most significant digit of 57, and we write it first.
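To see the byte order at work, the following sketch rebuilds a 32-bit number from four bytes stored least significant byte first. It is only an illustration of the format; the function name `decode_le32` is hypothetical.

```c
#include <stdint.h>

// Combine four bytes stored least-significant-byte first into one number.
uint32_t decode_le32(const unsigned char bytes[4]) {
  return (uint32_t)bytes[0]
       | ((uint32_t)bytes[1] << 8)
       | ((uint32_t)bytes[2] << 16)
       | ((uint32_t)bytes[3] << 24);
}
```

Given the bytes ae 00 00 00 from the hex dump, this yields 0xae, which is 174 in decimal. Given 78 56 34 12, it yields 0x12345678.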
Do I need to manually reverse the bytes when I read/write a BMP file?
No, not on our platform.

For this assignment, you are copying bytes directly from a file to memory, and back. Luckily for you, the x86 and x64 architectures, which power most Linux and Windows computers, happen to use little-endian for storing numbers in memory. You may have noticed this when using the x/… command in gdb.

In fact, the choice of little- or big-endian is actually a bit arbitrary, at least for fixed-width storage in memory or binary files.
Are the bits reversed, too?
How would you know? … Remember, it's all just bytes.
Would a string representation of an integer (e.g., char* s = "abc") also be reversed in memory?
No.

Long version: A string is just an array of characters, and each character is really just a number. Endian-ness only affects how a single number is stored, not bigger structures such as arrays or structures. Also, since a char is only one byte on our platform (actually, all platforms by virtue of a special provision in the standard, but I digress...), endian-ness would not affect even an individual character because it only applies to the order of bytes within a number requiring multiple bytes to store in a file or memory.

You can observe this directly from gdb.
Why are the RGB color components written in blue-green-red order (instead of red-green-blue)?
This also has to do with BMP being a little endian format.
What are uint16_t and uint32_t?
These are special types that have a guaranteed size of 2 and 4 bytes, respectively.
Do I still need to use sizeof(…) when referring to the size of uint16_t and uint32_t?
Yes.
What is unsigned?
The unsigned keyword is a part of some type names (e.g., unsigned int, unsigned char, etc.) and indicates that values of that type cannot be negative.
Are there other types I should know about?
“Should” is relative, but yeah, there are many other numeric types. Wikipedia has a decent list.
Will all of this be on the exam?
All of the concepts in this and the other assignments can be considered within scope. That includes this Q&A.
What does “valid” mean for check_bmp_header(…)?
You are checking that none of the information in the header contradicts itself or the contents of the file (esp. the file size).
Is the gray_earth.bmp image valid?
Yes and no. That file follows the BMPv5 format specification. For this assignment we are following the simpler BMPv3 format specification. You may want to ignore that file.
Can we use assert(…) in our check_bmp_header(…)?
Yes and no… but mostly no. You may use assert(…) wherever you like, but only for detecting errors in your code; it should never be used to check for errors in the inputs or anything else. In other words, it is not for run-time checking.

Real-world software development operations typically use compiler features to effectively remove all assert(…) statements prior to shipping a product. Thus, you should use assert(…) only for things that need not be checked after the product is completed and deployed to users.
Why do we need ftell(…)?
It might help you get the file size. (We'll leave it up to you to discover how, and why that would be needed in the first place.)
Why does check_bmp_header(…) take a FILE* fp as a parameter?
You should use that to make sure the actual file size matches the information in the BMP header. See Q16.
Why does fwrite(…) add a newline character (0x0a) to the end of my file?
It doesn't. If you open the binary file in vim and then use :%!xxd you may see an extraneous 0x0a at the end. That's because vim (like many code editors) adds a newline to the end of a file, if there's not one there already. Some solutions:
1. (BEST) Use xxd myfile.bmp | vim - from bash. (Don't forget the '-' at the end!)
2. Ignore the 0x0a.
3. Open the file with vim -b myfile.bmp and then use :%!xxd to convert to the hex dump. The -b tells vim to open it in binary mode, which (among other things) disables the newline at the end.
4. From within vim, open the file with :tabe ++binary myfile.bmp and then use :%!xxd to convert to the hex dump. This also opens it in binary mode.
How can read_bmp(…) return a descriptive error message when check_bmp_header(…) does not?
You may want to use a helper function to do the checking for both. Your helper would be a lot like check_bmp_header(…) but return an error message (i.e., via a char** error). This isn't a requirement—just a suggestion. Use it only if you find it helpful.
How are the pixels in a BMP numbered?
For purposes of the BMP image format, the pixels start in the lower-left, and progress left-to-right then bottom-to-top. See the diagram for a more concrete example.

For purposes of most image processing APIs and discussion, we generally designate (0,0) as the upper-left.
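Since crop_bmp(…) takes y measured from the top while the file stores rows from the bottom up, some conversion between the two conventions is needed. A minimal sketch, using the hypothetical name `file_row_from_top_y`:

```c
#include <stdint.h>

// Convert a y coordinate measured from the top of the image (0 = top row)
// to the row index used in the file (0 = bottom row).
int32_t file_row_from_top_y(int32_t height_px, int32_t y) {
  return height_px - 1 - y;
}
```

In a 6-row image, the top row (y = 0) is row 5 in the file, and the bottom row (y = 5) is row 0.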
What are some good helpers to use?
It's up to you. Here are some ideas to get you thinking:
1. long int _get_file_size(…)
2. _Color _get_pixel(BMPImage* image, int x, int y)
  typedef struct { unsigned char r, g, b; } _Color;
3. int _get_image_row_size_bytes(BMPHeader* bmp_hdr)
4. int _get_image_size_bytes(BMPHeader* bmp_hdr)
You may use/copy those if you find them helpful, but we make no guarantees as to whether they will work for you.

Updates

11/18/2015: Added Handling run-time errors section. Added Q1-Q2 to the Q&A.

11/19/2015: Added Q3-Q12 to the Q&A; corrected return type of write_bmp(…), read_bmp(…); okay to use string.h

11/22/2015: Okay to use strcpy(…), strcmp(…), ftell(…), ftello(…), fgetpos(…), SEEK_CUR, SEEK_END, strerror(…), EOF, feof(…), ferror(…), clearerr(…). (Feel free to suggest any others you think would be helpful.); corrected ERROR_SUCCESS to EXIT_SUCCESS

11/23: Added Q13-Q21 to the Q&A.

Fixed typos in error handling example

Advanced C Programming

Autumn 2015 :: ECE 264 :: Purdue University