Parallel adaptive threshold
Goals
- Learn to write parallel programs using multiple threads and the pthread library.
- Increase your depth in image programming.
Overview
You will create binarize(…)
, a function
that works with your code from HW13 to convert an image to
monochrome (black & white) using the “adaptive threshold” method.
Adaptive Thresholding
Thresholding is a technique used to convert an image to monochrome (black & white). It is important for problems such as scanned document processing, OCR, image segmentation, and medical imaging. Pixels brighter than some threshold are set to white. The rest are set to black. The key question becomes this: What should the threshold be?
If the image is color, you will need to calculate what each pixel would be as grayscale. In the image below (taken with a smartphone), the image at left is actually color. Notice that there is a little tint from the lighting in the room. The image right is true grayscale. Your code should be able to handle either one.
color |
gray |
In the simplest case, you could use 50% gray ( ) so that would be converted to . That method—often called fixed threshold because it uses the same threshold value for all pixels—generally works well for scanned documents, as long as the brightness and contrast were set properly when they were scanned. However, in cases of photographs taken in uneven lighting or certain medical applications, a fixed threshold provides poor results, as you see below.
gray |
monochrome, fixed threshold |
Even if we were to use 30% gray or 70% gray, we would still have similar problems. Because the lighting is different in different regions of the image, there is no one good threshold to use for every pixel. The solution is adaptive threshold. The basic idea is that for each pixel in an image, you may use a different threshold value to decide whether that pixel should be black or white.
gray |
monochrome, adaptive threshold |
There are many good ways to decide the threshold, none of them perfect. For this assignment, we will keep it simple:
For a neighborhood radius r, a pixel at (x0, y0) shall be white if and only if its intensity is greater than the average intensity of all pixels in the (2r + 1) × (2r + 1) neighborhood surrounding that pixel. That neighborhood shall consist of all pixels at (xi, yi), such that |xi - x0| ≤ r, including the current pixel being thresholded. Pixels that are not white shall be black. In calculating the threshold, do not include pixels that are beyond the boundaries of the image. This means that for such edge pixels, the neighborhood will be smaller.
Example: Suppose we binarize the following 6x6 image with radius=1.
250 | 120 | 0 | 20 | 240 | 100 |
220 | 110 | 10 | 30 | 230 | 130 |
40 | 45 | 50 | 55 | 60 | 65 |
70 | 75 | 80 | 85 | 90 | 95 |
155 | 145 | 135 | 125 | 115 | 105 |
165 | 175 | 185 | 195 | 205 | 215 |
The pixel at (x=2, y=3) currently has intensity value of 80. The neighborhood (radius=1) consists of pixels having intensity values 45, 50, 55, 75, 80, 85, 145, 135, and 125. The average of those is 88⅓. Since 80 ≤ 88⅓, the pixel at (x=2, y=3) shall be black.
Always calculate the threshold based on only the values in the input image.
(Optional) More information about adaptive threshold:
Multi-threaded programming
Calculating adaptive threshold can be computationally intensive, especially on high resolution images. Therefore, you will implement the adaptive threshold algorithm in parallel by using the pthread library. How can parallelism be implemented for this algorithm? Each pixel in the output image must have its threshold calculated and its intensity value set. Threads can be created such that each thread operates on a particular region of pixels in the image. How you choose to allocate pixels to a particular thread is up to you!
Your binarize(…)
will create a copy, which makes your life a lot easier than it would be if they were modifying the image in place.
A thread is defined as an independent stream of instructions that can be scheduled to run by the operating system. Multi-threading allows multiple threads to exist within the context of a single process. These threads share a process' resources but are able to execute simultaneously and/or independently.
You are responsible for creation of num_threads
threads and their management. See the man pages for pthread_create(…)
and pthread_join(…)
for details on their parameters
and how to pass data to your worker function. You can refer to example#1 and example#2
covered in class. Also for more examples of thread management you can refer here.
Warm-up exercises
This assignment includes a warm-up exercise to help you get ready. This accounts for 10% of your score for HW14. Scoring will be relatively light, but the usual base requirements apply.
Zero an array. Create a function
that sets every element of the given array to zero using bool zero array(int✶ array, int size, int num threads, char✶✶ error)
num_threads
threads.
Opt out.
In a hurry, and don't need the practice? This warm-up is here to help you learn what you need to succeed on the rest of this assignment—not to add additional work. Therefore, we give you an option. Those who feel that they do not need the practice may "opt out". by modifying warmup.c so that it does nothing but print the following message exactly and then exit:
I already know this and do not need to practice.
If you do that, then your score for HW14 will be based solely on the rest of this assignment. If you leave the warmup.c undone, if you do not turn in a warmup.c, or if the message it prints does not match perfectly, then you will receive 0 for the warmup portion of this assignment (10%).
Test-driven development
Use test-driven development to do this assignment incrementally. Follow the same instructions as in HW13.
Run-time error handling
Throughout this assignment (including the warm-ups), all functions
that accept char** error
must
handle run-time errors as described in the section titled
Handling run-time errors in HW13.
Start
To get the starter files, enter
264get hw14
in bash.
You will get the header files (bmp.h and mtat.h), a skeleton warmup.c,
and several image files. The image files with "_bw_" in the filename are the
output of the binarize(…)
function, using the
given radius.
You are strongly encouraged to work with the smallest image (6x6), using only 1, 2, or 3 threads. Keeping things simple and small will make your development and debugging more manageable.
Requirements
- Your submission must contain each of the following files, as specified:
- Handle run-time errors using the method described in HW13.
- Input
BMPImage
should be a 24-bit BMP (v3) image. - Output
BMPImage
should be a 24-bit BMP (v3) with red=blue=green. - In the output, the intensity value of red, blue, and green must be either 0 (black) or 255 (white).
- Do not use division in calculating the new color intensity. (See Q2.) This is to ensure that results will be predictable and consistent.
- You may assume
num_threads > 0
. (There is no upper bound.) - Your test code may rely on any of the starter files.
- Your test code must work with only the image files that are included in the starter files for HW14.
- Use
read_bmp(…)
,write_bmp(…)
,free_bmp(…)
from HW13, as needed. - Your
main(…)
must return EXIT_SUCCESS (0). - If successful, return
true
and do not modify error. - In case of the following run-time errors, return
false
and set*error
to a descriptive error message.size <= 0
array == NULL
- thread could not be created
- If
size >= 1
andarray != NULL
, you may assume the array is valid and contains sizeint
values - If
num_threads
>size
, changenum_threads
tosize
of the array. - Message should be on the heap; caller is responsible for freeing.
- Do not turn in any image files.
- You may assume that images contain no more than 16,777,216 total pixels (e.g., 4096x4096).
-
binarize(…)
should not modify the image that is passed to it. - You may modify mtat.h but not bmp.h. Do not modify the function signature
of
binarize(…)
. -
Only the following externally defined functions and constants are allowed in your .c files. (You may put the corresponding #include <…> statements in the .c file or in your mtat.h, at your option.)
This does
not apply to your bmp.c and bmp.h. "*" means you may use anything in that
header file.
header functions/symbols allowed in… assert.h assert(…)
mtat.c
,test_mtat.c
,warmup.c
math.h *
mtat.c
,test_mtat.c
,warmup.c
pthread.h *
mtat.c
,test_mtat.c
,warmup.c
stdbool.h true
,false
mtat.c
,test_mtat.c
,warmup.c
stdlib.h malloc(…)
,free(…)
,qsort(…)
,NULL
,EXIT_SUCCESS
,EXIT_FAILURE
mtat.c
,test_mtat.c
,warmup.c
stdio.h FILE
mtat.c
,test_mtat.c
,warmup.c
stdio.h printf(…)
,fprintf(…)
,stdout
test_mtat.c
string.h strcat(…)
,strlen(…)
,strcpy(…)
,strcmp(…)
,strerror(…)
,memcpy(…)
mtat.c
,test_mtat.c
,warmup.c
- Submissions must meet the code quality standards and the policies on homework and academic integrity.
file | contents | |
---|---|---|
mtat.c | functions |
binarize(BMPImage✶ image, int radius, int num threads, char✶✶ error)
→ return type: BMPImage✶
Convert the given image to monochrome (black & white) using
the adaptive threshold algorithm described above.
|
mtat.h | declarations |
declaration of binarize(…)
|
bmp.h | declarations |
same as previous assignment
|
bmp.c | functions |
same as previous assignment
|
test_mtat.c | functions |
main(int argc, char✶ argv[])
→ return type: int
Test your binarize(…) .
|
warmup.c | functions |
zero array(int✶ array, int size, int num threads, char✶✶ error)
→ return type: bool
Set every element of array to 0 using num_threads threads.
|
Compile
This assignment relies on your code from HW13 but they must be in separate files. Do not copy your HW13 code into mtat.c or mtat.h. Your code should work with any working HW13 code. Your mtat.c must #include both mtat.h and bmp.h.
To compile thread that uses the pthread library, you must include the -pthread flag.
The following command should work:
gcc -pthread -o mtat mtat.c bmp.c test_mtat.c
Submit
In general, to submit any assignment for this course, you will use the following command:
264submit ASSIGNMENT FILES…
For HW14, you will type
264submit hw14 warmup.c mtat.c mtat.h bmp.c bmp.h test_mtat.c
from inside your hw14 directory.
You can submit as often as you want, even if you are not finished with the assignment. That saves a backup copy which we can retrieve for you if you ever have a problem.
How much work is this?
This assignment does not require a lot of code. However, writing code that uses threads is different from most of what you have done in ECE 264. In particular, it can be challenging to debug. Give yourself time to understand multi-threaded programming.
The instructor's solution contains 94 sloc in mtat.c, including
30 sloc in binarize(…)
+
33 sloc in the worker function (invoked via pthread_create(…)
) +
7 sloc in 3 other helper functions +
24 sloc for function headers and declarations.
There are 184 sloc in bmp.c, including
33 sloc in read_bmp(…)
+
13 sloc in write_bmp(…)
+
6 sloc in free_bmp(…)
+
23 sloc in _check_bmp_header(…)
(see Q18 in HW13) +
1 sloc in check_bmp_header(…)
(calls _check_bmp_header(…)
) +
25 sloc in crop_bmp(…)
+
43 sloc in 5 other helpers +
40 sloc for function headers and declarations. Altogether, 31 distinct error conditions
are checked and handled.
To measure your code, type
cloc --by-file *.c
Bonus #1: Median filter (2 bonus points)
Add a multi-threaded median filter using a very similar technique to the
adaptive threshold. Do not use bubble sort or any other
n2 algorithm to calculate the median. You may
however use qsort(…)
.
Your function signature will look like this:
BMPImage✶ median(BMPImage✶ image, int radius, int num threads, char✶✶ error)
To indicate that you are attempting bonus #1, you must include the following at the top of your mtat.h file.
#define HW14_BONUS_1
Bonus #2: Constant time median filter (4 bonus points)
Two additional bonus points are possible for anybody implementing this constant time median filter algorithm described in this article by Simon Perreault & Patrick Hébert. Other very efficient algorithms may be considered for this, if you obtain approval from the instructor in advance of the submission deadline.
To indicate that you are attempting bonus #2, you must include the following at the top of your mtat.h file.
#define HW14_BONUS_2
You may receive credit for either bonus #1 or bonus #2 but not both. In addition, we may require an in-person walk-through of your work for either of these (and especially #2).
Policies: Submit the bonus with the rest of the assignment using 264submit
.
The bonus and the main assignment will be scored separately. Late penalties will
not apply to the bonus, but it will not be considered >5 days after the
HW14 deadline.
This bonus will most likely be scored by hand at the end of the semester.
Scoring will be all or nothing for the bonuses. Include some sort of test
to demonstrate that it works. As long as you have done a reasonable amount
of testing, and your code passes your tests perfectly, you
will get the credit. In case of any confusion, we might ask you to come in
and explain how your code works. No credit will be given for
code that is only partially working, does not meet the base requirements
(in the syllabus), or comes with tests that are clearly inadequate.
Q&A
-
How much code should I be writing for
zero_array(…)
?
The core of it isn't much code. Here it is before adding in the error handling. (Do not copy.)
Adding in the error handling (as required) increases the amount of code. (Do not copy.)
-
How can I calculate the color intensity without using division?
First, let's look at how you would do this conceptually. You would convert each pixel to grayscale by computing the average of its red, blue, and green channels. That would yield a grayscale intensity between 0 and 255 for each pixel in the image. Then, for a given pixel, you would compare its grayscale intensity to the average of the grayscale intensities of the pixels in its neighborhood. Of course, computing these averages would use division. So how can we do it without division?Note that the sum of r+g+b for each cell is just 3 times the grayscale intensity. So, instead of dividing by 3, we will simply work with the r+g+b sum for every pixel. Likewise, instead of calculating the average within a neighborhood, we will multiply the value for the single pixel of interest by the number of pixels in its neighborhood and compare that with the sum of the r+g+b sums for every pixel in the neighborhood.Summary: Compare(r+g+b)*num_pixels_in_neighborhood
with the sum ofr+g+b
over all cells in the neighborhood. -
What if num_threads is greater than the number of pixels?
In that case, you may reduce num_threads to the number of pixels, just like you did forzero_array(…)
. This is not a "requirement", per ser, but it is reasonable and acceptable. -
How can I compare two BMP files?
Suppose we have two files, expected.bmp and actual.bmp. The most basic thing to do would be to use xxd to store the hex dump of each and then diff them.xxd expected.bmp > expected.bmp.txt xxd actual.bmp > actual.bmp.txt diff expected.bmp.txt actual.bmp.txt
We can do much better using a bash feature called process substitution, combined with a vim feature called vimdiff.vimdiff <(xxd expected.bmp) <(xxd actual.bmp)
To skip the header, and compare just the pixel data, add-s +54
to each xxd command:vimdiff <(xxd -s +54 expected.bmp) <(xxd -s +54 actual.bmp)
To look at only the header, add-l 54
to each xxd command:vimdiff <(xxd -l 54 expected.bmp) <(xxd -l 54 actual.bmp)
To see the difference at the command line, instead of in vim, substitutediff
in place ofvimdiff
in the above commands.
Updates
11/23/2016 | You can call malloc(…) from binarize(…) .binarize(…) should not modify the image that is passed to it
(but it may of course modify the error parameter that is passed to it). |
12/1/2016 | Added node about instructor solution code size. |
12/2/2016 | Added Q2 and note that you may assume num_threads ≥ 1 |