- builtins.object
    - RegionProposalGenerator

class RegionProposalGenerator(builtins.object)

    RegionProposalGenerator(*args, **kwargs)
Methods defined here:
- __init__(self, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- accessing_one_color_plane(self, image_file, n)
- This method shows how you can access the n-th color plane of the argument color image.
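For illustration, a minimal sketch of what accessing the n-th color plane amounts
to (the filename is hypothetical and this is not the method's actual code):
    from PIL import Image
    import numpy as np

    # For an RGB image, n = 0, 1, 2 selects the R, G, or B plane respectively.
    img = Image.open("example.jpg").convert("RGB")     # hypothetical image file
    n = 0
    plane = np.asarray(img)[:, :, n]                   # the n-th color plane as a 2D array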
- convolutions_with_pytorch(self, image_file, kernel)
- Using torch.nn.functional.conv2d() for demonstrating a single image convolution with
a specified kernel
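As a minimal sketch of such a call (the image file and the kernel are hypothetical,
not the module's own), one might write:
    import torch
    import torch.nn.functional as F
    import torchvision.transforms as tvt
    from PIL import Image

    # Convolve a grayscale image with a 3x3 Laplacian-like kernel using F.conv2d().
    img = Image.open("example.jpg").convert("L")       # hypothetical image file
    x = tvt.ToTensor()(img).unsqueeze(0)               # shape (1, 1, H, W)
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).reshape(1, 1, 3, 3)
    out = F.conv2d(x, kernel, padding=1)               # shape (1, 1, H, W)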
- displayImage(self, argimage, title='')
- Displays the argument image. The display stays on for the number of seconds
that is the first argument in the call to tk.after() divided by 1000.
- displayImage2(self, argimage, title='')
- Displays the argument image. The display stays on until the user closes the
window. If you want a display that automatically shuts off after a certain
number of seconds, use the previous method displayImage().
- displayImage3(self, argimage, title='')
- Displays the argument image (which must be of type Image) in its actual size. The
display stays on until the user closes the window. If you want a display that
automatically shuts off after a certain number of seconds, use the method
displayImage().
- displayImage4(self, argimage, title='')
- Displays the argument image (which must be of type Image) in its actual size without
imposing the constraint that the larger dimension of the image be at most half the
corresponding screen dimension.
- displayImage5(self, argimage, title='')
- This does the same thing as displayImage4() except that it also provides for
"save" and "exit" buttons. This method displays the argument image with more
liberal sizing constraints than the previous methods. This method is
recommended for showing a composite of all the segmented objects, with each
object displayed separately. Note that 'argimage' must be of type Image.
- displayImage6(self, argimage, title='')
- For the argimage, which must be of type PIL.Image, this does the same thing as
displayImage3() except that it also provides for "save" and "exit" buttons.
- display_tensor_as_image(self, tensor, title='')
- This method converts the argument tensor into a photo image that you can display
in your terminal screen. It can convert tensors of three different shapes
into images: (3,H,W), (1,H,W), and (H,W), where H, for height, stands for the
number of pixels in the vertical direction and W, for width, for the same
along the horizontal direction. When the first element of the shape is 3,
that means that the tensor represents a color image in which each pixel in
the (H,W) plane has three values for the three color channels. On the other
hand, when the first element is 1, that stands for a tensor that will be
shown as a grayscale image. And when the shape is just (H,W), that is
automatically taken to be for a grayscale image.
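A rough sketch of the shape handling described above (illustrative only, not the
method's actual implementation):
    import torch
    import torchvision.transforms as tvt

    def tensor_to_pil(t: torch.Tensor):
        # Accepts tensors of shape (3,H,W), (1,H,W), or (H,W); a plain (H,W)
        # tensor is treated as a grayscale image.
        if t.dim() == 2:
            t = t.unsqueeze(0)                          # (H,W) -> (1,H,W)
        t = t.float()
        t = (t - t.min()) / (t.max() - t.min() + 1e-8)  # normalize to [0,1] for display
        return tvt.ToPILImage()(t)                      # handles (1,H,W) and (3,H,W)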
- display_tensor_as_image2(self, tensor, title='')
- This method converts the argument tensor into a photo image that you can display
in your terminal screen. It can convert tensors of three different shapes
into images: (3,H,W), (1,H,W), and (H,W), where H, for height, stands for the
number of pixels in the vertical direction and W, for width, for the same
along the horizontal direction. When the first element of the shape is 3,
that means that the tensor represents a color image in which each pixel in
the (H,W) plane has three values for the three color channels. On the other
hand, when the first element is 1, that stands for a tensor that will be
shown as a grayscale image. And when the shape is just (H,W), that is
automatically taken to be for a grayscale image.
- displaying_and_histogramming_images_in_batch1(self, dir_name, batch_size)
- This method is the first of three such methods in this module for illustrating the
functionality of matplotlib for simultaneously displaying multiple images and
the results obtained on them in gridded arrangements. In the implementation
shown below, the core idea is to call "plt.subplots(2,batch_size)" to create a
2 x batch_size array of subplot objects ("axes"). We use the first row of this
grid to display each image in its own subplot object, and we use the second row
of the grid to display the histogram of the corresponding image in the first
row.
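A small sketch of this 2 x batch_size layout (the filenames are hypothetical and
this is not the method's own code):
    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image

    image_files = ["im1.jpg", "im2.jpg", "im3.jpg", "im4.jpg"]   # hypothetical files
    batch_size = len(image_files)
    fig, axes = plt.subplots(2, batch_size, figsize=(3 * batch_size, 6))
    for i, fname in enumerate(image_files):
        gray = np.array(Image.open(fname).convert("L"))
        axes[0, i].imshow(gray, cmap="gray")            # first row: the image itself
        axes[0, i].axis("off")
        axes[1, i].hist(gray.ravel(), bins=64)          # second row: its histogram
    plt.show()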
- displaying_and_histogramming_images_in_batch2(self, dir_name, batch_size)
- I now show a second approach to display multiple images and their corresponding
histograms in a gridded display. Unlike in the previous implementation of
this method, now we do not call on "plt.subplots()" to create a grid
structure for displaying the images. Instead, we now call on
"torchvision.utils.make_grid()" to construct a grid for us. The grid is
created by giving an argument like "nrow=4" to it. When using this method,
an important thing to keep in mind is that the first argument to make_grid()
must be a tensor of shape "(B, C, H, W)" where B stands for batch_size, C for
channels (3 for color, 1 for gray), and (H,W) for the height and width of the
image. What that means in our example is that we need to synthesize a tensor
of shape "(8,1,64,64)" in order to be able to call the "make_grid()"
function. Note that the object returned by the call to make_grid() is a
tensor unto itself. For the example shown, if we had called
"print(grid.shape)" on the "grid" returned by "make_grid()", the answer would
be "torch.Size([3, 158, 306])" which, after it is converted into a numpy
array, can be construed by a plotting function as a color image of size
158x306.
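A small sketch of such a make_grid() call, using a synthetic batch of eight
single-channel 64x64 images (random data, purely illustrative; the exact grid
size depends on the padding used):
    import torch
    import torchvision
    import matplotlib.pyplot as plt

    batch = torch.rand(8, 1, 64, 64)                    # shape (B, C, H, W)
    grid = torchvision.utils.make_grid(batch, nrow=4, padding=2)
    print(grid.shape)                                   # a 3-channel tensor, e.g. torch.Size([3, 134, 266])
    plt.imshow(grid.numpy().transpose(1, 2, 0))         # reorder to (H, W, 3) for imshow
    plt.show()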
- displaying_and_histogramming_images_in_batch3(self, dir_name, batch_size)
- The core idea here is to illustrate two things: (1) The syntax used for the
'singular' version of the subplot function "plt.subplot()" --- although I'll
be doing so by actually calling "fig.add_subplot()". And (2) How you can put
together multiple multi-image plots by creating multiple Figure objects.
Figure is the top-level container of plots in matplotlib. In the
implementation shown below, the key statements are:
fig1 = plt.figure(1)
axis = fig1.add_subplot(241)
Calling "add_subplot()" on a Figure object returns an "axis" object. The
word "axis" is a misnomer for what should really be called a "subplot".
Subsequently, you can call display functions like "imshow()", "bar()", etc.,
on the axis object to display an individual plot in a gridded arrangement.
The argument "241" in the first call to "add_subplot()" means that your
larger goal is to create a 2x4 display of plots and that you are supplying
the 1st plot for that grid. Similarly, the argument "242" in the next call
to "add_subplot()" means that for your goal of creating a 2x4 gridded
arrangement of plots, you are now supplying the second plot. Along the same
lines, the argument "248" toward the end of the code block means that you are
now supplying the 8th plot for the 2x4 arrangement of plots.
Note how we create a second Figure object in the second major code block. We
use it to display the histograms for each of the images shown in the first
Figure object. The two Figure containers will be shown in two separate
windows on your laptop screen.
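A small sketch of this two-Figure arrangement (the filenames are hypothetical and
this is not the method's own code):
    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image

    image_files = ["im%d.jpg" % i for i in range(1, 9)]   # eight hypothetical files for a 2x4 grid
    fig1 = plt.figure(1)                                # first Figure: the images
    fig2 = plt.figure(2)                                # second Figure: their histograms
    for i, fname in enumerate(image_files):
        gray = np.array(Image.open(fname).convert("L"))
        ax1 = fig1.add_subplot(2, 4, i + 1)             # same as add_subplot(241), ..., (248)
        ax1.imshow(gray, cmap="gray")
        ax1.axis("off")
        ax2 = fig2.add_subplot(2, 4, i + 1)
        ax2.hist(gray.ravel(), bins=64)
    plt.show()                                          # both Figure windows are shown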
- extract_data_pixels_in_bb(self, image_file, bounding_box)
- Mainly used for testing
- extract_image_region_interactively_by_dragging_mouse(self, image_name)
This is one method you can use to apply the selective search algorithm to just a
portion of your image. This method extracts the portion you want. You click
at the upper left corner of the rectangular portion of the image you are
interested in and you then drag the mouse pointer to the lower right corner.
Make sure that you click on "save" and "exit" after you have delineated the
area.
- extract_image_region_interactively_through_mouse_clicks(self, image_file)
- This method allows a user to use a sequence of mouse clicks in order to specify a
region of the input image that should be subject to further processing. The
mouse clicks taken together define a polygon. The method encloses the
polygonal region by a minimum bounding rectangle, which then becomes the new
input image for the rest of processing.
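The bounding-rectangle step can be summarized by a small helper like the following
(hypothetical, not the method's actual code):
    # Given the (x, y) positions of the mouse clicks that define the polygon,
    # compute the minimum bounding rectangle that encloses them.
    def min_bounding_rect(click_points):
        xs = [p[0] for p in click_points]
        ys = [p[1] for p in click_points]
        return (min(xs), min(ys), max(xs), max(ys))     # (left, upper, right, lower) for PIL crop()

    # e.g.  region = pil_image.crop(min_bounding_rect(clicks))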
- extract_rectangular_masked_segment_of_image(self, horiz_start, horiz_end, vert_start, vert_end)
- Keep in mind the following convention used in the PIL's Image class: the first
coordinate in the args supplied to the getpixel() and putpixel() methods is for
the horizontal axis (the x-axis, if you will) and the second coordinate for the
vertical axis (the y-axis). On the other hand, in the args supplied to the
array and matrix processing functions, the first coordinate is for the row
index (meaning the vertical) and the second coordinate for the column index
(meaning the horizontal). In what follows, I use the index 'i' with its
positive direction going down for the vertical coordinate and the index 'j'
with its positive direction going to the right as the horizontal coordinate.
The origin is at the upper left corner of the image.
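A tiny sketch of the two indexing conventions just described (the image file is
hypothetical):
    import numpy as np
    from PIL import Image

    # PIL indexes pixels as (x, y); the corresponding numpy array is indexed
    # as [row, column], i.e. [y, x].
    img = Image.open("example.jpg").convert("RGB")      # hypothetical image file
    arr = np.asarray(img)
    x, y = 10, 25
    assert img.getpixel((x, y))[0] == arr[y, x, 0]      # same red value, two conventions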
- gaussian_smooth(self, pil_grayscale_image)
- This method smooths an image with a Gaussian of specified sigma.
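For illustration, Gaussian smoothing can be done with PIL as in the sketch below
(this may differ from the method's own implementation):
    from PIL import Image, ImageFilter

    img = Image.open("example.jpg").convert("L")        # hypothetical image file
    smoothed = img.filter(ImageFilter.GaussianBlur(radius=2))   # radius plays the role of sigma here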
- graph_based_segmentation(self, image_name, num_blobs_wanted=None)
- This is an implementation of the Felzenszwalb and Huttenlocher algorithm for
graph-based segmentation of images. At the moment, it is limited to working
on grayscale images.
- graph_based_segmentation_for_arrays(self, which_one)
- This method is provided to enable the user to play with small arrays when
experimenting with graph-based logic for image segmentation. At the moment, it
provides three small arrays, one under the "which_one==1" option, one under the
"which_one==2" option, and the last under the "which_one==3" option.
- graying_resizing_binarizing(self, image_file, polarity=1, area_threshold=0, min_brightness_level=100)
- This is a demonstration of some of the more basic and commonly used image
transformations from the torchvision.transforms module. The large comment
blocks are meant to serve as a tutorial introduction to the syntax used for invoking
these transformations. The transformations shown can be used for converting a
color image into a grayscale image, for resizing an image, for converting a
PIL.Image into a tensor and a tensor back into a PIL.Image object, and so on.
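A minimal sketch of the kinds of transformations mentioned above (illustrative
only, with a hypothetical image file):
    from PIL import Image
    import torchvision.transforms as tvt

    img = Image.open("example.jpg")                     # hypothetical image file
    gray = tvt.Grayscale(num_output_channels=1)(img)    # color -> grayscale
    small = tvt.Resize((64, 64))(gray)                  # resize to 64x64
    t = tvt.ToTensor()(small)                           # PIL.Image -> tensor with values in [0,1]
    back = tvt.ToPILImage()(t)                          # tensor -> PIL.Image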
- histogramming_and_thresholding(self, image_file)
- PyTorch based experiments with histogramming and thresholding
- histogramming_the_image(self, image_file)
- PyTorch based experiments with histogramming the grayscale and the color values in an
image
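A minimal sketch of PyTorch-based histogramming of the grayscale values in an
image (illustrative only, with a hypothetical image file):
    import torch
    import torchvision.transforms as tvt
    from PIL import Image

    img = Image.open("example.jpg").convert("L")        # hypothetical image file
    t = tvt.ToTensor()(img) * 255.0                     # gray values in [0, 255]
    hist = torch.histc(t, bins=256, min=0.0, max=255.0) # counts per gray level
    prob = hist / hist.sum()                            # normalized histogram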
- repair_blobs(self, merged_blobs, color_map, all_pairwise_similarities)
- The goal here is to do a final clean-up of the blobs by merging tiny pixel blobs with
an immediate neighbor, etc. Such a cleanup requires adjacency info regarding the
blobs in order to figure out which larger blob to merge a small blob with.
- selective_search_for_region_proposals(self, graph, image_name)
- This method implements the Selective Search (SS) algorithm proposed by Uijlings,
van de Sande, Gevers, and Smeulders for creating region proposals for object
detection. As mentioned elsewhere here, that algorithm sits on top of the graph
based image segmentation algorithm that was proposed by Felzenszwalb and
Huttenlocher. The parameter 'pixel_blobs' required by the method presented here
is supposed to be the pixel blobs produced by the Felzenszwalb and Huttenlocher
algorithm.
- visualize_segmentation_in_pseudocolor(self, pixel_blobs, color_map, label='')
- Assigns a random color to each blob in the output of an image segmentation algorithm
- visualize_segmentation_with_mean_gray(self, pixel_blobs, label='')
- Assigns the mean color to each blob in the output of an image segmentation algorithm
- working_with_hsv_color_space(self, image_file, test=False)
- Shows color image conversion to HSV
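For illustration, PIL can carry out the RGB-to-HSV conversion directly (a sketch
with a hypothetical image file; the method's own logic may differ):
    from PIL import Image

    img = Image.open("example.jpg").convert("RGB")      # hypothetical image file
    hsv = img.convert("HSV")
    h, s, v = hsv.split()                               # the hue, saturation, and value planes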
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- PurdueDrEvalDataset = <class 'RegionProposalGenerator.RegionProposalGenerator.PurdueDrEvalDataset'>
- This is the dataset to use if you are experimenting with single-instance object
detection. The dataset contains three kinds of objects in its images:
Dr. Eval, and two "objects" in his neighborhood: a house and a watertower.
Each 128x128 image in the dataset contains one of these objects, randomly
scaled and colored, along with substantial structured noise in addition to
20% Gaussian noise. Examples of these images are shown in Week 8 lecture
material in Purdue's Deep Learning class.
In order to understand the implementation of the dataloader for the Dr Eval
dataset for single-instance-based object detection, note that the top-level
directory for the dataset is organized as follows:
                               dataroot
                                   |
                                   |
    ________________________________________________________________
    |        |         |             |             |               |
    |        |         |             |             |               |
 Dr_Eval   house   watertower   mask_Dr_Eval   mask_house   mask_watertower
    |        |         |             |             |               |
    |        |         |             |             |               |
  images  images    images     binary images binary images   binary images
As you can see, the three main image directories are Dr_Eval, house, and
watertower. For each image in each of these directories, the mask for the
object of interest is supplied in the corresponding directory whose name
carries the prefix 'mask'.
For example, if you have an image named 29.jpg in the Dr_Eval directory, you
will have an image of the same name in the mask_Dr_Eval directory that will
just be the mask for the Dr_Eval object in the former image.
As you can see, the dataset does not directly provide the bounding boxes for
object localization. So the implementation of the __getitem__() function in
the dataloader must include code that calculates the bounding boxes from the
masks. This you can see in the definition of the dataloader shown below.
Since this is a ``non-standard'' organization of the data, the dataloader
must also provide for the indexing of the images so that they can be subject
to a fresh randomization that is carried out by PyTorch's
torch.utils.data.DataLoader class for each epoch of training. The
index_dataset() function is provided for that purpose.
After the dataset is downloaded for the first time, the index_dataset()
function stores away the information as a PyTorch ``.pt'' file so that it can
be loaded almost instantaneously at subsequent attempts.
One final note about the dataset: Under the hood, the dataset consists of the
pathnames to the image files --- and NOT the images themselves. It is the
job of the multi-threaded ``workers'' provided by torch.utils.data.DataLoader
to actually load the images from those pathnames.
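A sketch of the bounding-box-from-mask computation mentioned above (illustrative
only, not the dataloader's actual code):
    import numpy as np
    from PIL import Image

    def bbox_from_mask(mask_file):
        # Assumes the mask contains at least one foreground pixel.
        mask = np.asarray(Image.open(mask_file).convert("1"))   # binary mask as a bool array
        rows = np.any(mask, axis=1)
        cols = np.any(mask, axis=0)
        y_min, y_max = np.where(rows)[0][[0, -1]]
        x_min, x_max = np.where(cols)[0][[0, -1]]
        return (x_min, y_min, x_max, y_max)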
- PurdueDrEvalMultiDataset = <class 'RegionProposalGenerator.RegionProposalGenerator.PurdueDrEvalMultiDataset'>
- This is the dataset to use if you are experimenting with multi-instance object
detection. As with the previous dataset, it contains three kinds of objects
in its images: Dr. Eval, and two "objects" in his neighborhood: a house and a
watertower. Each 128x128 image in the dataset contains up to 5 instances of
these objects. The instances are randomly scaled and colored and the exact
number of instances in each image is also chosen randomly. Subsequently, background
clutter is added to the images --- these are again randomly chosen
shapes. The number of clutter objects is also chosen randomly but cannot
exceed 10. In addition to the structured clutter, I add 20% Gaussian noise
to each image. Examples of these images are shown in Week 8 lecture material
in Purdue's Deep Learning class.
On account of the much richer structure of the image annotations, this
dataset is organized very differently from the previous one:
                   dataroot
                      |
                      |
         ____________________________
         |                          |
         |                          |
   annotations.p                  images
Since each image is allowed to contain instances of the three different types
of "meaningful" objects, it is not possible to organize the images on the
basis of what they contain.
As for the annotations, the annotation for each 128x128 image is a dictionary
that contains information related to all the object instances in the image. Here
is an example of the annotation for an image that has three instances in it:
    annotation: {'filename': None,
                 'num_objects': 3,
                 'bboxes': {0: (67, 72, 83, 118),
                            1: (65, 2, 93, 26),
                            2: (16, 68, 53, 122),
                            3: None,
                            4: None},
                 'bbox_labels': {0: 'Dr_Eval',
                                 1: 'house',
                                 2: 'watertower',
                                 3: None,
                                 4: None},
                 'seg_masks': {0: <PIL.Image.Image image mode=1 size=128x128 at 0x7F5A06C838E0>,
                               1: <PIL.Image.Image image mode=1 size=128x128 at 0x7F5A06C837F0>,
                               2: <PIL.Image.Image image mode=1 size=128x128 at 0x7F5A06C838B0>,
                               3: None,
                               4: None}
                }
The annotations for the individual images are stored in a global Python
dictionary called 'all_annotations' whose keys consist of the pathnames to
the individual image files and whose values are the annotation dicts for the
corresponding images. The file 'annotations.p' shown in the directory diagram
above is what you get by calling 'pickle.dump()' on the
'all_annotations' dictionary.
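For illustration, the pickled annotations could be read back as in the sketch
below; the keys are image pathnames and the values are annotation dictionaries
like the example shown above:
    import pickle

    with open("annotations.p", "rb") as f:
        all_annotations = pickle.load(f)
    for image_path, annotation in all_annotations.items():
        print(image_path, annotation['num_objects'])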
- RPN = <class 'RegionProposalGenerator.RegionProposalGenerator.RPN'>
- I have not yet mentioned this class in the documentation page for this module
because its implementation is not finished.
- SingleInstanceDetector = <class 'RegionProposalGenerator.RegionProposalGenerator.SingleInstanceDetector'>
- This class demonstrates single-instance object detection on the images in the
PurdueDrEvalDataset dataset. Although these images are complex, in the sense
that each image contains multiple clutter objects in addition to random
noise, nonetheless we know that each image contains only a single meaningful
object instance. The LOADnet network used for detection is an adaptation of
the LOADnet2 network from DLStudio to the case of 128x128 sized input images.
The LOADnet network uses the SkipBlock as a building-block element for
dealing with the problems caused by vanishing gradients.
- YoloLikeDetector = <class 'RegionProposalGenerator.RegionProposalGenerator.YoloLikeDetector'>
- The primary purpose of this class is to demonstrate multi-instance object detection with YOLO-like
logic. A key parameter of the logic for YOLO-like detection is the variable 'yolo_interval'.
The image gridding that is required is based on the value assigned to this variable. The grid is
represented by an SxS array of cells where S is the image width divided by yolo_interval. So for
images of size 128x128 and 'yolo_interval=20', you will get a 6x6 grid of cells over the image. Since
my goal is merely to explain the principles of the YOLO logic, I have not bothered with the bottom
8 rows and the right-most 8 columns of the image that get left out of the area covered by such a grid.
An important element of the YOLO logic is defining a set of Anchor Boxes for each cell in the SxS
grid. The anchor boxes are characterized by their aspect ratios. By aspect ratio I mean the
'height/width' characterization of the boxes. My implementation provides for 5 anchor boxes for
each cell with the following aspect ratios: 1/5, 1/3, 1/1, 3/1, 5/1.
At training time, each instance in the image is assigned to that cell whose central pixel is
closest to the center of the bounding box for the instance. After the cell assignment, the
instance is assigned to that anchor box whose aspect ratio comes closest to matching the aspect
ratio of the instance.
The assigning of an object instance to a <cell, anchor_box> pair is encoded in the form of a
'5+C' element long YOLO vector where C is the number of classes for the object instances.
In our case, C is 3 for the three classes 'Dr_Eval', 'house' and 'watertower'; therefore we
end up with an 8-element vector encoding when we assign an instance to a <cell, anchor_box>
pair. The last C elements of the encoding vector can be thought of as a one-hot representation
of the class label for the instance.
The first five elements of the vector encoding for each anchor box in a cell are set as follows:
The first element is set to 1 if an object instance was actually assigned to that anchor box.
The next two elements are the (x,y) displacements of the center of the actual bounding box
for the object instance vis-a-vis the center of the cell. These two displacements are expressed
as a fraction of the width and the height of the cell. The next two elements of the YOLO vector
are the actual height and the actual width of the true bounding box for the instance in question
as a multiple of the cell dimension.
The 8-element YOLO vectors are packed into a YOLO tensor of shape (num_cells, num_anch_boxes, 8)
where num_cells is 36 for a 6x6 gridding of an image and num_anch_boxes is 5.
Classpath: RegionProposalGenerator -> YoloLikeDetector
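A small sketch of the YOLO encoding described above (illustrative only, not the
class's actual code). For a 6x6 gridding with 5 anchor boxes per cell and C = 3
classes, the YOLO tensor has shape (36, 5, 8) and each assigned instance fills
one 8-element vector:
    import torch

    num_cells, num_anchor_boxes, C = 36, 5, 3
    yolo_tensor = torch.zeros(num_cells, num_anchor_boxes, 5 + C)

    def yolo_vector(dx, dy, bh, bw, class_index, C=3):
        # [objectness, delta_x, delta_y, height, width, one-hot class label]
        vec = torch.zeros(5 + C)
        vec[0] = 1.0                   # an object instance is assigned to this anchor box
        vec[1], vec[2] = dx, dy        # center offsets as fractions of the cell width/height
        vec[3], vec[4] = bh, bw        # bbox height/width as multiples of the cell dimension
        vec[5 + class_index] = 1.0     # one-hot class label
        return vec

    yolo_tensor[14, 2] = yolo_vector(0.3, -0.1, 1.8, 0.9, class_index=0)   # hypothetical instance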
- canvas = None
- drawEnable = 0
- region_mark_coords = {}
- startX = 0
- startY = 0