builtins.object
    YOLOLogic

class YOLOLogic(builtins.object)
    YOLOLogic(*args, **kwargs)
Methods defined here:
- __init__(self, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- PurdueDrEvalDataset = <class 'YOLOLogic.YOLOLogic.PurdueDrEvalDataset'>
- This is the dataset to use if you are experimenting with single-instance object
detection. The dataset contains three kinds of objects in its images:
Dr. Eval, and two "objects" in his neighborhood: a house and a watertower.
Each 128x128 image in the dataset contains one of these objects after it is
randomly scaled and colored, along with substantial structured noise and
20% Gaussian noise. Examples of these images are shown in the Week 8 lecture
material in Purdue's Deep Learning class.
In order to understand the implementation of the dataloader for the Dr Eval
dataset for single-instance-based object detection, note that the top-level
directory for the dataset is organized as follows:
                                  dataroot
                                     |
                                     |
      ______________________________________________________________________
      |          |           |             |              |                |
      |          |           |             |              |                |
   Dr_Eval     house     watertower   mask_Dr_Eval    mask_house    mask_watertower
      |          |           |             |              |                |
      |          |           |             |              |                |
   images      images      images    binary images   binary images   binary images
As you can see, the three main image directories are Dr_Eval, house, and
watertower. For each image in each of these directories, the mask for the
object of interest is supplied in the corresponding directory whose name
carries the prefix 'mask'.
For example, if you have an image named 29.jpg in the Dr_Eval directory, you
will have an image of the same name in the mask_Dr_Eval directory that is
just the mask for the Dr_Eval object in the former image.
As you can see, the dataset does not directly provide the bounding boxes for
object localization. So the implementation of the __getitem__() function in
the dataloader must include code that calculates the bounding boxes from the
masks. This you can see in the definition of the dataloader shown below.
Since this is a ``non-standard'' organization of the data, the dataloader
must also provide for the indexing of the images so that they can be subject
to a fresh randomization, carried out by PyTorch's
torch.utils.data.DataLoader class, for each epoch of training. The
index_dataset() function is provided for that purpose.
After the dataset is downloaded for the first time, the index_dataset()
function stores away the indexing information as a PyTorch ``.pt'' file so
that it can be loaded almost instantaneously on subsequent runs.
One final note about the dataset: Under the hood, the dataset consists of the
pathnames to the image files --- and NOT the images themselves. It is the
job of the multi-threaded ``workers'' provided by torch.utils.data.DataLoader
to actually read the images from those pathnames.
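To make the bounding-box-from-mask step concrete, here is a minimal sketch of
a dataset along the lines described above; the class name DrEvalStyleDataset
and its internals are illustrative assumptions, not YOLOLogic's actual code:

    import os
    import numpy as np
    import torch
    from PIL import Image
    from torch.utils.data import Dataset

    class DrEvalStyleDataset(Dataset):       # hypothetical name, not the YOLOLogic class
        def __init__(self, dataroot, class_names=('Dr_Eval', 'house', 'watertower')):
            self.samples = []                # list of (image_path, mask_path, label_idx)
            for idx, name in enumerate(class_names):
                img_dir  = os.path.join(dataroot, name)
                mask_dir = os.path.join(dataroot, 'mask_' + name)
                for fname in sorted(os.listdir(img_dir)):
                    self.samples.append((os.path.join(img_dir, fname),
                                         os.path.join(mask_dir, fname), idx))

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, i):
            img_path, mask_path, label = self.samples[i]
            image = np.asarray(Image.open(img_path).convert('RGB'),
                               dtype=np.float32) / 255.0
            mask  = np.asarray(Image.open(mask_path).convert('1'))
            # Bounding box from the mask: the extremal rows/cols of its nonzero pixels.
            rows, cols = np.nonzero(mask)
            bbox = torch.tensor([cols.min(), rows.min(), cols.max(), rows.max()],
                                dtype=torch.float32)     # (x_min, y_min, x_max, y_max)
            return torch.from_numpy(image).permute(2, 0, 1), bbox, label

Note that only the pathnames are stored at construction time; the image files
are read inside __getitem__(), which is what the DataLoader workers execute.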
- PurdueDrEvalMultiDataset = <class 'YOLOLogic.YOLOLogic.PurdueDrEvalMultiDataset'>
- This is the dataset to use if you are experimenting with multi-instance object
detection. As with the previous dataset, it contains three kinds of objects
in its images: Dr. Eval, and two "objects" in his neighborhood: a house and a
watertower. Each 128x128 image in the dataset contains up to 5 instances of
these objects. The instances are randomly scaled and colored, and the exact
number of instances in each image is also chosen randomly. Subsequently,
background clutter is added to the images --- these are again randomly chosen
shapes. The number of clutter objects is also chosen randomly but cannot
exceed 10. In addition to the structured clutter, I add 20% Gaussian noise
to each image. Examples of these images are shown in the Week 8 lecture
material in Purdue's Deep Learning class.
On account of the much richer structure of the image annotations, this
dataset is organized very differently from the previous one:
              dataroot
                 |
                 |
      ___________________________
      |                         |
      |                         |
 annotations.p               images
Since each image is allowed to contain instances of the three different types
of "meaningful" objects, it is not possible to organize the images on the
basis of what they contain.
As for the annotations, the annotation for each 128x128 image is a dictionary
that contains information related to all the object instances in the image. Here
is an example of the annotation for an image that has three instances in it:
annotation: {'filename': None,
             'num_objects': 3,
             'bboxes': {0: (67, 72, 83, 118),
                        1: (65, 2, 93, 26),
                        2: (16, 68, 53, 122),
                        3: None,
                        4: None},
             'bbox_labels': {0: 'Dr_Eval',
                             1: 'house',
                             2: 'watertower',
                             3: None,
                             4: None},
             'seg_masks': {0: <PIL.Image.Image image mode=1 size=128x128 at 0x7F5A06C838E0>,
                           1: <PIL.Image.Image image mode=1 size=128x128 at 0x7F5A06C837F0>,
                           2: <PIL.Image.Image image mode=1 size=128x128 at 0x7F5A06C838B0>,
                           3: None,
                           4: None}
            }
The annotations for the individual images are stored in a global Python
dictionary called 'all_annotations' whose keys are the pathnames to the
individual image files and whose values are the annotation dictionaries for
the corresponding images. The file 'annotations.p' shown in the directory
diagram above is what you get by calling 'pickle.dump()' on the
'all_annotations' dictionary.
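To make the annotation access concrete, here is a small sketch, assuming only
the layout shown above, of reading 'annotations.p' back into memory and
walking through the per-instance fields; the variable names are my own and
the dataroot path is a placeholder:

    import os
    import pickle

    dataroot = '/path/to/dataroot'                 # placeholder location
    with open(os.path.join(dataroot, 'annotations.p'), 'rb') as f:
        all_annotations = pickle.load(f)           # {image_pathname: annotation_dict}

    for image_path, annotation in all_annotations.items():
        for i in range(annotation['num_objects']):
            bbox  = annotation['bboxes'][i]        # corner coords as in the example above
            label = annotation['bbox_labels'][i]   # e.g. 'Dr_Eval'
            mask  = annotation['seg_masks'][i]     # PIL.Image in mode '1'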
- RPN = <class 'YOLOLogic.YOLOLogic.RPN'>
- This class is meant specifically for experimenting with graph-based algorithms for constructing
region proposals that may be used by a neural network for object detection and localization.
Classpath: YOLOLogic => RPN
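As a rough illustration of what a graph-based region-proposal computation can
look like (this is only a generic sketch of the idea, not YOLOLogic's RPN
code), one can treat small image patches as graph vertices, join 4-adjacent
patches whose mean colors are similar, and propose one bounding box per
connected component:

    import numpy as np

    def propose_regions(image, patch=8, thresh=0.1):
        """image: HxWx3 float array in [0,1]; returns (x_min, y_min, x_max, y_max) boxes."""
        H, W = image.shape[0] // patch, image.shape[1] // patch
        # Mean color of each patch: the per-vertex feature of the graph.
        means = image[:H*patch, :W*patch].reshape(H, patch, W, patch, 3).mean(axis=(1, 3))
        parent = list(range(H * W))                 # union-find over the patch grid
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]       # path halving
                i = parent[i]
            return i
        for r in range(H):
            for c in range(W):
                for dr, dc in ((0, 1), (1, 0)):     # right and down neighbors
                    r2, c2 = r + dr, c + dc
                    if r2 < H and c2 < W and \
                       np.abs(means[r, c] - means[r2, c2]).max() < thresh:
                        parent[find(r * W + c)] = find(r2 * W + c2)
        boxes = {}
        for r in range(H):
            for c in range(W):
                root = find(r * W + c)
                x0, y0, x1, y1 = boxes.get(root, (c, r, c, r))
                boxes[root] = (min(x0, c), min(y0, r), max(x1, c), max(y1, r))
        # Convert patch-grid coordinates back to pixel coordinates.
        return [(x0 * patch, y0 * patch, (x1 + 1) * patch, (y1 + 1) * patch)
                for x0, y0, x1, y1 in boxes.values()]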
- SingleInstanceDetector = <class 'YOLOLogic.YOLOLogic.SingleInstanceDetector'>
- This class demonstrates single-instance object detection on the images in the
PurdueDrEvalDataset dataset. Although these images are complex, in the sense
that each image contains multiple clutter objects in addition to random
noise, we nonetheless know that each image contains only a single meaningful
object instance. The LOADnet network used for detection is an adaptation of
the LOADnet2 network from DLStudio to the case of 128x128 sized input images.
The LOADnet network uses the SkipBlock as a building-block element for
dealing with the problem of vanishing gradients.
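For readers unfamiliar with skip connections, here is a minimal sketch of a
residual skip block of the kind referred to above; the layer choices are
illustrative assumptions, not the actual SkipBlock definition from DLStudio:

    import torch
    import torch.nn as nn

    class SkipBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1   = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn2   = nn.BatchNorm2d(channels)

        def forward(self, x):
            identity = x                        # saved for the skip connection
            out = torch.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            # Adding the input back gives the gradients a shortcut around the
            # convolutional layers, mitigating the vanishing-gradient problem.
            return torch.relu(out + identity)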
- YoloObjectDetector = <class 'YOLOLogic.YOLOLogic.YoloObjectDetector'>
- The primary purpose of this class is to demonstrate multi-instance object detection with YOLO
logic. A key parameter of the logic for YOLO-based detection is the variable 'yolo_interval'.
The image gridding that is required is based on the value assigned to this variable. The grid is
represented by an SxS array of cells where S is the integer part of the image width divided by
yolo_interval. So for images of size 128x128 and 'yolo_interval=20', you will get a 6x6 grid of
cells over the image. Since my goal is merely to illustrate the principles of the YOLO logic,
I have not bothered with the bottom 8 rows and the right-most 8 columns of the image that are
left out of the area covered by such a grid.
An important element of the YOLO logic is defining a set of Anchor Boxes for each cell in the SxS
grid. The anchor boxes are characterized by their aspect ratios. By aspect ratio I mean the
'height/width' characterization of the boxes. My implementation provides for 5 anchor boxes for
each cell with the following aspect ratios: 1/5, 1/3, 1/1, 3/1, 5/1.
At training time, each instance in the image is assigned to that cell whose central pixel is
closest to the center of the bounding box for the instance. After the cell assignment, the
instance is assigned to that anchor box whose aspect ratio comes closest to matching the aspect
ratio of the instance.
The assignment of an object instance to a <cell, anchor_box> pair is encoded in the form of a
'5+C' element long YOLO vector, where C is the number of classes for the object instances.
In our case, C is 3 for the three classes 'Dr_Eval', 'house' and 'watertower'; therefore we
end up with an 8-element vector encoding when we assign an instance to a <cell, anchor_box>
pair. The last C elements of the encoding vector can be thought of as a one-hot representation
of the class label for the instance.
The first five elements of the vector encoding for each anchor box in a cell are set as follows:
The first element is set to 1 if an object instance was actually assigned to that anchor box.
The next two elements are the (x,y) displacements of the center of the actual bounding box
for the object instance vis-a-vis the center of the cell. These two displacements are expressed
as a fraction of the width and the height of the cell. The next two elements of the YOLO vector
are the actual height and the actual width of the true bounding box for the instance in question
as a multiple of the cell dimension.
The 8-element YOLO vectors are packed into a YOLO tensor of shape (num_cells, num_anch_boxes, 8),
where num_cells is 36 for a 6x6 gridding of an image and num_anch_boxes is 5.
Classpath: YOLOLogic => YoloObjectDetector
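The following sketch pulls the above conventions together by constructing the
YOLO target tensor for one image; the function and variable names are mine,
and the cell/anchor assignment is simplified relative to the actual
YoloObjectDetector code:

    import torch

    yolo_interval  = 20
    num_cells_axis = 128 // yolo_interval         # 6 cells per axis, 6x6 grid
    aspect_ratios  = [1/5, 1/3, 1/1, 3/1, 5/1]    # height/width of the 5 anchor boxes
    num_classes    = 3                            # Dr_Eval, house, watertower

    yolo_tensor = torch.zeros(num_cells_axis**2, len(aspect_ratios), 5 + num_classes)

    def assign_instance(bbox, class_idx):
        """bbox = (x_min, y_min, x_max, y_max) in pixel coordinates."""
        x_min, y_min, x_max, y_max = bbox
        w, h = x_max - x_min, y_max - y_min
        cx, cy = x_min + w / 2.0, y_min + h / 2.0
        # Cell whose center pixel is closest to the center of the bounding box:
        cell_i = min(int(cy // yolo_interval), num_cells_axis - 1)
        cell_j = min(int(cx // yolo_interval), num_cells_axis - 1)
        cell_idx = cell_i * num_cells_axis + cell_j
        # Anchor box whose aspect ratio best matches the instance:
        anchor_idx = min(range(len(aspect_ratios)),
                         key=lambda k: abs(aspect_ratios[k] - h / w))
        vec = torch.zeros(5 + num_classes)
        vec[0] = 1.0                                      # an instance was assigned here
        cell_cx = cell_j * yolo_interval + yolo_interval / 2.0
        cell_cy = cell_i * yolo_interval + yolo_interval / 2.0
        vec[1] = (cx - cell_cx) / yolo_interval           # x displacement, in cell units
        vec[2] = (cy - cell_cy) / yolo_interval           # y displacement, in cell units
        vec[3] = h / yolo_interval                        # height as multiple of cell size
        vec[4] = w / yolo_interval                        # width as multiple of cell size
        vec[5 + class_idx] = 1.0                          # one-hot class label
        yolo_tensor[cell_idx, anchor_idx] = vec

    # The Dr_Eval instance from the annotation example lands in cell 27
    # (row 4, column 3) with the 3/1 aspect-ratio anchor box:
    assign_instance((67, 72, 83, 118), 0)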
- canvas = None
- drawEnable = 0
- region_mark_coords = {}
- startX = 0
- startY = 0