Methods defined here:
- __init__(self, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- build_convo_layers(self, configs_for_all_convo_layers)
- Displays the first batch_size number of images in your dataset.
- display_tensor_as_image(self, tensor, title='')
- This method converts the argument tensor into a photo image that you can display
in your terminal screen. It can convert tensors of three different shapes
into images: (3,H,W), (1,H,W), and (H,W), where H, for height, stands for the
number of pixels in the vertical direction and W, for width, for the same
along the horizontal direction. When the first element of the shape is 3,
that means that the tensor represents a color image in which each pixel in
the (H,W) plane has three values for the three color channels. On the other
hand, when the first element is 1, that stands for a tensor that will be
shown as a grayscale image. And when the shape is just (H,W), that is
automatically taken to be for a grayscale image.
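The shape rules above can be sketched as a small helper. This is only an illustration and not the DLStudio implementation: the function name describe_image_shape is hypothetical, and it works on plain shape tuples so that it runs without PyTorch.

```python
def describe_image_shape(shape):
    """Classify a tensor shape as a color or a grayscale image.

    Hypothetical helper for illustration. Accepts the three shapes named in
    the text, (3,H,W), (1,H,W), and (H,W), and returns (mode, height, width).
    """
    shape = tuple(shape)
    if len(shape) == 3 and shape[0] == 3:
        return ("color", shape[1], shape[2])
    if len(shape) == 3 and shape[0] == 1:
        return ("grayscale", shape[1], shape[2])
    if len(shape) == 2:
        return ("grayscale", shape[0], shape[1])
    raise ValueError(f"unsupported image tensor shape: {shape}")


mode, height, width = describe_image_shape((3, 32, 32))
```

With a real tensor you would pass something like tuple(tensor.shape) to such a helper.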
- imshow(self, img)
- called by display_tensor_as_image() for displaying the image
- In the code shown below, the call to "ToTensor()" converts the usual integer range 0-255 for pixel
values to floats in the range 0.0 to 1.0, and then the call to "Normalize()" changes the range to
-1.0 to 1.0. For additional explanation of the call to "tvt.ToTensor()", see Slide 31 of my Week 2
slides at the DL course website. And see Slides 32 and 33 for the syntax
"tvt.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))". In this call, the three numbers in the
first tuple are the means for the three color channels and the three numbers in the second
tuple are the standard deviations, which are applied according to the formula:
image_channel_val = (image_channel_val - mean) / std
The end result is that the values in the image tensor will be normalized to fall between -1.0
and +1.0. If needed we can do inverse normalization by
image_channel_val = (image_channel_val * std) + mean
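The two formulas above can be checked with plain Python arithmetic. The sketch below is a minimal illustration for a single channel value; the function names normalize and denormalize are hypothetical, and the division by 255 only imitates the range change described for "ToTensor()".

```python
def normalize(value, mean=0.5, std=0.5):
    """Map a channel value from the 0.0-1.0 range to the -1.0-1.0 range."""
    return (value - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse of normalize(): recover the 0.0-1.0 value."""
    return value * std + mean


pixel = 255                          # original integer pixel value
scaled = pixel / 255.0               # range change 0-255 -> 0.0-1.0
normalized = normalize(scaled)       # brightest pixel maps to +1.0
restored = denormalize(normalized)   # back to the 0.0-1.0 range
```

With mean = std = 0.5, the darkest pixel (0) lands on -1.0 and the brightest (255) on +1.0.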
- In general, we want to do data augmentation for training:
- Each collection of 'n' otherwise identical layers in a convolutional network is
specified by a string that looks like:
n = num of this type of convo layer
a = number of out_channels [in_channels determined by prev layer]
b,c = kernel for this layer is of size (b,c) [b along height, c along width]
d = stride for convolutions
k = maxpooling over kxk patches with stride of k
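As a rough sketch of how such a specification could be expanded into individual layers, the code below uses a small record whose fields mirror n, a, b, c, d, and k. The names ConvoSpec and expand_specs are hypothetical, the string form itself is not parsed here, and placing one max-pooling step after each group of n layers is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConvoSpec:
    n: int  # number of identical convo layers of this type
    a: int  # out_channels
    b: int  # kernel height
    c: int  # kernel width
    d: int  # stride for convolutions
    k: int  # maxpooling over kxk patches with stride k


def expand_specs(specs, in_channels):
    """Expand the specs into a flat list of layer descriptions.

    The in_channels of each convo layer is taken from the layer before it.
    Assumption: one max-pooling step follows each group of n layers.
    """
    layers = []
    for spec in specs:
        for _ in range(spec.n):
            layers.append(("conv", in_channels, spec.a, (spec.b, spec.c), spec.d))
            in_channels = spec.a
        layers.append(("maxpool", spec.k, spec.k))
    return layers


layers = expand_specs(
    [ConvoSpec(1, 64, 5, 5, 1, 2), ConvoSpec(2, 128, 3, 3, 1, 2)],
    in_channels=3,
)
```

Each "conv" tuple reads (kind, in_channels, out_channels, kernel, stride), which makes it easy to see how the channel counts chain from one layer to the next.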
- run_code_for_testing(self, net, display_images=False)
- run_code_for_training(self, net, display_images=False)
- save_model(self, model)
- Save the trained model to a disk file
Data descriptors defined here:
- dictionary for instance variables (if defined)
- list of weak references to the object (if defined)
Data and other attributes defined here:
- CustomDataLoading = <class 'DLStudio.DLStudio.CustomDataLoading'>
- This is a testbed for experimenting with a completely ground-up attempt at
designing a custom data loader. Ordinarily, if the basic format of how the
dataset is stored is similar to one of the datasets that the Torchvision
module knows about, you can go ahead and use that for your own dataset. At
worst, you may need to carry out some light customizations depending on the
number of classes involved, etc.
However, if the underlying dataset is stored in a manner that does not look
like anything in Torchvision, you have no choice but to supply all of the
data loading infrastructure yourself. That is what this inner class of the
DLStudio module is all about.
The custom data loading exercise here is related to a dataset called
PurdueShapes5 that contains 32x32 images of binary shapes belonging to the
following five classes:
The dataset was generated by randomizing the sizes and the orientations
of these five patterns. Since the patterns are rotated with a very simple
non-interpolating transform, just the act of random rotations can introduce
boundary and even interior noise in the patterns.
Each 32x32 image is stored in the dataset as the following list:
[R, G, B, Bbox, Label]
R : a 1024-element list of the values for the red component
of the color at all the pixels
G : the same as above but for the green component of the color
B : the same as above but for the blue component of the color
Bbox : a list like [x1,y1,x2,y2] that defines the bounding box
for the object in the image
Label : the shape of the object
I serialize the dataset with Python's pickle module and then compress it with
the gzip module.
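The record layout and the pickle-then-gzip storage described above can be sketched with the standard library alone. The pixel values, bounding box, label, and the dictionary used as the dataset container are made up for illustration, and the row-major pixel order is an assumption.

```python
import gzip
import pickle

# One hypothetical 32x32 image record: [R, G, B, Bbox, Label].
num_pixels = 32 * 32
record = [
    [255] * num_pixels,  # R: 1024 red values
    [0] * num_pixels,    # G: 1024 green values
    [0] * num_pixels,    # B: 1024 blue values
    [4, 6, 20, 25],      # Bbox: [x1, y1, x2, y2]
    "rectangle",         # Label (the stored label type is an assumption)
]
dataset = {0: record}    # container layout is an assumption

# Serialize with pickle, then compress with gzip.
blob = gzip.compress(pickle.dumps(dataset))

# Reverse the two steps to load the dataset again.
restored = pickle.loads(gzip.decompress(blob))

# Recover the 32x32 red plane from the flat 1024-element list,
# assuming the pixels are stored row by row.
R = restored[0][0]
red_plane = [R[row * 32:(row + 1) * 32] for row in range(32)]
```

To work with a file on disk instead of an in-memory blob, gzip.open() can be combined with pickle.dump() and pickle.load() in the same order.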
You will find the following dataset directories in the "data" subdirectory
of Examples in the DLStudio distro:
The number that follows the main name string "PurdueShapes5-" is for the
number of images in the dataset.
You will find the last two datasets, with 20 images each, useful for debugging
your logic for object detection and bounding-box regression.
Class Path: DLStudio -> CustomDataLoading
- DetectAndLocalize = <class 'DLStudio.DLStudio.DetectAndLocalize'>
- The purpose of this inner class is to focus on object detection in images --- as
opposed to image classification. Most people would say that object detection
is a more challenging problem than image classification because, in general,
the former also requires localization. The simplest interpretation of what
is meant by localization is that the code that carries out object detection
must also output a bounding-box rectangle for the object that was detected.
You will find in this inner class some examples of LOADnet classes meant
for solving the object detection and localization problem. The acronym
"LOAD" in "LOADnet" stands for
"LOcalization And Detection"
The different network examples included here are LOADnet1, LOADnet2, and
LOADnet3. For now, only pay attention to LOADnet2 since that's the class I
have worked with the most for the 1.0.7 distribution.
Class Path: DLStudio -> DetectAndLocalize
- ExperimentsWithCIFAR = <class 'DLStudio.DLStudio.ExperimentsWithCIFAR'>
- Class Path: DLStudio -> ExperimentsWithCIFAR
- ExperimentsWithSequential = <class 'DLStudio.DLStudio.ExperimentsWithSequential'>
- Demonstrates how to use the torch.nn.Sequential container class
Class Path: DLStudio -> ExperimentsWithSequential
- Net = <class 'DLStudio.DLStudio.Net'>
- SemanticSegmentation = <class 'DLStudio.DLStudio.SemanticSegmentation'>
- The purpose of this inner class is to be able to use the DLStudio module for
experiments with semantic segmentation. At its simplest level, the
purpose of semantic segmentation is to assign correct labels to the
different objects in a scene, while localizing them at the same time. At
a more sophisticated level, a system that carries out semantic
segmentation should also output a symbolic expression based on the objects
found in the image and their spatial relationships with one another.
The workhorse of this inner class is the mUnet network that is based
on the UNET network that was first proposed by Ronneberger, Fischer and
Brox in the paper "U-Net: Convolutional Networks for Biomedical Image
Segmentation". Their Unet extracts binary masks for the cell pixel blobs
of interest in biomedical images. The output of their Unet can
therefore be treated as a pixel-wise binary classifier at each pixel
position. The mUnet class, on the other hand, is intended for
segmenting out multiple objects simultaneously from an image. [A weaker
reason for "Multi" in the name of the class is that it uses skip
connections not only across the two arms of the "U", but also along
the arms. The skip connections in the original Unet are only between the
two arms of the U.] In mUnet, each object type is assigned a separate
channel in the output of the network.
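To make the one-channel-per-object-type idea concrete, the sketch below splits a small integer label map into one binary mask per object type. It is a toy illustration and not code from mUnet or its dataset: the function name label_map_to_channels is hypothetical, and the convention that label 0 means background is an assumption.

```python
def label_map_to_channels(label_map, num_object_types):
    """Split an integer label map into one binary mask per object type.

    Assumption: label 0 is background and labels 1..num_object_types
    index the object types.
    """
    return [
        [[1 if pixel == obj else 0 for pixel in row] for row in label_map]
        for obj in range(1, num_object_types + 1)
    ]


label_map = [
    [0, 1, 1],
    [0, 2, 0],
]
channels = label_map_to_channels(label_map, num_object_types=2)
```

Here channels[0] marks the pixels of object type 1 and channels[1] those of object type 2, so each channel can be read as its own binary mask.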
This version of DLStudio also comes with a new dataset,
PurdueShapes5MultiObject, for experimenting with mUnet. Each image in
this dataset contains a random number of selections from five different
shapes, with the shapes being randomly scaled, oriented, and located in
each image. The five different shapes are: rectangle, triangle, disk,
oval, and star.
Class Path: DLStudio -> SemanticSegmentation
- SkipConnections = <class 'DLStudio.DLStudio.SkipConnections'>
- This educational class is meant for illustrating the concepts related to the
use of skip connections in neural networks. It is now well known that deep
networks are difficult to train because of the vanishing gradients problem.
What that means is that as the depth of a network increases, the loss gradients
calculated for the early layers become more and more muted, which suppresses
the learning of the parameters in those layers. An important mitigation
strategy for addressing this problem consists of creating a CNN using blocks
with skip connections.
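A toy scalar model gives a feel for why skip connections help. Assume every layer is just a multiplication by the same small slope; real networks are far more complicated, so treat this only as an illustration. The function name gradient_through_chain is hypothetical.

```python
def gradient_through_chain(local_slope, depth, use_skip):
    """Gradient of the output with respect to the input of a scalar chain.

    Plain layer: y = local_slope * x      ->  dy/dx = local_slope
    Skip layer:  y = x + local_slope * x  ->  dy/dx = 1 + local_slope
    By the chain rule, the per-layer derivatives multiply across the depth.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= (1.0 + local_slope) if use_skip else local_slope
    return grad


plain = gradient_through_chain(0.1, depth=20, use_skip=False)
with_skip = gradient_through_chain(0.1, depth=20, use_skip=True)
```

In the plain chain the gradient reaching the first layer shrinks geometrically with depth, while the identity path in the skip version keeps each factor at or above 1.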
With the code shown in this inner class of the module, you can now experiment
with skip connections in a CNN to see how a deep network with this feature
might improve the classification results. As you will see in the code shown
below, the network that allows you to construct a CNN with skip connections
is named BMEnet. As shown in the script playing_with_skip_connections.py in
the Examples directory of the distribution, you can easily create a CNN with
arbitrary depth just by using the "depth" constructor option for the BMEnet
class. The basic block of the network constructed by BMEnet is called
SkipBlock which, very much like the BasicBlock in ResNet-18, has a couple of
convolutional layers whose output is combined with the input to the block.
Note that the value given to the "depth" constructor option for the
BMEnet class does NOT translate directly into the actual depth of the
CNN. [Again, see the script playing_with_skip_connections.py in the Examples
directory for how to use this option.] The value of "depth" is translated
into how many instances of SkipBlock to use for constructing the CNN.
Class Path: DLStudio -> SkipConnections
- TextClassification = <class 'DLStudio.DLStudio.TextClassification'>
- The purpose of this inner class is to be able to use the DLStudio module for simple
experiments in text classification. Consider, for example, the problem of automatic
classification of variable-length user feedback: you want to create a neural network
that can label an uploaded product review of arbitrary length as positive or negative.
One way to solve this problem is with a recurrent neural network in which you use a
hidden state for characterizing a variable-length product review with a fixed-length
state vector. This inner class allows you to carry out such experiments.
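The idea of folding a variable-length input into a fixed-length state can be shown with a toy recurrence. The sketch below is not the network used in this inner class: the weights are hand-picked rather than learned, each number merely stands in for a word representation, and the function name encode_sequence is hypothetical.

```python
import math


def encode_sequence(inputs, hidden_size=4):
    """Fold a variable-length list of numbers into a fixed-length state.

    Toy recurrence with hand-picked (untrained) weights: at every step the
    new state depends on the previous state and on the current input.
    """
    hidden = [0.0] * hidden_size
    for x in inputs:
        hidden = [
            math.tanh(0.5 * hidden[i] + 0.1 * (i + 1) * x)
            for i in range(hidden_size)
        ]
    return hidden


short_review = [0.2, -0.4, 0.9]
long_review = [0.2, -0.4, 0.9, 0.1, 0.3, -0.7, 0.5, 0.8]
short_state = encode_sequence(short_review)
long_state = encode_sequence(long_review)
```

Both states have the same length even though the two inputs do not, which is what lets a fixed-size classifier sit on top of the final hidden state.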
Class Path: DLStudio -> TextClassification
- TextClassificationWithEmbeddings = <class 'DLStudio.DLStudio.TextClassificationWithEmbeddings'>
- The text processing class described previously, TextClassification, was based on
using one-hot vectors for representing the words. The main challenge we faced
with one-hot vectors was that the larger the size of the training dataset, the
larger the size of the vocabulary, and, therefore, the larger the size of the
one-hot vectors. The increase in the size of the one-hot vectors led to a
model with a significantly larger number of learnable parameters, and that,
in turn, created a need for a still larger training dataset. Sounds like a classic
example of a vicious circle. In this section, I use the idea of word embeddings
to break out of this vicious circle.
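The growth of one-hot vectors with vocabulary size is easy to see in a few lines of Python. This is a minimal sketch with made-up sentences; the helper names build_vocab and one_hot are hypothetical and are not part of DLStudio.

```python
def build_vocab(sentences):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Return a vector as long as the vocabulary with a single 1 in it."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


small = build_vocab(["good product"])
large = build_vocab(["good product", "bad product arrived late"])

small_vec = one_hot("good", small)  # one entry per word in the small vocabulary
large_vec = one_hot("good", large)  # same word, longer vector
```

The same word needs a longer vector as soon as the training text introduces new words, whereas a word embedding keeps a fixed size regardless of the vocabulary.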
Word embeddings are fixed-sized numerical representations for words that are
learned on the basis of the similarity of word contexts. The original and still
the most famous of these representations are known as the word2vec
embeddings. The embeddings that I use in this section consist of pre-trained
300-element word vectors for 3 million words and phrases as learned from Google
News reports. I access these embeddings through the popular Gensim library.
Class Path: DLStudio -> TextClassificationWithEmbeddings