builtins.object
    - DLStudio

class DLStudio(builtins.object)

    DLStudio(*args, **kwargs)

Methods defined here:
- __init__(self, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- build_convo_layers(self, configs_for_all_convo_layers)
- build_fc_layers(self)
- check_a_sampling_of_images(self)
- Displays the first batch_size number of images in your dataset.
- display_tensor_as_image(self, tensor, title='')
- This method converts the argument tensor into a photo image that you can display
in your terminal screen. It can convert tensors of three different shapes
into images: (3,H,W), (1,H,W), and (H,W), where H, for height, stands for the
number of pixels in the vertical direction and W, for width, for the same
along the horizontal direction. When the first element of the shape is 3,
that means that the tensor represents a color image in which each pixel in
the (H,W) plane has three values for the three color channels. On the other
hand, when the first element is 1, that stands for a tensor that will be
shown as a grayscale image. And when the shape is just (H,W), that is
automatically taken to be for a grayscale image.
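A minimal usage sketch for this method (the constructor options and tensor shapes shown here are illustrative, not taken from the module):

    import torch
    from DLStudio import DLStudio

    dls = DLStudio(dataroot="./data/", batch_size=4)        ## illustrative constructor options
    color_img = torch.rand(3, 32, 32)                       ## shape (3,H,W)  -->  displayed as a color image
    gray_img  = torch.rand(32, 32)                          ## shape (H,W)    -->  displayed as grayscale
    dls.display_tensor_as_image(color_img, title="random color image")
    dls.display_tensor_as_image(gray_img,  title="random grayscale image")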
- imshow(self, img)
- Called by display_tensor_as_image() to display the image.
- load_cifar_10_dataset(self)
- In the code shown below, the call to "ToTensor()" converts the usual int range 0-255 for pixel
values to 0-1.0 float vals and then the call to "Normalize()" changes the range to -1.0-1.0 float
vals. For additional explanation of the call to "tvt.ToTensor()", see Slide 31 of my Week 2
slides at the DL course website. And see Slides 32 and 33 for the syntax
"tvt.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))". In this call, the three numbers in the
first tuple change the means in the three color channels and the three numbers in the second
tuple change the standard deviations according to the formula:
image_channel_val = (image_channel_val - mean) / std
The end result is that the values in the image tensor will be normalized to fall between -1.0
and +1.0. If needed we can do inverse normalization by
image_channel_val = (image_channel_val * std) + mean
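The transform pipeline just described can be sketched as follows (a hedged illustration using the standard torchvision calls mentioned above; the actual variable names inside the method may differ):

    import torchvision.transforms as tvt

    transform = tvt.Compose([
        tvt.ToTensor(),                                      ## int pixel range 0-255  -->  float 0.0-1.0
        tvt.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),     ## (val - mean) / std     -->  -1.0 to +1.0
    ])
    ## inverse normalization, if needed:  val = (val * std) + mean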
- load_cifar_10_dataset_with_augmentation(self)
- In general, we want to do data augmentation for training; a typical pipeline is sketched below.
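As a hedged illustration, a common torchvision training transform with augmentation looks like this (the specific augmentations used by this method may differ):

    import torchvision.transforms as tvt

    train_transform = tvt.Compose([
        tvt.RandomHorizontalFlip(p=0.5),                     ## augmentation: random left-right flips
        tvt.RandomCrop(32, padding=4),                       ## augmentation: random crops after padding
        tvt.ToTensor(),
        tvt.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    ## the test transform would normally omit the random augmentations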
- parse_config_string_for_convo_layers(self)
- Each collection of 'n' otherwise identical layers in a convolutional network is
specified by a string that looks like:
"nx[a,b,c,d]-MaxPool(k)"
where
n = num of this type of convo layer
a = number of out_channels [in_channels determined by prev layer]
b,c = kernel for this layer is of size (b,c) [b along height, c along width]
d = stride for convolutions
k = maxpooling over kxk patches with stride of k
Example:
"n1x[a1,b1,c1,d1]-MaxPool(k1) n2x[a2,b2,c2,d2]-MaxPool(k2)"
- run_code_for_testing(self, net, display_images=False)
- run_code_for_training(self, net, display_images=False)
- save_model(self, model)
- Save the trained model to a disk file
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- Autoencoder = <class 'DLStudio.DLStudio.Autoencoder'>
- The main reason for the existence of this inner class in DLStudio is for it to serve as the base class for VAE
(Variational Auto-Encoder). That way, the VAE class can focus exclusively on the random-sampling logic
specific to variational encoding while the base class Autoencoder does the convolutional and
transpose-convolutional heavy lifting associated with the usual encoding-decoding of image data.
Class Path: DLStudio -> Autoencoder
- BMEnet = <class 'DLStudio.DLStudio.BMEnet'>
- This educational class is meant for illustrating the concepts related to the
use of skip connections in neural networks. It is now well known that deep
networks are difficult to train because of the vanishing gradients problem.
What that means is that as the depth of a network increases, the loss gradients
calculated for the early layers become more and more muted, which suppresses
the learning of the parameters in those layers. An important mitigation
strategy for addressing this problem consists of creating a CNN using blocks
with skip connections.
With the code shown in this inner class of the module, you can now experiment with
skip connections in a CNN to see how a deep network with this feature might improve
the classification results. As you will see in the code shown below, the network
that allows you to construct a CNN with skip connections is named BMEnet. As shown
in the script playing_with_skip_connections.py in the Examples directory of the
distribution, you can easily create a CNN with arbitrary depth just by using the
"depth" constructor option for the BMEnet class. The basic block of the network
constructed by BMEnet is called SkipBlock which, very much like the BasicBlock in
ResNet-18, has a couple of convolutional layers whose output is combined with the
input to the block.
Note that the value given to the "depth" constructor option for the BMEnet class
does NOT translate directly into the actual depth of the CNN. [Again, see the script
playing_with_skip_connections.py in the Examples directory for how to use this
option.] The value of "depth" determines how many SkipBlock instances with the same
number of input and output channels and the same input and output sizes are placed
between successive SkipBlock instances that downsample the feature maps and double
the number of channels.
Class Path: DLStudio -> BMEnet
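A minimal sketch of the skip-connection pattern described above (this illustrates the general idea only; it is not the actual SkipBlock code in DLStudio):

    import torch
    import torch.nn as nn

    class SkipBlockSketch(nn.Module):                        ## hypothetical name, for illustration
        def __init__(self, ch):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
            self.bn1 = nn.BatchNorm2d(ch)
            self.bn2 = nn.BatchNorm2d(ch)

        def forward(self, x):
            identity = x                                     ## save the input to the block
            out = torch.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return torch.relu(out + identity)                ## combine the output with the input (skip connection)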
- CustomDataLoading = <class 'DLStudio.DLStudio.CustomDataLoading'>
- This is a testbed for experimenting with a completely ground-up attempt at
designing a custom data loader. Ordinarily, if the basic format of how the dataset
is stored is similar to one of the datasets that the Torchvision module knows about,
you can go ahead and use that for your own dataset. At worst, you may need to carry
out some light customizations depending on the number of classes involved, etc.
However, if the underlying dataset is stored in a manner that does not look like
anything in Torchvision, you have no choice but to supply yourself all of the data
loading infrastructure. That is what this inner class of the main DLStudio class
is all about.
The custom data loading exercise here is related to a dataset called PurdueShapes5
that contains 32x32 images of binary shapes belonging to the following five classes:
1. rectangle
2. triangle
3. disk
4. oval
5. star
The dataset was generated by randomizing the sizes and the orientations of these
five patterns. Since the patterns are rotated with a very simple non-interpolating
transform, just the act of random rotations can introduce boundary and even interior
noise in the patterns.
Each 32x32 image is stored in the dataset as the following list:
[R, G, B, Bbox, Label]
where
R : is a 1024 element list of the values for the red component
of the color at all the pixels
G : the same as above but for the green component of the color
B : the same as above but for the blue component of the color
Bbox : a list like [x1,y1,x2,y2] that defines the bounding box
for the object in the image
Label : the shape of the object
I serialize the dataset with Python's pickle module and then compress it with the
gzip module.
You will find the following dataset directories in the "data" subdirectory of
Examples in the DLStudio distro:
PurdueShapes5-10000-train.gz
PurdueShapes5-1000-test.gz
PurdueShapes5-20-train.gz
PurdueShapes5-20-test.gz
The number that follows the main name string "PurdueShapes5-" is for the number of
images in the dataset.
You will find the last two datasets, with 20 images each, useful for debugging your
logic for object detection and bounding-box regression.
Class Path: DLStudio -> CustomDataLoading
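A hedged sketch of how one of the gzipped-and-pickled dataset files listed above could be read into a custom torch.utils.data.Dataset (the exact internal layout of the archive may differ from what is assumed here):

    import gzip, pickle
    import torch
    from torch.utils.data import Dataset

    class PurdueShapes5Sketch(Dataset):                      ## hypothetical class name, for illustration
        def __init__(self, archive_path):
            with gzip.open(archive_path, 'rb') as f:
                self.dataset = pickle.load(f)                ## assumed to map an index to [R, G, B, Bbox, Label]

        def __len__(self):
            return len(self.dataset)

        def __getitem__(self, idx):
            R, G, B, bbox, label = self.dataset[idx]
            image = torch.tensor([R, G, B], dtype=torch.float).view(3, 32, 32)
            return image, torch.tensor(bbox, dtype=torch.float), label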
- DetectAndLocalize = <class 'DLStudio.DLStudio.DetectAndLocalize'>
- The purpose of this inner class is to focus on object detection in images --- as
opposed to image classification. Most people would say that object detection is a
more challenging problem than image classification because, in general, the former
also requires localization. The simplest interpretation of what is meant by
localization is that the code that carries out object detection must also output a
bounding-box rectangle for the object that was detected.
You will find in this inner class some examples of LOADnet classes meant for solving
the object detection and localization problem. The acronym "LOAD" in "LOADnet"
stands for
"LOcalization And Detection"
The different network examples included here are LOADnet1, LOADnet2, and LOADnet3.
For now, only pay attention to LOADnet2 since that's the class I have worked with
the most for the 1.0.7 distribution.
Class Path: DLStudio -> DetectAndLocalize
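Since localization requires the network to output both a class label and a bounding box, training typically combines two losses, one per output head. A hedged sketch of that idea (the head names and the unweighted sum are illustrative, not the exact LOADnet training code):

    import torch.nn as nn

    criterion_label = nn.CrossEntropyLoss()                  ## for the class-label output
    criterion_bbox  = nn.MSELoss()                           ## for bounding-box regression
    ## inside the training loop, with outputs_label and outputs_bbox as the two network heads:
    ##     loss = criterion_label(outputs_label, labels) + criterion_bbox(outputs_bbox, bbox_targets)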
- ExperimentsWithCIFAR = <class 'DLStudio.DLStudio.ExperimentsWithCIFAR'>
- Class Path: DLStudio -> ExperimentsWithCIFAR
- ExperimentsWithSequential = <class 'DLStudio.DLStudio.ExperimentsWithSequential'>
- Demonstrates how to use the torch.nn.Sequential container class
Class Path: DLStudio -> ExperimentsWithSequential
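A minimal illustration of the torch.nn.Sequential container itself (not the specific network used in this inner class):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 16 * 16, 10),                         ## assumes 32x32 input images and 10 classes
    )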
- Net = <class 'DLStudio.DLStudio.Net'>
- SemanticSegmentation = <class 'DLStudio.DLStudio.SemanticSegmentation'>
- The purpose of this inner class is to be able to use the DLStudio platform for
experiments with semantic segmentation. At its simplest level, the purpose of
semantic segmentation is to assign correct labels to the different objects in a
scene, while localizing them at the same time. At a more sophisticated level, a
system that carries out semantic segmentation should also output a symbolic
expression based on the objects found in the image and their spatial relationships
with one another.
The workhorse of this inner class is the mUNet network that is based on the UNET
network that was first proposed by Ronneberger, Fischer and Brox in the paper
"U-Net: Convolutional Networks for Biomedical Image Segmentation". Their Unet
extracts binary masks for the cell pixel blobs of interest in biomedical images.
The output of their Unet can therefore be treated as a pixel-wise binary classifier
at each pixel position. The mUnet class, on the other hand, is intended for
segmenting out multiple objects simultaneously from an image. [A weaker reason for
"Multi" in the name of the class is that it uses skip connections not only across
the two arms of the "U", but also along the arms. The skip connections in the
original Unet are only between the two arms of the U.] In mUnet, each object type is
assigned a separate channel in the output of the network.
This version of DLStudio also comes with a new dataset, PurdueShapes5MultiObject,
for experimenting with mUnet. Each image in this dataset contains a random number
of selections from five different shapes, with the shapes being randomly scaled,
oriented, and located in each image. The five different shapes are: rectangle,
triangle, disk, oval, and star.
Class Path: DLStudio -> SemanticSegmentation
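Because each object type gets its own output channel, a per-object mask can be read off the network output channel by channel. A hedged sketch of that interpretation (the shapes, the threshold, and the channel-to-shape assignment are illustrative):

    import torch

    output = torch.rand(4, 5, 64, 64)                        ## illustrative shape: (batch, num_object_types, H, W)
    masks = (output > 0.5).float()                           ## one binary mask per object type per image
    disk_mask = masks[0, 2]                                  ## e.g., if channel 2 were assigned to "disk"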
- TextClassification = <class 'DLStudio.DLStudio.TextClassification'>
- The purpose of this inner class is to be able to use the DLStudio platform for simple
experiments in text classification. Consider, for example, the problem of automatic
classification of variable-length user feedback: you want to create a neural network
that can label an uploaded product review of arbitrary length as positive or negative.
One way to solve this problem is with a recurrent neural network in which you use a
hidden state for characterizing a variable-length product review with a fixed-length
state vector. This inner class allows you to carry out such experiments.
Class Path: DLStudio -> TextClassification
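A hedged sketch of summarizing a variable-length review with a fixed-length hidden state (this uses a standard torch.nn.GRU for illustration; the networks actually provided by this inner class may be structured differently):

    import torch
    import torch.nn as nn

    class ReviewClassifierSketch(nn.Module):                 ## hypothetical name, for illustration
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, 2)              ## two classes: positive / negative

        def forward(self, x):                                ## x: (batch, seq_len, input_size), seq_len may vary
            _, hidden = self.gru(x)                          ## hidden: fixed-length state summarizing the review
            return self.fc(hidden[-1])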
- TextClassificationWithEmbeddings = <class 'DLStudio.DLStudio.TextClassificationWithEmbeddings'>
- The text processing class described previously, TextClassification, was based on
using one-hot vectors for representing the words. The main challenge we faced
with one-hot vectors was that the larger the size of the training dataset, the
larger the size of the vocabulary, and, therefore, the larger the size of the
one-hot vectors. The increase in the size of the one-hot vectors led to a
model with a significantly larger number of learnable parameters --- and, that,
in turn, created a need for a still larger training dataset. Sounds like a classic
example of a vicious circle. In this section, I use the idea of word embeddings
to break out of this vicious circle.
Word embeddings are fixed-sized numerical representations for words that are
learned on the basis of the similarity of word contexts. The original and still
the most famous of these representations are known as the word2vec
embeddings. The embeddings that I use in this section consist of pre-trained
300-element word vectors for 3 million words and phrases as learned from Google
News reports. I access these embeddings through the popular Gensim library.
Class Path: DLStudio -> TextClassificationWithEmbeddings
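A hedged sketch of pulling in the pre-trained word2vec vectors mentioned above through Gensim's downloader (the identifier 'word2vec-google-news-300' is the standard Gensim name for the Google News vectors; the download is several gigabytes):

    import gensim.downloader as genapi

    word_vectors = genapi.load('word2vec-google-news-300')   ## 300-dim vectors for roughly 3 million words/phrases
    vec = word_vectors['computer']                           ## a 300-element numpy vector for the word "computer"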
- VAE = <class 'DLStudio.DLStudio.VAE'>
- VAE stands for "Variational Auto Encoder". These days, you are more likely to see it
written as "variational autoencoder". I consider VAE to be one of the foundational neural
architectures in Deep Learning. VAE is based on the now celebrated 2014 paper
"Auto-Encoding Variational Bayes" by Kingma and Welling. The idea is for the Encoder
part of an Encoder-Decoder pair to learn the probability distribution for the Latent
Space Representation of a training dataset. Described loosely, the latent vector z for
an input image x would be the "essence" of what x is depicting. Presumably, after the
latent distribution has been learned, the Decoder should be able to convert any "noise"
vector sampled from the latent distribution into the sort of output you
would see during the training process.
In case you are wondering about the dimensionality of the Latent Space, consider the case
that the input images are eventually converted into 8x8 pixel arrays, with each pixel
represented by a 128-dimensional embedding. In a vectorized representation, this implies
an 8192-dimensional space for the Latent Distribution. The mean (mu) and the log-variance
(logvar) values learned by the Encoder would represent vectors in that 8192-dimensional
space. The Decoder's job would be to sample this distribution and attempt a
reconstruction of what the user wants to see at the output of the Decoder.
As you can see, the VAE class is derived from the parent class Autoencoder. The bulk of
the computing in VAE is done through the functionality packed into the Autoencoder class.
Therefore, in order to fully understand the VAE implementation here, your starting point
should be the code for the Autoencoder class.
Class Path: DLStudio -> VAE
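The random-sampling logic alluded to above is usually implemented with the reparameterization trick so that gradients can flow through the sampling step. A hedged sketch (mu and logvar are the Encoder outputs described above; the function name is illustrative):

    import torch

    def reparameterize(mu, logvar):
        std = torch.exp(0.5 * logvar)                        ## log-variance  -->  standard deviation
        eps = torch.randn_like(std)                          ## noise drawn from N(0, I)
        return mu + eps * std                                ## a sample z from N(mu, std^2)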