
DLStudio

Version 2.2.7, 2023-April-20

A software platform for teaching the Deep Learning class at Purdue University

DLStudio.py
Version:  2.2.7
Author:  Avinash Kak (kak@purdue.edu)
Date:  2023-April-20


Download Version 2.2.7: gztar

Total number of downloads (all versions) from this website: 6249

            This count is automatically updated at every rotation of
          the weblogs (normally once every two to four days)
          Last updated: Wed Dec 17 06:03:01 EST 2025

View the main module code file in your browser
View the AdversarialLearning code file in your browser
View the Seq2SeqLearning code file in your browser
View the DataPrediction code file in your browser
View the Transformers code file in your browser

Download the image datasets for the main DLStudio module
Download the image datasets for adversarial learning
Download the datasets for text classification
Download the dataset for sequence-to-sequence learning
Download the dataset for data prediction
Download the datasets for transformer based learning

Switch to Version 2.2.8

CONTENTS:

CHANGE LOG
INTRODUCTION
    SKIP CONNECTIONS
    OBJECT DETECTION AND LOCALIZATION
    NOISY OBJECT DETECTION AND LOCALIZATION
    IoU REGRESSION FOR OBJECT DETECTION AND LOCALIZATION
    SEMANTIC SEGMENTATION
    TEXT CLASSIFICATION
    DATA MODELING WITH ADVERSARIAL LEARNING
    SEQUENCE-TO-SEQUENCE LEARNING WITH ATTENTION
    DATA PREDICTION
    TRANSFORMERS
INSTALLATION
USAGE
CONSTRUCTOR PARAMETERS
PUBLIC METHODS
THE MAIN INNER CLASSES OF THE MODULE
CO-CLASSES IN THE DLStudio MODULE
Examples DIRECTORY
ExamplesAdversarialLearning DIRECTORY
ExamplesSeq2SeqLearning DIRECTORY
ExamplesDataPrediction DIRECTORY
ExamplesTransformers DIRECTORY
THE DATASETS INCLUDED
    FOR THE MAIN DLStudio MODULE
    FOR Seq2Seq LEARNING
    FOR ADVERSARIAL LEARNING
    FOR DATA PREDICTION
    FOR TRANSFORMERS
BUGS
ACKNOWLEDGMENTS
ABOUT THE AUTHOR
COPYRIGHT

CHANGE LOG

  Version 2.2.7:

    This version provides you with the tools you need to cope with the
    frustrations of training a transformer based network. Such networks in general
    are difficult to train, in the sense that your per-epoch training time is
    likely to be much longer than what you are accustomed to, and it can take
    many, many more epochs to get the model to converge.  In addition, you have
    the problem of stability to deal with. Stability means that with a wrong
    choice for the hyperparameters, the model that you are training could suddenly
    start diverging (which is something akin to mode collapse in training a GAN).
    If you have to wait until the end of training to see such failures, that can be very
    frustrating.  To cope with these problems, this version of DLStudio
    automatically spits out a checkpoint for the model every 5 epochs and also
    gives you the functions for evaluating the performance of the checkpoints. The
    performance check can be as simple as looking at the translated sentences
    vis-a-vis their targets for a random selection of sentence pairs from the
    data.  When real learning is taking place, you will see longer and longer
    fragments of the translated sentences correspond to the target sentences. On
    the other hand, when you have model divergence, the translated sentences will
    appear to be gibberish.  A future version of DLStudio will also print out the
    BLEU score for the checkpoints.
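
    The checkpoint-and-inspect routine described above can be sketched as
    follows.  This is only an illustration and not the code in DLStudio: the
    function names, the 20-epoch run, and the use of a matching-prefix fraction
    as a crude stand-in for "longer and longer fragments" are all assumptions
    made for this example.

```python
import random

def is_checkpoint_epoch(epoch, every=5):
    """True when a checkpoint would be written after this 1-based epoch."""
    return epoch % every == 0

def matching_prefix_fraction(translated, target):
    """Fraction of the target words reproduced, in order, from the start."""
    out_words, tgt_words = translated.split(), target.split()
    matched = 0
    for out_w, tgt_w in zip(out_words, tgt_words):
        if out_w != tgt_w:
            break
        matched += 1
    return matched / max(len(tgt_words), 1)

def spot_check(pairs, translate, how_many=3, seed=0):
    """Score a random selection of (source, target) sentence pairs.

    `translate` is any callable that maps a source sentence to a string;
    here it stands in for a model restored from a checkpoint (hypothetical).
    """
    rng = random.Random(seed)
    report = []
    for source, target in rng.sample(pairs, min(how_many, len(pairs))):
        output = translate(source)
        report.append((source, output, matching_prefix_fraction(output, target)))
    return report

# Epochs at which a 20-epoch run would write checkpoints
checkpoint_epochs = [e for e in range(1, 21) if is_checkpoint_epoch(e)]
```

    A score that stays near zero across successive checkpoints, or that drops
    back to zero after rising, is the kind of signal that would suggest the
    model is diverging.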

  Version 2.2.5:

    This version contains significantly improved documentation for DCGAN and WGAN
    in the AdversarialLearning class of DLStudio.

  Version 2.2.4:

    I have cleaned up the code in the new DIoULoss class that I added in the
    previous version. The script object_detection_and_localization_iou.py in the
    Examples directory of DLStudio is based on this loss function.

  Version 2.2.3:

    The inner class DetectAndLocalize of DLStudio now contains a custom loss
    function provided by the class DIoULoss that implements the more modern
    variants of the basic IoU (Intersection over Union) loss function.  These IoU
    variants are explained in the slides 37-42 of my Week 7 Lecture on "Object
    Detection and Localization."  Your best entry point to become familiar with
    these loss functions is the script object_detection_and_localization_iou.py in
    the Examples directory of DLStudio.

  Version 2.2.2:

    This version of DLStudio presents my implementation of transformers in deep
    learning. You will find two transformer implementations in the Transformers
    co-class of DLStudio in the distribution directory: TransformerFG and
    TransformerPreLN.  "FG" in TransformerFG stands for "Transformer First
    Generation"; it is my implementation of the architecture presented originally
    in the seminal paper "Attention is All You Need" by Vaswani et al.  And the
    second, TransformerPreLN ("PreLN" stands for "Pre Layer Norm") is a small but
    important modification of the original idea that is based on the paper "On
    Layer Normalization in the Transformer Architecture" by Xiong et al.  I could
    have easily combined the two implementations with a small number of
    conditional statements to account for the differences; however, I have chosen
    to keep them separate in order to make it easier for the two to evolve
    separately and to be used differently for educational purposes.
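
    The difference between the two orderings can be sketched in a few lines.
    What is shown is the general Post-LN versus Pre-LN arrangement from the two
    papers cited above, not the code in TransformerFG or TransformerPreLN; the
    sublayer() function is a hypothetical stand-in for attention or the
    feed-forward network, and the layer norm has no learnable parameters.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for self-attention or the feed-forward network."""
    return [2.0 * v + 1.0 for v in x]

def post_ln_block(x):
    # Original ordering: add the residual first, then normalize the sum
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_ln_block(x):
    # Pre-LN ordering: normalize the sublayer input, leave the residual path as is
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [1.0, 2.0, 3.0, 4.0]
post_out = post_ln_block(x)
pre_out = pre_ln_block(x)
```

    In the Pre-LN arrangement the input reaches the output through an
    unnormalized identity path, which is the small structural change referred
    to above.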

  Versions 2.1.7 through 2.2.1:

    These version numbers are for the stepping-stones in my journey into the world
    of transformers --- my experiments with how to best implement the different
    components of a transformer for educational purposes.  As things stand, these
    versions contain features that did not make it into the public release version
    2.2.2 on account of inadequate testing.  I may include those features in
    versions of DLStudio after 2.2.2.

  Version 2.1.6:

    All the changes are confined to the DataPrediction co-class of the DLStudio
    module.  After posting the previous version, I noticed that the quality of the
    code in DataPrediction was not up to par.  The new version presents a
    cleaned-up version of the DataPrediction class.

  Version 2.1.5:

    DLStudio has now been equipped with a new co-class named DataPrediction whose
    focus is solely on solving data prediction problems for time-series data.  A
    time-series consists of a sequence of observations recorded at regular
    intervals.  These could, for example, be the price of a stock share recorded
    every hour; the hourly recordings of electrical load at your local power
    utility company; the mean temperature recorded on an annual basis; and
    so on.  We want to use the past observations to predict the value of the next
    one.  While data prediction has much in common with other forms of sequence
    based learning, it presents certain unique challenges of its own and those are
    with respect to (1) Data Normalization; (2) Input Data Chunking; and (3)
    Multi-dimensional encoding of the "datetime" associated with each observation
    in the time-series.
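
    The three challenges listed above can be illustrated with a small sketch.
    None of this is taken from the DataPrediction class: min-max scaling, the
    sliding-window chunking, and the sine/cosine encoding of the hour and the
    day of the week are assumptions chosen as common ways of handling each
    step, and all names are hypothetical.

```python
import math
from datetime import datetime

def min_max_normalize(series):
    """Scale the observations into the range [0, 1]."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0          # guard against a constant series
    return [(v - lo) / span for v in series]

def make_chunks(series, window):
    """Pair each window of past observations with the value that follows it."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def encode_datetime(ts):
    """Cyclic (sin, cos) encoding of hour-of-day and day-of-week."""
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return [math.sin(hour_angle), math.cos(hour_angle),
            math.sin(dow_angle), math.cos(dow_angle)]

load = [10.0, 12.0, 14.0, 18.0, 20.0]      # made-up hourly observations
normalized = min_max_normalize(load)
chunks = make_chunks(normalized, window=3)
stamp = encode_datetime(datetime(2023, 4, 20, 6, 0))
```

    Each element of chunks is an (input window, next value) pair, which is the
    form in which a predictor would consume the series.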

  Version 2.1.3:

    Some users of DLStudio have reported that when they run the WGAN code for
    adversarial learning, the dataloader sometimes hangs in the middle of a
    training run.  (A WGAN training session may involve as many as 500 epochs.)
    In trying to reproduce this issue, I discovered that the training loops always
    ran to completion if you set the number of workers in the dataloader to 0.
    Version 2.1.3 makes it easier for you to specify the number of workers in your
    own scripts that call on the WGAN functionality in the AdversarialLearning
    class.

  Version 2.1.2:

    The adversarial learning part of DLStudio now includes a WGAN implementation
    that uses Gradient Penalty for the learning required by the Critic.  All the
    changes made are in the AdversarialLearning class at the top level of the
    module.

  Version 2.1.1:

    In order to make it easier to navigate through the large code base of the
    module, I am adopting the convention that "Network" in the name of a class be
    reserved for only those cases when a class actually implements a network.
    This convention requires that the name of an encapsulating class meant for
    teaching/learning a certain aspect of deep learning not contain "Network" in
    it.  Therefore, in Version 2.1.1, I have changed the names of the top-level
    classes AdversarialNetworks and Seq2SeqNetworks to AdversarialLearning and
    Seq2SeqLearning, respectively.

  Version 2.1.0:

    I have reorganized the code base a bit to make it easier for DLStudio to grow
    in the future.  This I did by moving the sequence-to-sequence learning
    (seq2seq) code to a separate co-class of the main DLStudio class.  The name of
    the new class is Seq2SeqLearning and it resides at the top level of the
    distribution.

  Version 2.0.9:

    With this version, DLStudio comes with educational material on
    sequence-to-sequence learning (seq2seq). To that end, I have included the
    following two new classes in DLStudio: (1) Seq2SeqWithLearnableEmbeddings for
    seq2seq with learnable embeddings; and (2) Seq2SeqWithPretrainedEmbeddings for
    doing the same with pre-trained embeddings. Although I have used word2vec for
    the case of pre-trained embeddings, you would be able to run the code with the
    Fasttext embeddings also.  Both seq2seq implementations include the attention
    mechanism based on my understanding of the original paper on the subject by
    Bahdanau, Cho, and Bengio. You will find this code in a class named
    Attention_BCB.  For the sake of comparison, I have also included an
    implementation of the attention mechanism used in the very popular NLP
    tutorial by Sean Robertson.  You will find that code in a class named
    Attention_SR. To switch between these two attention mechanisms, all you have
    to do is to comment-out and uncomment a couple of lines in the DecoderRNN
    code.

  Version 2.0.8:

    This version pulls into DLStudio a very important idea in text processing and
    language modeling --- word embeddings.  That is, representing words by
    fixed-sized numerical vectors that are learned on the basis of their
    contextual similarities (meaning that if two words occur frequently in each
    other's context, they should have similar numerical representations).  Use of
    word embeddings is demonstrated in DLStudio through an inner class named
    TextClassificationWithEmbeddings.  Using pre-trained word2vec embeddings, this
    new inner class can be used for experimenting with text classification,
    sentiment analysis, etc.
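
    The notion of "similar numerical representations" is usually measured with
    cosine similarity.  The sketch below uses made-up 4-dimensional vectors
    purely for illustration; real word2vec embeddings have hundreds of
    dimensions and are learned from a corpus.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up embeddings, for illustration only
embeddings = {
    "good":  [0.9, 0.1, 0.3, 0.0],
    "great": [0.8, 0.2, 0.4, 0.1],
    "table": [0.0, 0.9, 0.0, 0.7],
}

sim_related = cosine_similarity(embeddings["good"], embeddings["great"])
sim_unrelated = cosine_similarity(embeddings["good"], embeddings["table"])
```

    With these toy vectors, the pair that is meant to share a context scores
    much closer to 1 than the unrelated pair.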

  Version 2.0.7:

    Made incremental improvements to the visualization of intermediate results
    during training.

  Version 2.0.6:

    This is a result of further clean-up of the code base in DLStudio.  The basic
    functionality provided by the module has not changed.

  Version 2.0.5:

    This version has a bug-fix for the training loop used for demonstrating the
    power of skip connections.  I have also cleaned up how the intermediate
    results produced during training are displayed in your terminal window.  In
    addition, I deleted the part of DLStudio that dealt with Autograd
    customization since that material is now in my ComputationalGraphPrimer
    module.

  Version 2.0.4:

    This version mostly changes the HTML formatting of this documentation page.
    The code has not changed.

  Version 2.0.3:

    I have been experimenting with how to best incorporate adversarial learning in
    the DLStudio module. That's what accounts for the jump from the previous
    public release version 1.1.4 to new version 2.0.3.  The latest version comes
    with a separate class named AdversarialLearning for experimenting with
    different types of such networks for learning data models with adversarial
    learning and, subsequently, generating new instances of the data from the
    learned models. The AdversarialLearning class includes two
    Discriminator-Generator (DG) pairs and one Critic-Generator (CG) pair. Of the
    two DG pairs, the first is based on the logic of DCGAN, and the second a small
    modification of the first.  The CG pair is based on the logic of Wasserstein
    GAN.  This version of the module also comes with a new examples directory,
    ExamplesAdversarialLearning, that contains example scripts that show how you
    can call the different DG and CG pairs in the AdversarialLearning class.  Also
    included is a new dataset I have created, PurdueShapes5GAN-20000, that
    contains 20,000 images of size 64x64 for experimenting with the GANs in this
    module.

  Version 1.1.4:

    This version has a new design for the text classification class TEXTnetOrder2.
    This has entailed new scripts for training and testing when using the new
    version of that class. This version also includes a fix for a bug discovered
    in Version 1.1.3.

  Version 1.1.3:

    The only change made in this version is to the class GRUnet that is used for
    text classification.  In the new version, the final output of this network is
    based on the LogSoftmax activation.
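
    For readers unfamiliar with LogSoftmax, the sketch below shows what the
    activation computes.  Pairing it with a negative log-likelihood loss is the
    usual practice in PyTorch, but whether GRUnet is trained that way is an
    assumption here, and the two-class scores are made up.

```python
import math

def log_softmax(logits):
    """Numerically stable log of the softmax probabilities."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the correct class."""
    return -log_probs[target]

scores = [2.0, 0.5]                 # made-up scores for two sentiment classes
log_probs = log_softmax(scores)
loss = nll_loss(log_probs, target=0)
```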

  Version 1.1.2:

    This version adds code to the module for experimenting with recurrent neural
    networks (RNN) for classifying variable-length text input. With an RNN, a
    variable-length text input can be characterized with a hidden state vector of
    a fixed size.  The text processing capabilities of the module allow you to
    compare the results that you may obtain with and without using a GRU. For such
    experiments, this version also comes with a text dataset based on an old
    archive of product reviews made available by Amazon.

  Version 1.1.1:

    This version fixes the buggy behavior of the module when using the 'depth'
    parameter to change the size of a network.

  Version 1.1.0:

    The main reason for this version was my observation that when the training
    data is intentionally corrupted with a high level of noise, it is possible for
    the output of regression to be a NaN (Not a Number).  In my testing at noise
    levels of 20%, 50%, and 80%, while you do not see this problem when the noise
    level is 20%, it definitely becomes a problem when the noise level is at 50%.
    To deal with this issue, this version includes the test 'torch.isnan()' in the
    training and testing code for object detection.  This version of the module
    also provides additional datasets with noise corrupted images with different
    levels of noise.  However, since the total size of the datasets now exceeds
    the file-size limit at 'https://pypi.org', you'll need to download them
    separately from the link provided in the main documentation page.

  Version 1.0.9:

    With this version, you can now use DLStudio for experiments in semantic
    segmentation of images.  The code added to the module is in a new inner class
    that, as you might guess, is named SemanticSegmentation.  The workhorse of
    this inner class is a new implementation of the famous Unet that I have named
    mUnet --- the prefix "m" stands for "multi" for the ability of the network to
    segment out multiple objects simultaneously.  This version of DLStudio also
    comes with a new dataset, PurdueShapes5MultiObject, for experimenting with
    mUnet.  Each image in this dataset contains a random number of selections from
    five different shapes --- rectangle, triangle, disk, oval, and star --- that
    are randomly scaled, oriented, and located in each image.

  Version 1.0.7:

    The main reason for creating this version of DLStudio is to be able to use the
    module for illustrating how to simultaneously carry out classification and
    regression (C&R) with the same convolutional network.  The specific C&R
    problem that is solved in this version is the problem of object detection and
    localization. You want a CNN to categorize the object in an image and, at the
    same time, estimate the bounding-box for the detected object. Estimating the
    bounding-box is referred to as regression.  All of the code related to object
    detection and localization is in the inner class DetectAndLocalize of the main
    module file.  Training a CNN to solve the detection and localization problem
    requires a dataset that, in addition to the class labels for the objects, also
    provides bounding-box annotations for the objects.  Towards that end, this
    version also comes with a new dataset called PurdueShapes5.  Another new inner
    class, CustomDataLoading, that is also included in Version 1.0.7 has the
    dataloader for the PurdueShapes5 dataset.

  Version 1.0.6:

    This version has the bugfix for a bug in SkipBlock that was spotted by a
    student as I was demonstrating in class the concepts related to the use of
    skip connections in deep neural networks.

  Version 1.0.5:

    This version includes an inner class, SkipConnections, for experimenting with
    skip connections to improve the performance of a deep network.  The Examples
    subdirectory of the distribution includes a script,
    playing_with_skip_connections.py, that demonstrates how you can experiment
    with SkipConnections.  The network class used by SkipConnections is named
    BMEnet with an easy-to-use interface for experimenting with networks of
    arbitrary depth.

  Version 1.0.4:

    I have added one more inner class, AutogradCustomization, to the module that
    illustrates how to extend Autograd if you want to endow it with additional
    functionality. And, most importantly, this version fixes an important bug that
    caused wrong information to be written out to the disk when you tried to save
    the learned model at the end of a training session. I have also cleaned up the
    comment blocks in the implementation code.

  Version 1.0.3:

    This is the first public release version of this module.

INTRODUCTION

    DLStudio is an integrated software platform for teaching (and learning) a wide
    range of basic architectural features of deep-learning neural networks.

    Most instructors who teach deep learning ask their students to download the
    so-called famous networks from, say, GitHub and become familiar with them by
    running them on the datasets used by the authors of those networks.  This
    approach is akin to teaching automobile engineering by asking the students to
    take the high-powered cars of the day out for a test drive.  In my opinion,
    this rather commonly used approach does not work for instilling in the
    students a deep understanding of the issues related to network architectures.

    On the other hand, DLStudio offers its own implementations for a variety of
    key features of neural network architectures.  These implementations, along
    with their explanations through detailed slide presentations at our Deep
    Learning class website at Purdue, result in an educational framework that is
    much more efficient in what it can deliver within the time constraints of a
    single semester.

    DLStudio facilitates learning through a combination of inner classes of the
    main module class --- called DLStudio naturally --- and several co-classes of
    the main class that deal with adversarial learning, sequence-to-sequence
    learning, data prediction, text analysis, and transformers.

    For the most part, the common code that you'd need in different scenarios for
    using neural networks has been placed inside the definition of the main
    DLStudio class in a file named DLStudio.py in the distribution.  That makes
    the definitions of the inner classes within DLStudio more compact. And, to a
    certain extent, it also results in somewhat more compact code in the
    co-classes of DLStudio.

   SKIP CONNECTIONS

    Starting with Version 1.0.6, you can now experiment with skip connections in a
    CNN to see how a deep network with this feature might yield improved
    classification results.  Deep networks suffer from the problem of vanishing
    gradients that degrades their performance.  Vanishing gradients means that the
    gradients of the loss calculated in the early layers of a network become
    increasingly muted as the network becomes deeper.  An important mitigation
    strategy for addressing this problem consists of creating a CNN using blocks
    with skip connections.
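
    A scalar toy calculation shows why the skip path helps.  It is not a model
    of BMEnet: each "block" is reduced to a function with a constant local
    derivative, and the value 0.1 and the depth of 20 are arbitrary choices
    for illustration.

```python
def gradient_through_stack(local_grad, num_blocks, skip_connections):
    """Derivative of a stack's output with respect to its input.

    A plain block computes y = f(x); a block with a skip connection computes
    y = f(x) + x.  With f'(x) = local_grad, the per-block derivative is
    local_grad in the first case and local_grad + 1 in the second, and the
    chain rule multiplies these factors across the blocks.
    """
    per_block = local_grad + 1.0 if skip_connections else local_grad
    grad = 1.0
    for _ in range(num_blocks):
        grad *= per_block
    return grad

plain = gradient_through_stack(0.1, 20, skip_connections=False)
skipped = gradient_through_stack(0.1, 20, skip_connections=True)
```

    Without the skip path the gradient reaching the first block shrinks
    geometrically with depth; with it, the identity term keeps every factor at
    or above 1 in this toy setting.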

    The code for using skip connections is in the inner class SkipConnections of
    the module.  And the network that allows you to construct a CNN with skip
    connections is named BMEnet.  As shown in the script
    playing_with_skip_connections.py in the Examples directory of the
    distribution, you can easily create a CNN with arbitrary depth just by using
    the constructor option "depth" for BMEnet. The basic block of the network
    constructed in this manner is called SkipBlock which, very much like the
    BasicBlock in ResNet-18, has a couple of convolutional layers whose output is
    combined with the input to the block.

    Note that the value given to the "depth" constructor option for the BMEnet
    class does NOT translate directly into the actual depth of the CNN. [Again,
    see the script playing_with_skip_connections.py in the Examples directory for
    how to use this option.] The value of "depth" is translated into how many
    instances of SkipBlock to use for constructing the CNN.

    If you want to use DLStudio for learning how to create your own versions of
    SkipBlock-like shortcuts in a CNN, your starting point should be the following
    script in the Examples directory of the distro:

                playing_with_skip_connections.py

    This script illustrates how to use the inner class BMEnet of the module for
    experimenting with skip connections in a CNN. As the script shows, the
    constructor of the BMEnet class comes with two options: skip_connections and
    depth.  By turning the first on and off, you can directly illustrate in a
    classroom setting the improvement you can get with skip connections.  And by
    giving an appropriate value to the "depth" option, you can show results for
    networks of different depths.

   OBJECT DETECTION AND LOCALIZATION

    The code for how to solve the problem of object detection and localization
    with a CNN is in the inner classes DetectAndLocalize and CustomDataLoading.
    This code was developed for version 1.0.7 of the module.  In general, object
    detection and localization problems are more challenging than pure
    classification problems because solving the localization part requires
    regression for the coordinates of the bounding box that localizes the object.
    If at all possible, you would want the same CNN to provide answers to both the
    classification and the regression questions and do so at the same time.  This
    calls for a CNN to possess two different output layers, one for classification
    and the other for regression.  A deep network that does exactly that is
    illustrated by the LOADnet classes that are defined in the inner class
    DetectAndLocalize of the DLStudio module.  [By the way, the acronym "LOAD" in
    "LOADnet" stands for "LOcalization And Detection".] Although you will find
    three versions of the LOADnet class inside DetectAndLocalize, for now only pay
    attention to the LOADnet2 class since that is the one I have worked with the
    most for creating the 1.0.7 distribution.
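
    The idea of one network with two output layers can be sketched without any
    deep learning library.  This is not LOADnet2: the shared feature vector,
    the weights, and the choice of adding a cross-entropy loss to a mean
    squared error are assumptions made for illustration, and only two classes
    are used to keep the numbers small.

```python
import math

def linear(weights, bias, x):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def cross_entropy(logits, label):
    """Classification loss for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def mse(pred, target):
    """Regression loss for the four bounding-box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def two_headed_forward(features, cls_head, reg_head):
    """The same feature vector feeds both output layers."""
    return linear(*cls_head, features), linear(*reg_head, features)

features = [1.0, 2.0]                               # made-up shared features
cls_head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])   # (weights, bias), 2 classes
reg_head = ([[1.0, 1.0]] * 4, [0.0] * 4)            # (weights, bias), 4 coordinates

logits, bbox = two_headed_forward(features, cls_head, reg_head)
total_loss = cross_entropy(logits, 1) + mse(bbox, [3.0, 3.0, 3.0, 3.0])
```

    Because both losses depend on the same features, a single backward pass
    through their sum would update the shared layers for both tasks at once.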

    As you would expect, training a CNN for object detection and localization
    requires a dataset that, in addition to the class labels for the images, also
    provides bounding-box annotations for the objects in the images. Out of my
    great admiration for the CIFAR-10 dataset as an educational tool for solving
    classification problems, I have created small-image-format training and
    testing datasets for illustrating the code devoted to object detection and
    localization in this module.  The training dataset is named
    PurdueShapes5-10000-train.gz and it consists of 10,000 images, with each image
    of size 32x32 containing one of five possible shapes --- rectangle, triangle,
    disk, oval, and star. The shape objects in the images are randomized with
    respect to size, orientation, and color.  The testing dataset is named
    PurdueShapes5-1000-test.gz and it contains 1000 images generated by the same
    randomization process as used for the training dataset.  You will find these
    datasets in the "data" subdirectory of the "Examples" directory in the
    distribution.

    Providing a new dataset for experiments with detection and localization meant
    that I also needed to supply a custom dataloader for the dataset.  Toward that
    end, Version 1.0.7 also includes another inner class named CustomDataLoading
    where you will find my implementation of the custom dataloader for the
    PurdueShapes5 dataset.

    If you want to use DLStudio for learning how to write your own PyTorch code
    for object detection and localization, your starting point should be the
    following script in the Examples directory of the distro:

                object_detection_and_localization.py

    Execute the script and understand what functionality of the inner class
    DetectAndLocalize it invokes for object detection and localization.

   NOISY OBJECT DETECTION AND LOCALIZATION

    When the training data is intentionally corrupted with a high level of noise,
    it is possible for the output of regression to be a NaN (Not a Number).  Here
    is what I observed when I tested the LOADnet2 network at noise levels of 20%,
    50%, and 80%: At 20% noise, both the labeling and the regression accuracies
    become worse compared to the noiseless case, but they would still be usable
    depending on the application.  For example, with two epochs of training, the
    overall classification accuracy decreases from 91% to 83% and the regression
    error increases from under a pixel (on the average) to around 3 pixels.
    However, when the level of noise is increased to 50%, the regression output is
    often a NaN (Not a Number), as represented by 'numpy.nan' or 'torch.nan'.  To
    deal with this problem, Version 1.1.0 of the DLStudio module checks the output
    of the bounding-box regression before drawing the rectangles on the images.
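
    The kind of check involved can be sketched as follows.  The change log
    says the module itself uses 'torch.isnan()'; the version below uses
    math.isfinite from the standard library as a stand-in so that it runs
    without PyTorch, and the function name and sample boxes are hypothetical.

```python
import math

def bbox_is_usable(bbox):
    """True only if every predicted coordinate is a finite number."""
    return all(math.isfinite(c) for c in bbox)

predictions = [
    [4.0, 6.0, 20.0, 25.0],             # ordinary regression output
    [float("nan"), 6.0, 20.0, 25.0],    # output corrupted by a NaN
]

# Only the boxes that pass the check would be drawn on the images
drawable = [b for b in predictions if bbox_is_usable(b)]
```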

    If you wish to experiment with detection and localization in the presence
    of noise, your starting point should be the script

                noisy_object_detection_and_localization.py

    in the Examples directory of the distribution.  Note that you would need to
    download the datasets for such experiments directly from the link provided
    near the top of this documentation page.

   IoU REGRESSION FOR OBJECT DETECTION AND LOCALIZATION

    Starting with version 2.2.3, DLStudio illustrates how you can use modern
    variants of the IoU (Intersection over Union) loss function for the regression
    needed for object localization.  These loss functions are provided by the
    DIoULoss class that is a part of DLStudio's inner class DetectAndLocalize. If
    you wish to experiment with these loss functions, your best entry point would
    be the script

                object_detection_and_localization_iou.py

    in the Examples directory of the distribution.  This script uses the same
    PurdueShapes5-10000-train.gz and PurdueShapes5-1000-test.gz training and
    testing datasets as the object_detection_and_localization.py script mentioned
    earlier.
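
    As background for that script, here is a minimal sketch of the basic IoU
    and of the Distance-IoU loss as it is commonly defined (1 - IoU plus the
    squared distance between the box centers divided by the squared diagonal
    of the smallest enclosing box).  That the DIoULoss class computes exactly
    this quantity is an assumption suggested by its name; the code below is not
    taken from DLStudio and works on plain tuples rather than tensors.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def diou_loss(pred, target):
    """Distance-IoU loss: 1 - IoU + (center distance)^2 / (enclosing diagonal)^2."""
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    center_dist2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    # Smallest axis-aligned box that encloses both boxes
    enc_w = max(pred[2], target[2]) - min(pred[0], target[0])
    enc_h = max(pred[3], target[3]) - min(pred[1], target[1])
    diag2 = enc_w ** 2 + enc_h ** 2
    penalty = center_dist2 / diag2 if diag2 > 0 else 0.0
    return 1.0 - iou(pred, target) + penalty

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint_loss = diou_loss((0, 0, 1, 1), (2, 2, 3, 3))
```

    The plain IoU loss is the same for every pair of non-overlapping boxes,
    whereas the distance term still tells the regressor in which direction to
    move the predicted box.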

   SEMANTIC SEGMENTATION

    The code for how to carry out semantic segmentation is in the inner class that
    is appropriately named SemanticSegmentation.  At its simplest, the purpose of
    semantic segmentation is to assign correct labels to the different objects in
    a scene, while localizing them at the same time.  At a more sophisticated
    level, a system that carries out semantic segmentation should also output a
    symbolic expression that reflects an understanding of the scene in the image
    that is based on the objects found in the image and their spatial
    relationships with one another.  The code in the new inner class is based on
    only the simplest possible definition of what is meant by semantic
    segmentation.

    The convolutional network that carries out semantic segmentation in DLStudio
    is named mUnet, where the letter "m" is short for "multi", which, in turn,
    stands for the fact that mUnet is capable of segmenting out multiple objects
    simultaneously from an image.  The mUnet network is based on the now famous
    Unet network that was first proposed by Ronneberger, Fischer and Brox in the
    paper "U-Net: Convolutional Networks for Biomedical Image Segmentation".
    Their UNET extracts binary masks for the cell pixel blobs of interest in
    biomedical images.  The output of UNET can therefore be treated as a
    pixel-wise binary classifier at each pixel position.  The mUnet class, on the
    other hand, is intended for segmenting out multiple objects simultaneously
    from an image. [A weaker reason for "m" in the name of the class is that it
    uses skip connections in multiple ways --- such connections are used not only
    across the two arms of the "U", but also along the arms.  The skip
    connections in the original Unet are only between the two arms of the U.]

    mUnet works by assigning a separate channel in the output of the network to
    each different object type.  After the network is trained, for a given input
    image, all you have to do is examine the different channels of the output for
    the presence or the absence of the objects corresponding to the channel index.
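
    Examining the output channels can be sketched as follows.  The channel
    order, the 0.5 threshold, and the tiny 2x2 "images" are assumptions for
    illustration only; the actual output of mUnet is a tensor with one channel
    per object type at the resolution of the input image.

```python
SHAPES = ["rectangle", "triangle", "disk", "oval", "star"]  # assumed channel order

def shapes_present(output, threshold=0.5, min_pixels=1):
    """Report which object types respond in their own output channel.

    output[c][row][col] is the network response for object type c.
    """
    found = []
    for name, channel in zip(SHAPES, output):
        hits = sum(1 for row in channel for v in row if v > threshold)
        if hits >= min_pixels:
            found.append(name)
    return found

# Toy 5-channel output for a 2x2 image
toy_output = [
    [[0.9, 0.8], [0.1, 0.0]],   # rectangle channel: two strong pixels
    [[0.0, 0.0], [0.0, 0.0]],   # triangle channel: empty
    [[0.0, 0.2], [0.7, 0.9]],   # disk channel: two strong pixels
    [[0.1, 0.1], [0.1, 0.1]],   # oval channel: below threshold
    [[0.0, 0.0], [0.0, 0.3]],   # star channel: below threshold
]
detected = shapes_present(toy_output)
```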

    This version of DLStudio also comes with a new dataset,
    PurdueShapes5MultiObject, for experimenting with mUnet.  Each image in this
    dataset contains a random number of selections from five different shapes,
    with the shapes being randomly scaled, oriented, and located in each image.
    The five different shapes are: rectangle, triangle, disk, oval, and star.

    Your starting point for learning how to use the mUnet network for segmenting
    images should be the following script in the Examples directory of the distro:

                semantic_segmentation.py

    Execute the script and understand how it uses the functionality packed in the
    inner class SemanticSegmentation for segmenting out the objects in an image.

   TEXT CLASSIFICATION

    Starting with Version 1.1.2, the module includes an inner class
    TextClassification that allows you to do simple experiments with neural
    networks with feedback (that are also called Recurrent Neural Networks).  With
    an RNN, textual data of arbitrary length can be characterized with a hidden
    state vector of a fixed size.  To facilitate text based experiments, this
    module also comes with text datasets derived from an old Amazon archive of
    product reviews.  Further information regarding the datasets is in the comment
    block associated with the class SentimentAnalysisDataset. If you want to use
    DLStudio for experimenting with text, your starting points should be the
    following three scripts in the Examples directory of the distribution:

                text_classification_with_TEXTnet.py
                text_classification_with_TEXTnetOrder2.py
                text_classification_with_GRU.py

    The first of these is meant to be used with the TEXTnet network that does not
    include any protection against the vanishing gradients problem that a poorly
    designed RNN can suffer from.  The second script mentioned above is based on
    the TEXTnetOrder2 network and it includes rudimentary protection, but not
    enough to suffice for any practical application.  The purpose of TEXTnetOrder2
    is to serve as an educational stepping stone to a GRU (Gated Recurrent Unit)
    network that is used in the third script listed above.
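
    The remark above that an RNN can characterize text of arbitrary length
    with a fixed-size hidden state can be illustrated with a bare-bones
    recurrence.  This is only a sketch of a plain (ungated) RNN step and is not
    the TEXTnet code; the weight matrices, the sizes, and the random "word
    vectors" are made-up assumptions.

```python
import math
import random

def rnn_encode(word_vectors, W_x, W_h):
    """Fold a sequence of input vectors into one fixed-size hidden state."""
    hidden = [0.0] * len(W_h)
    for x in word_vectors:
        # h_new[j] = tanh( W_x[j] . x + W_h[j] . h_old )
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_x[j], x))
                + sum(w * hi for w, hi in zip(W_h[j], hidden))
            )
            for j in range(len(W_h))
        ]
    return hidden

random.seed(0)
INPUT_SIZE, HIDDEN_SIZE = 3, 4
W_x = [[random.uniform(-0.5, 0.5) for _ in range(INPUT_SIZE)] for _ in range(HIDDEN_SIZE)]
W_h = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)] for _ in range(HIDDEN_SIZE)]

# Two "reviews" of different lengths yield hidden states of the same size
short_review = [[random.random() for _ in range(INPUT_SIZE)] for _ in range(2)]
long_review = [[random.random() for _ in range(INPUT_SIZE)] for _ in range(9)]
h_short = rnn_encode(short_review, W_x, W_h)
h_long = rnn_encode(long_review, W_x, W_h)
print(len(h_short), len(h_long))   # 4 4
```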

    Starting with Version 2.0.8, the Examples directory of DLStudio also includes
    the following three scripts that use the same learning networks as the
    corresponding scripts mentioned above but with word representations based on
    word2vec embeddings:

                text_classification_with_TEXTnet_word2vec.py
                text_classification_with_TEXTnetOrder2_word2vec.py
                text_classification_with_GRU_word2vec.py

    The pre-trained word2vec embeddings used in these scripts are accessed
    through the popular gensim library.

   DATA MODELING WITH ADVERSARIAL LEARNING

    Starting with version 2.0.3, DLStudio includes a separate class named
    AdversarialLearning for experimenting with different adversarial learning
    approaches for data modeling.  Adversarial Learning consists of simultaneously
    training a Generator and a Discriminator (or, a Generator and a Critic) with
    the goal of getting the Generator to produce from pure noise images that look
    like those in the training dataset.  When Generator-Discriminator pairs are
    used, the Discriminator's job is to become an expert at recognizing the
    training images so it can let us know when the generator produces an image
    that does not look like what is in the training dataset.  The output of the
    Discriminator consists of the probability that the input to the discriminator
    is like one of the training images.
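
    Since the Discriminator outputs a probability, its training objective is
    commonly written as a binary cross-entropy.  The sketch below assumes that
    formulation, together with the widely used "non-saturating" Generator
    loss; it is not taken from the AdversarialLearning code, and the
    probabilities in the example are invented.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))

def discriminator_loss(d_real, d_fake):
    """Real images should score 1 and generated images should score 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """The Generator wants the Discriminator to score its images as 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical Discriminator outputs for a batch of real and generated images
d_real = [0.9, 0.8, 0.95]
d_fake = [0.2, 0.1, 0.3]
print(round(discriminator_loss(d_real, d_fake), 4), round(generator_loss(d_fake), 4))
```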

    On the other hand, when a Generator-Critic pair is used, the Critic's job is
    to become adept at estimating the distance between the distribution that
    corresponds to the training dataset and the distribution that has been learned
    by the Generator so far.  If the distance between the distributions is
    differentiable with respect to the weights in the networks, one can backprop
    the distance and update the weights in an iterative training loop.  This is
    roughly the idea of the Wasserstein GAN that is incorporated as a
    Critic-Generator pair CG1 in the AdversarialLearning class.

    The AdversarialLearning class includes two kinds of adversarial networks for
    data modeling: DCGAN and WGAN.

    DCGAN, short for "Deep Convolutional Generative Adversarial Network", owes
    its origins to the paper "Unsupervised Representation Learning with Deep
    Convolutional Generative Adversarial Networks" by Radford et al.  DCGAN was
    the first fully convolutional network for GANs (Generative Adversarial
    Networks). CNNs typically have a fully-connected layer (an instance of
    nn.Linear) at the topmost level.  For the topmost layer in the Generator
    network, DCGAN uses another convolution layer that produces the final output
    image.  And for the topmost layer of the Discriminator, DCGAN flattens the
    output and feeds that into a sigmoid function for producing a scalar value.
    Additionally, DCGAN also gets rid of max-pooling for downsampling and instead
    uses convolutions with strides.  Yet another feature of a DCGAN is the use of
    batch normalization in all layers, except in the output layer of the Generator
    and the input layer of the Discriminator.  As the authors of DCGAN stated,
    while, in general, batch normalization stabilizes learning by normalizing the
    input to each layer to have zero mean and unit variance, applying BN at the
    output results in sample oscillation and model instability.  I have also
    retained in the DCGAN code the leaky ReLU activation recommended by the
    authors for the Discriminator.
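
    To see how strided convolutions can take over the downsampling role of
    max-pooling, it helps to work out the layer output sizes.  The arithmetic
    below uses the standard output-size formulas for a convolution and a
    transpose convolution (with no dilation and no output padding).  The 64x64
    image size, the 4x4 kernel, the stride of 2, the padding of 1, and the use
    of transpose convolutions for upsampling in the Generator are illustrative
    assumptions that are common in DCGAN implementations, not values read from
    the DLStudio code.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a strided convolution (downsampling)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Output side length of a transpose convolution (upsampling)."""
    return (size - 1) * stride - 2 * padding + kernel

size = 64
down = [size]
for _ in range(4):
    size = conv_out(size, kernel=4, stride=2, padding=1)
    down.append(size)
print(down)   # [64, 32, 16, 8, 4]

up = [size]
for _ in range(4):
    size = conv_transpose_out(size, kernel=4, stride=2, padding=1)
    up.append(size)
print(up)     # [4, 8, 16, 32, 64]
```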

    The other adversarial learning framework incorporated in AdversarialLearning
    is based on WGAN, which stands for Wasserstein GAN.  This GAN was proposed in
    the paper "Wasserstein GAN" by Arjovsky, Chintala, and Bottou.  WGAN is based
    on estimating the Wasserstein distance between the distribution that
    corresponds to the training images and the distribution that has been learned
    so far by the Generator.  The authors of WGAN have shown that minimizing this
    distance in an iterative learning framework also involves solving a minimax
    problem involving a Critic and a Generator. The Critic's job is to become an
    expert at recognizing the training data while, at the same time, distrusting
    the output of the Generator. Unlike the Discriminator of a GAN, the Critic
    merely seeks to estimate the Wasserstein distance between the true
    distribution associated with the training data and the distribution being
    learned by the Generator.  As the Generator parameters are kept fixed, the
    Critic seeks to update its parameters so as to maximize the Wasserstein
    distance between the true and the fake distributions. Subsequently, as the
    Critic parameters are kept fixed, the Generator updates its learnable
    parameters in an attempt to minimize the same distance.
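
    The alternating maximize-then-minimize logic described above is usually
    expressed with the two losses sketched below.  The function names and the
    Critic scores are hypothetical; the sketch assumes the Critic outputs an
    unbounded real-valued score for each sample and it does not reproduce the
    training loop in AdversarialLearning.py.

```python
def mean(values):
    return sum(values) / len(values)

def wasserstein_estimate(critic_on_real, critic_on_fake):
    """The Critic's current estimate of the distance between the distributions."""
    return mean(critic_on_real) - mean(critic_on_fake)

def critic_loss(critic_on_real, critic_on_fake):
    """Minimizing this loss maximizes the distance estimate (Generator fixed)."""
    return -wasserstein_estimate(critic_on_real, critic_on_fake)

def wgan_generator_loss(critic_on_fake):
    """Minimizing this loss raises the Critic's score on fakes (Critic fixed)."""
    return -mean(critic_on_fake)

# Hypothetical Critic scores for a batch of real and generated samples
scores_real = [1.2, 0.8, 1.0]
scores_fake = [-0.5, -0.3, -0.1]
print(round(wasserstein_estimate(scores_real, scores_fake), 4))
print(round(critic_loss(scores_real, scores_fake), 4))
print(round(wgan_generator_loss(scores_fake), 4))
```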

    Estimation of the Wasserstein distance in the above logic requires the
    Critic to learn a 1-Lipschitz function. DLStudio implements the following two
    strategies for this learning:

        --  Clipping the values of the learnable parameters of the Critic network
            to a user-specified interval;

        --  Penalizing the norm of the gradient of the Critic with respect to its
            input when that norm departs from 1.
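
    The two strategies listed above can be sketched in a few lines.  To keep
    the example free of an autograd library, it uses a toy linear Critic
    C(x) = w . x, whose gradient with respect to the input is simply w; in a
    real network that gradient would be computed by automatic differentiation
    at points interpolated between real and generated samples.  The clipping
    limit, the penalty weight of 10, and all of the names are illustrative
    assumptions.

```python
import math
import random

def clip_weights(weights, limit):
    """Strategy 1: force every Critic parameter into [-limit, limit]."""
    return [max(-limit, min(limit, w)) for w in weights]

def interpolate(real, fake, eps):
    """A point on the line between a real sample and a generated sample."""
    return [eps * r + (1 - eps) * f for r, f in zip(real, fake)]

def gradient_penalty(grad_norms, lam=10.0):
    """Strategy 2: penalize input-gradient norms that differ from 1."""
    return lam * sum((g - 1.0) ** 2 for g in grad_norms) / len(grad_norms)

w = [0.6, -1.5, 0.02]            # parameters of the toy linear Critic
print(clip_weights(w, 0.05))     # [0.05, -0.05, 0.02]

random.seed(0)
real, fake = [1.0, 2.0, 3.0], [0.0, 0.5, 1.0]
x_hat = interpolate(real, fake, random.random())   # where the gradient is taken
grad_norm = math.sqrt(sum(wi * wi for wi in w))    # the same at every x_hat here
print(round(gradient_penalty([grad_norm]), 4))
```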

    The first of these is implemented in the function "run_wgan_code()" in the
    file AdversarialLearning.py and the second in the function
    "run_wgan_with_gp_code()" in the same file.

    If you wish to use the DLStudio module to learn about data modeling with
    adversarial learning, your entry points should be the following scripts in the
    ExamplesAdversarialLearning directory of the distro:

        1.  dcgan_DG1.py

        2.  dcgan_DG2.py

        3.  wgan_CG1.py

        4.  wgan_with_gp_CG2.py

    The first script demonstrates the DCGAN logic on the PurdueShapes5GAN dataset.
    In order to show the sensitivity of the basic DCGAN logic to any variations in
    the network or the weight initializations, the second script introduces a
    small change in the network.  The third script is a demonstration of using the
    Wasserstein distance for data modeling through adversarial learning. The
    fourth script includes a gradient penalty in the critic logic called on by the
    third script.  The results produced by these scripts (for the constructor
    options shown in the scripts) are included in a subdirectory named
    RVLCloud_based_results.

   SEQUENCE-TO-SEQUENCE LEARNING WITH ATTENTION

    Sequence-to-sequence learning (seq2seq) is about predicting an outcome
    sequence from a causation sequence, or, said another way, a target sequence
    from a source sequence.  Automatic machine translation is probably one of the
    most popular applications of seq2seq.  DLStudio uses English-to-Spanish
    translation to illustrate the programming idioms and the PyTorch structures
    you need for seq2seq.  To that end, Version 2.1.0 of DLStudio includes a
    co-class (meaning a class that resides at the top level in the distribution)
    named Seq2SeqLearning that consists of the following two demonstration
    classes:

        1.  Seq2SeqWithLearnableEmbeddings

        2.  Seq2SeqWithPretrainedEmbeddings

    As their names imply, the first is for seq2seq with learnable embeddings and
    the second for seq2seq with pre-trained embeddings like word2vec or fasttext.

    As mentioned above, the specific example of seq2seq addressed in my
    implementation code is translation from English to Spanish. (I chose this
    example because learning and keeping up with Spanish is one of my hobbies.)
    In the Seq2SeqWithLearnableEmbeddings class, the learning framework learns the
    best embedding vectors to use for the two languages involved. On the other
    hand, in the Seq2SeqWithPretrainedEmbeddings class, I use the word2vec
    embeddings provided by Google for the source language.  Why I use the
    pre-trained embeddings for just the source language is explained in the main
    comment doc associated with the class Seq2SeqWithPretrainedEmbeddings.

    Any modern attempt at seq2seq must include attention.  This is done by
    incorporating a separate Attention network in the Encoder-Decoder framework
    needed for seq2seq learning.  The goal of the attention network is to modify
    the current hidden state in the decoder using the attention units produced
    previously by the encoder for the source language sentence.  The main
    Attention model I have used is based on my understanding of the attention
    mechanism proposed by Bahdanau, Cho, and Bengio. You will see this attention
    code in a class named Attention_BCB in the seq2seq implementations named
    above. I have also provided another attention class named Attention_SR that is
    my implementation of the attention mechanism in the very popular NLP tutorial
    by Sean Robertson at the PyTorch website.  The URLs to both these attention
    mechanisms are in my Week 14 lecture material on deep learning at Purdue.
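
    For readers who want a concrete picture of what an attention network of
    this kind computes, here is a generic sketch of additive (Bahdanau-style)
    attention: each encoder state is scored against the current decoder hidden
    state, the scores are passed through a softmax, and the resulting weights
    mix the encoder states into a context vector.  This is not the
    Attention_BCB code; the projection matrices W_dec and W_enc, the vector v,
    and the toy values are all assumptions made for illustration.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def additive_attention(decoder_hidden, encoder_states, W_dec, W_enc, v):
    """Score each encoder state against the decoder state, then mix them."""
    dec_proj = matvec(W_dec, decoder_hidden)
    scores = []
    for h in encoder_states:
        enc_proj = matvec(W_enc, h)
        energy = [math.tanh(a + b) for a, b in zip(dec_proj, enc_proj)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    size = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(size)]
    return weights, context

# Toy example: 2-element hidden states and identity projections
I2 = [[1.0, 0.0], [0.0, 1.0]]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = additive_attention([1.0, 0.0], encoder_states, I2, I2, [1.0, 1.0])
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```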

    The following two scripts in the ExamplesSeq2SeqLearning directory are your
    main entry points for experimenting with the seq2seq code in DLStudio:

        1.  seq2seq_with_learnable_embeddings.py

        2.  seq2seq_with_pretrained_embeddings.py

    With the first script, the overall network will learn on its own the best
    embeddings to use for representing the words in the two languages.  And, with
    the second script, the pre-trained word2vec embeddings from Google are used
    for the source language while the system learns the embeddings for the target
    language.

   DATA PREDICTION

    Let's say you have a sequence of observations recorded at regular intervals.
    These could, for example, be the price of a stock share recorded every hour;
    the hourly recordings of electrical load at your local power utility company;
    the mean temperature recorded on an annual basis; and so on.  We want
    to use the past observations to predict the value of the next one.  Solving
    these types of problems is the focus of the DataPrediction co-class of
    DLStudio.

    As a problem, data prediction has much in common with text analytics and
    seq2seq processing, in the sense that the prediction at the next time instant
    must be based on the previous observations in a manner similar to what we do
    in text analytics where the next word is understood taking into account all
    the previous words in a sentence.  However, there are three significant
    differences between purely numerical data prediction problems and text-based
    problems:

    1) Data Normalization: As you know by this time, neural networks require that
       your input data be normalized to the [0,1] interval, assuming it consists
       of non-negative numbers, or the [-1,1] interval otherwise.  When solving a
       sequential-data problem like text analytics, after you have normalized the
       input data (which is likely to consist of the numeric embeddings for the
       input words), you can forget about it.  You don't have that luxury when
       solving a data prediction problem.  As you would expect, the next value
       predicted by an algorithm must be at the same scale as the original input
       data.  This requires that the output of a neural-network-based prediction
       algorithm be "inverse normalized".  And that, in turn, requires
       remembering the normalization parameters used in each channel of the input
       data.

    2) Input Data Chunking: The notion of a sentence that is important in text
       analytics does not carry over to the data prediction problem.  In general,
       you would want a prediction to be made using ALL of the past
       observations. When the sequential data available for training a predictor
       is arbitrarily long, as is the case with numerical data in general, you
       would need to decide how to "chunk" the data --- that is, how to extract
       sub-sequences from the data for the purpose of training a neural network.

    3) Datetime Conditioning: Time-series data typically includes a "datetime"
       stamp for each observation.  Representing datetime as a one-dimensional
       ever-increasing time value does not work for data prediction if the
       observations depend on the time of the day, the day of the week, the season
       of the year, and other such temporal effects.  Incorporating such effects
       in a prediction framework requires a multi-dimensional encoding of the
       datetime values.  See the doc page associated with the DataPrediction class
       for a longer explanation of this aspect of data prediction.
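
    The three issues listed above can be made concrete with a short sketch.
    It assumes min-max normalization, fixed-length sliding windows for
    chunking, and a sine/cosine encoding of the hour, the weekday, and the
    month.  These are common choices made here for illustration; the
    DataPrediction class may handle each step differently, and the load values
    are invented.

```python
import math
from datetime import datetime

def fit_minmax(values):
    """Remember the parameters so that predictions can be inverse normalized."""
    return min(values), max(values)

def normalize(values, params):
    lo, hi = params
    return [(v - lo) / (hi - lo) for v in values]

def inverse_normalize(values, params):
    lo, hi = params
    return [v * (hi - lo) + lo for v in values]

def chunk(series, window):
    """Sliding windows: each run of `window` values is paired with the next value."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def encode_datetime(dt):
    """Cyclic encoding, so that 23:00 and 00:00 end up close together."""
    hour = 2 * math.pi * dt.hour / 24
    weekday = 2 * math.pi * dt.weekday() / 7
    month = 2 * math.pi * (dt.month - 1) / 12
    return [math.sin(hour), math.cos(hour),
            math.sin(weekday), math.cos(weekday),
            math.sin(month), math.cos(month)]

load = [310.0, 295.0, 280.0, 300.0, 350.0, 410.0]   # made-up hourly readings
params = fit_minmax(load)
scaled = normalize(load, params)
pairs = chunk(scaled, window=3)
print(len(pairs))                                   # 3
print(inverse_normalize([pairs[0][1]], params))     # back on the original scale
print(encode_datetime(datetime(2023, 4, 20, 6, 0)))
```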

    Now that you understand how the data prediction problem differs from, say, the
    problem of text analytics, it is time for me to state my main goal in defining
    the DataPrediction class in the DLStudio module.  I actually have two goals:

    (a) To deepen your understanding of a GRU.  At this point, your understanding
        of a GRU is likely to be based on calling PyTorch's GRU in your own code.
        Using a pre-programmed implementation for a GRU makes life easy and you
        also get a piece of highly optimized code that you can just call in your
        own code.  However, with a pre-programmed GRU, you are unlikely to get
        insights into how such an RNN is actually implemented.

    (b) To demonstrate how you can use a Recurrent Neural Network (RNN) for data
        prediction taking into account the data normalization, chunking, and
        datetime conditioning issues mentioned earlier.

    To address the first goal above, the DataPrediction class presented in this
    file is based on my pmGRU (Poor Man's GRU).  This GRU is my implementation of
    the "Minimal Gated Unit" GRU variant that was first presented by Joel Heck and
    Fathi Salem in their paper "Simplified Minimal Gated Unit Variations for
    Recurrent Neural Networks".  Its hallmark is that it combines the Update and
    the Reset gates of a regular GRU into a single gate called the Forget Gate.
    You could say that pmGRU is a lightweight version of a regular GRU and its use
    may therefore lead to a slight loss of accuracy in the predictions.  You will
    find it educational to compare the performance you get with my pmGRU-based
    implementation with an implementation that uses PyTorch's GRU for the same
    dataset.
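
    As a rough illustration of the single-gate idea, here is one step of a
    Minimal Gated Unit written with plain Python lists.  It follows the basic
    MGU equations as they are commonly stated:

        f  = sigmoid(Wf x + Uf h_prev + bf)
        hc = tanh(Wh x + Uh (f * h_prev) + bh)
        h  = (1 - f) * h_prev + f * hc

    The pmGRU in the DataPrediction class may differ in its details (the
    paper cited above describes several simplified variants), and the sizes
    and random weights below are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [sum(w * xi for w, xi in zip(W[j], x))
            + sum(u * hi for u, hi in zip(U[j], h))
            + b[j] for j in range(len(b))]

def mgu_step(x, h_prev, p):
    """One step of a Minimal Gated Unit with a single forget gate."""
    f = [sigmoid(z) for z in affine(p["Wf"], p["Uf"], p["bf"], x, h_prev)]
    gated = [fi * hi for fi, hi in zip(f, h_prev)]
    h_cand = [math.tanh(z) for z in affine(p["Wh"], p["Uh"], p["bh"], x, gated)]
    return [(1 - fi) * hp + fi * hc for fi, hp, hc in zip(f, h_prev, h_cand)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
IN, HID = 2, 3
params = {
    "Wf": random_matrix(HID, IN, rng), "Uf": random_matrix(HID, HID, rng), "bf": [0.0] * HID,
    "Wh": random_matrix(HID, IN, rng), "Uh": random_matrix(HID, HID, rng), "bh": [0.0] * HID,
}

h = [0.0] * HID
for x in [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4]]:
    h = mgu_step(x, h, params)
print([round(v, 4) for v in h])
```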

    Your main entry point for experimenting with the DataPrediction co-class is
    the following script in the ExamplesDataPrediction directory of the DLStudio
    distribution:

        power_load_prediction_with_pmGRU.py

    Before you can run this script, you would need to download the training
    dataset used in this example.  See the "For Data Prediction" part of the "The
    Datasets Included" section of the doc page for that.

   TRANSFORMERS

    The goal of Transformer based learning is the same as that of Seq2SeqLearning
    described earlier in this Introduction except that now you completely forgo
    recurrence. That is, you only use the mechanism of attention to translate
    sentences from a source language into sentences in the target language. For
    such applications, you need two forms of attention: self-attention and
    cross-attention.  Self-attention refers to the intra-sentence relationships
    between the words and cross-attention refers to the inter-sentence
    relationships between the words in a pair of sentences, one in the source
    language and the other in the target language. I have explained these concepts
    in great detail in the doc sections of the inner classes in the Transformers
    class.  In particular, I have explained the concept of the "dot-product"
    attention in which each word puts out three things: a Query Vector Q, a Key
    Vector K, and a Value Vector V. By taking the dot-product of the Query Vector
    Q of a word with the Key Vector K for all the words in a sentence, the neural
    network gets a measure of the extent to which each word in a sentence is
    important to every other word.  These dot-product values are then used as
    weights on the Value Vectors, V, for the individual words.  Cross attention
    works in a similar manner, except that now you take the dot-products of the Q
    vectors in the target-language sentence with the K vectors in the
    corresponding source-language sentence for producing the weight vectors that
    tell us how to weight the source-language Value Vectors vis-a-vis the words in
    the target language.
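
    The Q, K, V computation described above can be sketched as follows.  The
    sketch assumes the usual scaled form, in which the dot products are
    divided by the square root of the vector size and passed through a softmax
    before being used as weights.  In a real transformer the Q, K, and V
    vectors come from learned linear projections of the word embeddings; here
    the toy word vectors are used directly to keep the example short.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(Q, K, V):
    """For each query, weight the Value vectors by softmax(q . k / sqrt(d))."""
    d = len(K[0])
    outputs = []
    for q in Q:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in K])
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs

# Self-attention: Q, K, and V all come from the same sentence (3 words, size 2)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
self_out = attention(X, X, X)

# Cross-attention: Q from the target sentence, K and V from the source sentence
target_Q = [[0.0, 2.0]]
cross_out = attention(target_Q, X, X)
print(len(self_out), len(cross_out), [round(c, 3) for c in cross_out[0]])
```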

    You will see two different implementations of the transformer architecture in
    the Transformers co-class of DLStudio:

          TransformerFG
    and
          TransformerPreLN

    with the "FG" suffix standing for "First Generation" and the "PreLN" suffix
    for "Pre LayerNorm". TransformerFG is my implementation of the transformer
    architecture proposed in the famous paper by Vaswani et al.  and
    TransformerPreLN my implementation of the same architecture but with the
    modification suggested by Xiong et al. for more stable learning.  Since the
    modification is small from an architectural standpoint, I could have combined
    both transformer types in the same implementation with some conditional logic
    to account for the differences.  However, I have chosen to keep them separate
    mostly for educational purposes.  Further details on these implementations are
    in the documentation blocks in the Transformers co-class.
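
    The architectural difference between the two variants can be summarized
    in two one-line functions.  The sketch assumes, as the "PreLN" suffix
    suggests, that the modification consists of moving layer normalization
    from after the residual addition to the input of each sublayer.  The
    learnable scale and shift of LayerNorm are omitted, and toy_sublayer is a
    hypothetical stand-in for an attention or feed-forward sublayer.

```python
import math

def layer_norm(x, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def post_ln_block(x, sublayer):
    """Original ordering: normalize after adding the residual."""
    return layer_norm(add(x, sublayer(x)))

def pre_ln_block(x, sublayer):
    """Pre-LN ordering: normalize the sublayer input, keep the residual path as is."""
    return add(x, sublayer(layer_norm(x)))

def toy_sublayer(x):
    # Stand-in for an attention or feed-forward sublayer
    return [0.5 * v for v in x]

x = [1.0, 2.0, 4.0, 9.0]
post = post_ln_block(x, toy_sublayer)
pre = pre_ln_block(x, toy_sublayer)
print([round(v, 3) for v in post])
print([round(v, 3) for v in pre])
```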

    If you want to use my code for learning the main ideas related to how to
    create purely attention based networks, your starting point for that should be
    the following scripts in the ExamplesTransformers directory of the DLStudio
    distribution:

        seq2seq_with_transformerFG.py
        seq2seq_with_transformerPreLN.py

    These scripts use the following English-Spanish sentence-pairs dataset

           en_es_xformer_8_90000.tar.gz

    that contains 90,000 pairs of English-Spanish sentences with the maximum
    number of words in each sentence limited to 8 words.  For processing by the
    attention networks, each sentence is enclosed in <SOS> and <EOS> tokens, with
    the former standing for "Start of Sentence" and the latter for "End of
    Sentence".
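
    A sentence pair of the kind described above could be prepared along the
    following lines.  The helper is hypothetical and says nothing about how
    the archive itself was built; it also assumes that the 8-word limit is
    counted before the two marker tokens are added.

```python
MAX_WORDS = 8

def prepare_pair(english, spanish, max_words=MAX_WORDS):
    """Return token lists wrapped in <SOS>/<EOS>, or None if a sentence is too long."""
    pair = []
    for sentence in (english, spanish):
        words = sentence.split()
        if len(words) > max_words:
            return None
        pair.append(["<SOS>"] + words + ["<EOS>"])
    return tuple(pair)

kept = prepare_pair("i am learning spanish", "estoy aprendiendo mucho")
dropped = prepare_pair("this sentence has far too many words to be kept here",
                       "demasiado largo")
print(kept)
print(dropped)   # None
```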

INSTALLATION

    The DLStudio class was packaged using setuptools.  For installation, execute
    the following command in the source directory (this is the directory that
    contains the setup.py file after you have downloaded and uncompressed the
    package):

            sudo python3 setup.py install

    On Linux distributions, this will install the module file at a location that
    looks like

             /usr/local/lib/python3.7/dist-packages/

    If you do not have root access, you have the option of working directly off
    the directory in which you downloaded the software by simply placing the
    following statements at the top of your scripts that use the DLStudio class:

            import sys
            sys.path.append( "pathname_to_DLStudio_directory" )

    To uninstall the module, simply delete the source directory, locate where the
    DLStudio module was installed with "locate DLStudio" and delete those files.
    As mentioned above, the full pathname to the installed version is likely to
    look like /usr/local/lib/python3.7/dist-packages/DLStudio*

    If you want to carry out a non-standard install of the DLStudio module, look
    up the on-line information on Distutils by pointing your browser to

              http://docs.python.org/dist/dist.html

USAGE

    If you want to specify a network with just a configuration string, your usage
    of the module is going to look like:

        from DLStudio import *

        convo_layers_config = "1x[128,3,3,1]-MaxPool(2) 1x[16,5,5,1]-MaxPool(2)"
        fc_layers_config = [-1,1024,10]

        dls = DLStudio(   dataroot = "/home/kak/ImageDatasets/CIFAR-10/",
                          image_size = [32,32],
                          convo_layers_config = convo_layers_config,
                          fc_layers_config = fc_layers_config,
                          path_saved_model = "./saved_model",
                          momentum = 0.9,
                          learning_rate = 1e-3,
                          epochs = 2,
                          batch_size = 4,
                          classes = ('plane','car','bird','cat','deer',
                                     'dog','frog','horse','ship','truck'),
                          use_gpu = True,
                          debug_train = 0,
                          debug_test = 1,
                      )

        configs_for_all_convo_layers = dls.parse_config_string_for_convo_layers()
        convo_layers = dls.build_convo_layers2( configs_for_all_convo_layers )
        fc_layers = dls.build_fc_layers()
        model = dls.Net(convo_layers, fc_layers)
        dls.show_network_summary(model)
        dls.load_cifar_10_dataset()
        dls.run_code_for_training(model)
        dls.run_code_for_testing(model)


    or, if you would rather experiment with a drop-in network, your usage of the
    module is going to look something like:

        from DLStudio import *

        dls = DLStudio(   dataroot = "/home/kak/ImageDatasets/CIFAR-10/",
                          image_size = [32,32],
                          path_saved_model = "./saved_model",
                          momentum = 0.9,
                          learning_rate = 1e-3,
                          epochs = 2,
                          batch_size = 4,
                          classes = ('plane','car','bird','cat','deer',
                                     'dog','frog','horse','ship','truck'),
                          use_gpu = True,
                          debug_train = 0,
                          debug_test = 1,
                      )

        exp_seq = DLStudio.ExperimentsWithSequential( dl_studio = dls )   ## for your drop-in network
        exp_seq.load_cifar_10_dataset_with_augmentation()
        model = exp_seq.Net()
        dls.show_network_summary(model)
        exp_seq.run_code_for_training(model)
        exp_seq.run_code_for_testing(model)


    This assumes that you have copied and pasted the network you want to
    experiment with into a class like ExperimentsWithSequential that is
    included in the module.

CONSTRUCTOR PARAMETERS

    batch_size:  Carries the usual meaning in the neural network context.

    classes:  A list of the symbolic names for the classes.

    convo_layers_config: This parameter allows you to specify a convolutional network
                  with a configuration string.  Must be formatted as explained in the
                  comment block associated with the method
                  "parse_config_string_for_convo_layers()"

    dataroot: This points to where your dataset is located.

    debug_test: Setting it allows you to see the images being used and their predicted
                 class labels every 2000 batch-based iterations of testing.

    debug_train: Does the same thing during training that debug_test does during
                 testing.

    epochs: Specifies the number of epochs to be used for training the network.

    fc_layers_config: This parameter allows you to specify the final
                 fully-connected portion of the network with just a list of
                 the number of nodes in each layer of this portion.  The
                 first entry in this list must be the number '-1', which
                 stands for the fact that the number of nodes in the first
                 layer will be determined by the final activation volume of
                 the convolutional portion of the network.

    image_size:  The heightxwidth size of the images in your dataset.

    learning_rate:  Again carries the usual meaning.

    momentum:  Carries the usual meaning and needed by the optimizer.

    path_saved_model: The path to where you want the trained model to be
                  saved in your disk so that it can be retrieved later
                  for inference.

    use_gpu: You must set it to True if you want the GPU to be used for training.
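
    To give a feel for what a configuration string for convo_layers_config
    carries, here is a hypothetical parser.  It is not the module's
    parse_config_string_for_convo_layers() method.  It assumes that each
    whitespace-separated group has the form "nx[a,b,c,d]-MaxPool(k)", with n
    the number of identical layers, a the number of output channels, (b,c)
    the kernel size, d the stride, and k the max-pooling size.  The
    authoritative description of the format is in the comment block of that
    method.

```python
import re

GROUP_PATTERN = re.compile(
    r"(\d+)x\[(\d+),(\d+),(\d+),(\d+)\](?:-MaxPool\((\d+)\))?"
)

def parse_convo_config(config):
    """Turn a config string into one dict per layer group (assumed field meanings)."""
    groups = []
    for token in config.split():
        match = GROUP_PATTERN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse layer group: {token!r}")
        repeat, out_ch, k_h, k_w, stride, pool = match.groups()
        groups.append({
            "repeat": int(repeat),
            "out_channels": int(out_ch),
            "kernel": (int(k_h), int(k_w)),
            "stride": int(stride),
            "maxpool": int(pool) if pool is not None else None,
        })
    return groups

parsed = parse_convo_config("1x[128,3,3,1]-MaxPool(2) 1x[16,5,5,1]-MaxPool(2)")
for group in parsed:
    print(group)
```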

PUBLIC METHODS

    (1)  build_convo_layers()

         This method creates the convolutional layers from the parameters in the
         configuration string that was supplied through the constructor option
         'convo_layers_config'.  The output produced by the call to
         'parse_config_string_for_convo_layers()' is supplied as the argument to
         build_convo_layers().

    (2)  build_fc_layers()

         From the list of ints supplied through the constructor option
         'fc_layers_config', this method constructs the fully-connected portion of
         the overall network.

    (3)  check_a_sampling_of_images()

         Displays the first batch_size number of images in your dataset.

    (4)  display_tensor_as_image()

         This method will display any tensor of shape (3,H,W), (1,H,W), or just
         (H,W) as an image. If any further data normalization is needed for
         constructing a displayable image, the method takes care of that.  It has
         two input parameters: one for the tensor you want displayed as an image
         and the other for a title for the image display.  The latter parameter is
         default initialized to an empty string.

    (5)  load_cifar_10_dataset()

         This is just a convenience method that calls on Torchvision's
         functionality for creating a data loader.

    (6)  load_cifar_10_dataset_with_augmentation()

         This convenience method also creates a data loader but it also includes
         the syntax for data augmentation.

    (7)  parse_config_string_for_convo_layers()

         As mentioned in the Introduction, DLStudio module allows you to specify a
         convolutional network with a string provided the string obeys the
         formatting convention described in the comment block of this method.
         This method is for parsing such a string. The string itself is presented
         to the module through the constructor option 'convo_layers_config'.

    (8)  run_code_for_testing()

         This is the method that runs the trained model on the test data. Its
         output is a confusion matrix for the classes and the overall accuracy for
         each class.  The method has one input parameter which is set to the
         network to be tested.  The learnable parameters in the network are
         initialized with the disk-stored version of the trained model.

    (9)  run_code_for_training()

         This is the method that does all the training work. If a GPU was detected
         at the time an instance of the module was created, this method takes care
         of making the appropriate calls in order to transfer the tensors involved
         into the GPU memory.

    (10) save_model()

         Writes the model out to the disk at the location specified by the
         constructor option 'path_saved_model'.  Has one input parameter for the
         model that needs to be written out.

    (11) show_network_summary()

         Displays a print representation of your network and calls on the
         torchsummary module to print out the shape of the tensor at the output of
         each layer in the network. The method has one input parameter which is
         set to the network whose summary you want to see.
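
    The two kinds of output mentioned for run_code_for_testing(), a confusion
    matrix and per-class accuracy, can be computed from lists of true and
    predicted labels as sketched below.  The function names and the label
    lists are made up for illustration and do not mirror the internals of the
    method.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class and columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

classes = ("plane", "car", "bird")
true_labels = [0, 0, 1, 1, 2, 2, 2, 0]
predicted = [0, 1, 1, 1, 2, 0, 2, 0]

cm = confusion_matrix(true_labels, predicted, len(classes))
for name, row in zip(classes, cm):
    print(f"{name:>6}: {row}")
print("overall accuracy:", overall_accuracy(cm))   # 0.75
for name, acc in zip(classes, per_class_accuracy(cm)):
    print(f"{name:>6}: {acc:.2f}")
```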

THE MAIN INNER CLASSES OF THE MODULE

    By "inner classes" I mean the classes that are defined within the class file
    DLStudio.py in the DLStudio directory of the distribution.  The module also
    includes what I have referred to as the Co-Classes in the next section.  A
    Co-Class resides at the same level of abstraction as the main DLStudio class
    defined in the DLStudio.py file.

    The purpose of the following two inner classes is to demonstrate how you can
    create a custom class for your own network and test it within the framework
    provided by the DLStudio module.

    (1)  class ExperimentsWithSequential

         This class is my demonstration of experimenting with a network that I
         found on GitHub.  I copy-and-pasted it in this class to test its
         capabilities.  How to call on such a custom class is shown by the
         following script in the Examples directory:

                     playing_with_sequential.py

    (2)  class ExperimentsWithCIFAR

         This is very similar to the previous inner class, but uses a common
         example of a network for experimenting with the CIFAR-10
         dataset. Consisting of 32x32 images, this is a great dataset for creating
         classroom demonstrations of convolutional networks.  How you should
         use this class is shown in the following script

                    playing_with_cifar10.py

         in the Examples directory of the distribution.

    (4)  class SkipConnections

         This class is for investigating the power of skip connections in deep
         networks.  Skip connections are used to mitigate a serious problem
         associated with deep networks --- the problem of vanishing gradients.  It
         has been argued theoretically and demonstrated empirically that as the
         depth of a neural network increases, the gradients of the loss become
         more and more muted for the early layers in the network.

    (5)  class DetectAndLocalize

         The code in this inner class is for demonstrating how the same
         convolutional network can simultaneously solve the twin problems of
         object detection and localization.  Note that, unlike the previous four
         inner classes, class DetectAndLocalize comes with its own implementations
         for the training and testing methods. The main reason for that is that
         the training for detection and localization must use two different loss
         functions simultaneously, one for classification of the objects and the
         other for regression. The function for testing is also a bit more
         involved since it must now compute two kinds of errors, the
         classification error and the regression error on the unseen
         data. Although you will find a couple of different choices for the
         training and testing functions for detection and localization inside
         DetectAndLocalize, the ones I have worked with the most are the
         following two methods:

              run_code_for_training_with_CrossEntropy_and_MSE_Losses()

              run_code_for_testing_detection_and_localization()

    (6)  class CustomDataLoading

         This is a testbed for experimenting with a completely ground-up attempt
         at designing a custom data loader.  Ordinarily, if the basic format of
         how the dataset is stored is similar to one of the datasets that
         Torchvision knows about, you can go ahead and use that for your own
         dataset.  At worst, you may need to carry out some light customizations
         depending on the number of classes involved, etc.  However, if the
         underlying dataset is stored in a manner that does not look like anything
         in Torchvision, you have no choice but to supply all of the data loading
         infrastructure yourself.  That is what this inner class of the DLStudio
         module is all about.

    (7)  class SemanticSegmentation

         This inner class is for working with the mUnet convolutional network for
         semantic segmentation of images.  This network allows you to segment out
         multiple objects simultaneously from an image.  Each object type is
         assigned a different channel in the output of the network.  So, for
         segmenting out the objects of a specified type in a given input image,
         all you have to do is examine the corresponding channel in the output.

    (8)  class TextClassification

         The purpose of this inner class is to be able to use the DLStudio module
         for simple experiments in text classification.  Consider, for example,
         the problem of automatic classification of variable-length user feedback:
         you want to create a neural network that can label an uploaded product
         review of arbitrary length as positive or negative.  One way to solve
         this problem is with a Recurrent Neural Network in which you use a hidden
         state for characterizing a variable-length product review with a
         fixed-length state vector.

    (9)  class TextClassificationWithEmbeddings

         This class has the same functionality as the previous text processing
         class except that now we use embeddings for representing the words.  Word
         embeddings are fixed-sized numerical vectors that are learned on the
         basis of the contextual similarity of the words. The implementation of
         this inner class uses the pre-trained 300-element word2vec embeddings as
         made available by Google for 3 million words and phrases drawn from the
         Google News dataset. In DLStudio, we access these embeddings through the
         popular gensim library.

CO-CLASSES IN THE DLStudio MODULE

    As stated at the beginning of the previous section, a Co-Class resides at the
    same level of abstraction in the distribution directory as the main DLStudio
    class. Each Co-Class is defined in a separate subdirectory at the top level of
    the distribution directory.  While the main DLStudio class is defined in a
    subdirectory of the same name, the other subdirectories that contain
    definitions for the co-classes are named AdversarialLearning, Seq2SeqLearning,
    DataPrediction, and Transformers.  What follows in this section are additional
    details regarding these co-classes:

    AdversarialLearning:
    ===================

    As I mentioned in the Introduction, the purpose of the AdversarialLearning
    class is to demonstrate probabilistic data modeling using Generative
    Adversarial Networks (GAN).  GANs use Discriminator-Generator or
    Critic-Generator pairs to learn probabilistic data models that can
    subsequently be used to create new image instances that look surprisingly
    similar to those in the training set.  At the moment, you will find the
    following four such pairs inside the AdversarialLearning class:

        1.  Discriminator-Generator DG1      ---  implements the DCGAN logic

        2.  Discriminator-Generator DG2      ---  a slight modification of the previous

        3.  Critic-Generator CG1             ---  implements the Wasserstein GAN logic

        4.  Critic-Generator CG2             ---  adds the Gradient Penalty to the
                                                  Wasserstein GAN logic.

    In the ExamplesAdversarialLearning directory of the distro you will see the
    following scripts that demonstrate adversarial learning as incorporated in the
    above networks:

        1.  dcgan_DG1.py                     ---  demonstrates the DCGAN DG1

        2.  dcgan_DG2.py                     ---  demonstrates the DCGAN DG2

        3.  wgan_CG1.py                      ---  demonstrates the Wasserstein GAN CG1

        4.  wgan_with_gp_CG2.py              ---  demonstrates the Wasserstein GAN CG2

    All of these scripts use the training dataset PurdueShapes5GAN that consists
    of 20,000 images containing randomly shaped, randomly colored, and randomly
    positioned objects in 64x64 arrays.  The dataset comes in the form of a
    gzipped archive named "datasets_for_AdversarialLearning.tar.gz" that is
    provided under the link "Download the image datasets for adversarial learning"
    at the top of the HTML version of this doc page.  See the README in the
    ExamplesAdversarialLearning directory for how to unpack the archive.
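
    To make the Gradient Penalty idea behind the fourth pair more concrete,
    here is a minimal sketch of a critic loss with the penalty term added.  It
    is not the CG2 code from the module: the one-layer critic, the random
    stand-in images, and the penalty weight lambda_gp = 10 (the value commonly
    used in the WGAN-GP literature) are all assumptions made for the example.

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake):
    """Penalize deviation of the critic's gradient norm from 1 on interpolated samples."""
    batch_size = real.size(0)
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interpolated = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interpolated)
    grads, = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,      # keep the graph so the penalty itself is differentiable
    )
    grad_norms = grads.reshape(batch_size, -1).norm(2, dim=1)
    return ((grad_norms - 1) ** 2).mean()

torch.manual_seed(0)
# Hypothetical stand-in critic; a real critic would be a deep CNN.
critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
real = torch.randn(8, 3, 64, 64)    # stand-in for a batch of training images
fake = torch.randn(8, 3, 64, 64)    # stand-in for a batch of generator outputs

lambda_gp = 10.0                                            # assumed penalty weight
wasserstein_term = critic(fake).mean() - critic(real).mean()
gp = gradient_penalty(critic, real, fake)
critic_loss = wasserstein_term + lambda_gp * gp
critic_loss.backward()
```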

    Seq2SeqLearning:
    ===============

    As mentioned earlier in the Introduction, sequence-to-sequence learning
    (seq2seq) is about predicting an outcome sequence from a causation sequence,
    or, said another way, a target sequence from a source sequence.  Automatic
    machine translation is probably one of the most popular applications of
    seq2seq.  DLStudio uses English-to-Spanish translation to illustrate the
    programming idioms and the PyTorch structures you would need for writing your
    own code for seq2seq.

    Any attempt at seq2seq for machine translation must answer the following
    question at the outset: How to represent the words of a language for
    neural-network based processing? In general, you have two options: (1) Have
    your overall network learn on its own what are known as vector embeddings for
    the words; or (2) Use pre-trained embeddings as provided by word2vec or
    Fasttext.

    After you have resolved the issue of word representation, your next challenge
    is how to implement the attention mechanism that you're going to need for
    aligning the similar grammatical units in the two languages. The seq2seq code
    demonstrated in this co-class uses the attention model proposed by Bahdanau,
    Cho, and Bengio in the form of a separate Attention class.  The name of this
    attention class is Attention_BCB.  In a separate attention class named
    Attention_SR, I have also included the attention mechanism used by Sean
    Robertson in his very popular NLP tutorial at the main PyTorch website.
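
    For readers new to this style of attention, the following is a minimal
    sketch of additive attention in the spirit of the Bahdanau, Cho, and Bengio
    proposal.  It is not the Attention_BCB class itself; the class name
    AdditiveAttention, the layer names, and the tensor shapes are assumptions
    chosen for the illustration.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Hypothetical additive attention: score = v^T tanh(W_enc h_enc + W_dec h_dec)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
        energy = torch.tanh(
            self.W_enc(encoder_outputs) + self.W_dec(decoder_state).unsqueeze(1)
        )
        scores = self.v(energy).squeeze(-1)              # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)          # attention over source words
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        return context, weights

torch.manual_seed(0)
attention = AdditiveAttention(hidden_size=16)
encoder_outputs = torch.randn(2, 7, 16)    # 2 sentences, 7 source words each
decoder_state = torch.randn(2, 16)
context, weights = attention(decoder_state, encoder_outputs)
```

    The returned weights tell you, for the current decoder step, how much each
    source word contributes to the context vector fed to the decoder.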

    Seq2SeqLearning contains the following two inner classes for illustrating
    seq2seq:

        1.  Seq2SeqWithLearnableEmbeddings

        2.  Seq2SeqWithPretrainedEmbeddings

    In the first of these, Seq2SeqWithLearnableEmbeddings, the word embeddings
    are learned automatically by using the nn.Embedding layer. On the other hand,
    in Seq2SeqWithPretrainedEmbeddings, I have used the word2vec embeddings for
    the source language English and allowed the system to learn the embeddings for
    the target language Spanish.
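
    The difference between the two options can be sketched with PyTorch's
    nn.Embedding layer.  In this illustration the vocabulary size, the
    embedding size, and the random matrix that stands in for pre-trained
    word2vec vectors are all assumptions; the module itself obtains the real
    vectors through other means.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 10, 4

# Option 1: embeddings learned along with the rest of the network.
learnable = nn.Embedding(vocab_size, embedding_dim)

# Option 2: embeddings loaded from a fixed matrix and kept frozen.
# A random matrix stands in here for real pre-trained vectors.
pretrained_vectors = torch.randn(vocab_size, embedding_dim)
frozen = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)

word_indices = torch.tensor([[1, 5, 7]])     # one sentence of three word indices
learned_vecs = learnable(word_indices)       # shape (1, 3, 4), updated by training
fixed_vecs = frozen(word_indices)            # shape (1, 3, 4), never updated
```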

    In order to become familiar with these classes, your best entry points would
    be the following scripts in the ExamplesSeq2SeqLearning directory:

                seq2seq_with_learnable_embeddings.py

                seq2seq_with_pretrained_embeddings.py

    DataPrediction
    ==============

    As mentioned earlier in the Introduction, time-series data prediction differs
    from the more symbolic sequence-based learning frameworks with regard to the
    following: (1) Data normalization; (2) Data chunking; and (3) Datetime
    conditioning. The reason I mention data normalization is that now you have to
    remember the scaling parameters used for data normalization since you are
    going to need to inverse-normalize the predicted values. You would want
    your predicted values to be at the same scale as the time-series observations.
    The second issue, data chunking, refers to the fact that the notion of a
    "sentence" does not exist in time-series data.  What that implies is that the
    user has to decide how to extract sequences from arbitrarily long time-series
    data for training a prediction framework.  Finally, the third issue,
    datetime conditioning, refers to creating a multi-dimensional encoding for the
    datetime stamp associated with each observation to account for the diurnal,
    weekly, seasonal, and other such temporal effects.
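
    The three issues can be illustrated with a few lines of plain Python.
    This is only a sketch under stated assumptions: min-max scaling, a sliding
    window with the next value as the prediction target, and a sine/cosine
    encoding of the hour and the day of the week are choices made for the
    example and are not necessarily what the DataPrediction code does.

```python
import math
from datetime import datetime

def fit_minmax(values):
    """Return the scaling parameters that must be remembered for later inversion."""
    return min(values), max(values)

def normalize(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]

def inverse_normalize(values, lo, hi):
    return [v * (hi - lo) + lo for v in values]

def chunk(series, seq_len):
    """Sliding windows of length seq_len, each paired with the value that follows it."""
    return [(series[i:i + seq_len], series[i + seq_len])
            for i in range(len(series) - seq_len)]

def encode_datetime(ts):
    """Cyclic encoding of the hour of the day and the day of the week."""
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return [math.sin(hour_angle), math.cos(hour_angle),
            math.sin(dow_angle), math.cos(dow_angle)]

load = [310.0, 295.0, 305.0, 350.0, 410.0, 390.0]   # made-up hourly readings
lo, hi = fit_minmax(load)
scaled = normalize(load, lo, hi)
pairs = chunk(scaled, seq_len=3)                    # (input window, target) pairs
restored = inverse_normalize(scaled, lo, hi)        # back to the original scale
features = encode_datetime(datetime(2023, 4, 20, 6, 0))
```

    The cyclic encoding places 23:00 close to 00:00, which a raw hour number
    would not do.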

    The data prediction framework in the DataPrediction part of DLStudio is based
    on the following inner class:

        pmGRU

    for "Poor Man's GRU".  This GRU is my implementation of the "Minimal Gated
    Unit" GRU variant that was first presented by Joel Heck and Fathi Salem in
    their paper "Simplified Minimal Gated Unit Variations for Recurrent Neural
    Networks" and it combines the Update and the Reset gates of a regular GRU into
    a single gate called the Forget Gate.

    My reason for using pmGRU is purely educational. While you are likely to use
    PyTorch's GRU for any production work that requires a GRU, using a
    pre-programmed piece of code makes it more difficult to gain insights into how
    the logic of a GRU (especially with regard to the gating action it needs) is
    actually implemented.  The implementation code shown for pmGRU is supposed to
    help remedy that.
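
    To give a feel for how little code a single-gate recurrence needs, here is
    a minimal sketch of a cell with one forget gate, following the commonly
    cited Minimal Gated Unit equations.  It is not the pmGRU code in the
    module; the class name MinimalGatedCell, the layer names, and the sizes are
    assumptions made for the example.

```python
import torch
import torch.nn as nn

class MinimalGatedCell(nn.Module):
    """Hypothetical single-gate recurrent cell (one forget gate, no separate reset gate)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.forget_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev):
        # The forget gate decides how much of the old state to replace.
        f = torch.sigmoid(self.forget_gate(torch.cat([x, h_prev], dim=1)))
        # The same gate also controls how much old state feeds the candidate.
        h_tilde = torch.tanh(self.candidate(torch.cat([x, f * h_prev], dim=1)))
        return (1 - f) * h_prev + f * h_tilde

torch.manual_seed(0)
cell = MinimalGatedCell(input_size=3, hidden_size=5)
sequence = torch.randn(7, 2, 3)      # (time steps, batch, features)
h = torch.zeros(2, 5)                # initial hidden state
for x_t in sequence:                 # unroll the recurrence over time
    h = cell(x_t, h)
```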

    As I mentioned in the Introduction, your main entry point for experimenting
    with data prediction is the following script in the ExamplesDataPrediction
    directory of the DLStudio distribution:

        power_load_prediction_with_pmGRU.py

    However, before you can run this script, you would need to download the
    training dataset used in this example.  See the "For Data Prediction" part of
    the "The Datasets Included" section of the doc page for that.

    Transformers
    ============

    The code in this co-class of DLStudio consists of two slightly different
    implementations of the transformer architecture: TransformerFG and
    TransformerPreLN.  TransformerFG is my implementation of the architecture as
    conceptualized in the famous paper "Attention is All You Need" by Vaswani et
    al.  And TransformerPreLN is my implementation of the original idea along with
    the modifications suggested by Xiong et al. in their paper "On Layer
    Normalization in the Transformer Architecture" for more stable learning.  The
    two versions of transformers differ in only one respect: The placement of the
    LayerNorm in relation to the architectural components related to attention and
    the feedforward network.  Literally, the difference is small, yet its
    consequences for the stability of learning are significant.

    The fundamentals of how the attention works in both TransformerFG and
    TransformerPreLN are exactly the same.  For self-attention, you associate a
    Query Vector Q_i and a Key Vector K_i with each word w_i in a sentence.  For a
    given w_i, the dot product of its Q_i with the K_j vectors for all the other
    words w_j is a measure of how related w_i is to each w_j with regard to what's
    needed for the translation of a source sentence into the target sentence.  One
    more vector you associate with each word w_i is the Value Vector V_i.  The
    value vectors for the words in a sentence are weighted by the output of the
    activation nn.LogSoftmax applied to the dot-products.
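
    The Query-Key-Value calculation can be sketched for a single sentence as
    follows.  This is an illustration only: the function name self_attention,
    the random projection matrices, and the sizes are assumptions, and the
    sketch uses the scaled softmax normalization from the Vaswani et al.
    formulation, which may differ from the activation used in the module's own
    code.

```python
import math
import torch

def self_attention(X, W_q, W_k, W_v):
    """X holds one embedding per word, shape (num_words, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v                 # one Q, K, V row per word
    scores = Q @ K.transpose(0, 1) / math.sqrt(Q.size(-1))
    weights = torch.softmax(scores, dim=-1)             # row i: relevance of every w_j to w_i
    return weights @ V, weights                         # weighted sums of the value vectors

torch.manual_seed(0)
num_words, d_model = 4, 8
X = torch.randn(num_words, d_model)                     # stand-in word embeddings
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
attended, weights = self_attention(X, W_q, W_k, W_v)
```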

    The self-attention mechanism described above is half of what goes into each
    base encoder of a transformer, the other half is a feedforward network
    (FFN). The overall encoder consists of a cascade of these base encoders.  In
    my implementation, I have referred to the overall encoder as the
    MasterEncoder.  The MasterDecoder also consists of a cascade of base decoders.
    A base decoder is similar to a base encoder except for there being a layer of
    cross-attention interposed between the self-attention layer and the
    feedforward network.

    Referring to the attention half of each base encoder or a decoder as one
    half-unit and the FFN as the other half unit, the problem of vanishing
    gradients that would otherwise be caused by the depth of the overall network
    is mitigated by using LayerNorm and residual connections.  In TransformerFG,
    on the encoder side, LayerNorm is applied to the output of the self-attention
    layer and the residual connection wraps around both.  Along the same lines,
    LayerNorm is applied to the output of FFN and the residual connection wraps
    around both.

    In TransformerPreLN, on the other hand, LayerNorm is applied to the input to
    the self-attention layer and the residual connection wraps around both.
    Similarly, LayerNorm is applied to the input to FFN and the residual
    connection wraps around both.  Similar considerations apply to the decoder
    side, except we now also have a layer of cross-attention interposed between
    the self-attention and FFN.
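
    The two placements of LayerNorm can be contrasted in a few lines.  The
    sketch below follows the formulations in the two cited papers, with
    post-LN computing LayerNorm(x + Sublayer(x)) and pre-LN computing
    x + Sublayer(LayerNorm(x)); the exact wiring in TransformerFG and
    TransformerPreLN may differ in detail.  The class names, the helper
    make_ffn, and the sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn

class PostLNSublayer(nn.Module):
    """Normalize after the residual sum: LayerNorm(x + sublayer(x))."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreLNSublayer(nn.Module):
    """Normalize the sublayer input only: x + sublayer(LayerNorm(x))."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

def make_ffn(d_model):
    # Hypothetical feedforward half-unit; attention could be wrapped the same way.
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                         nn.Linear(4 * d_model, d_model))

torch.manual_seed(0)
d_model = 16
x = torch.randn(2, 5, d_model)              # (batch, words, embedding size)
post_out = PostLNSublayer(d_model, make_ffn(d_model))(x)
pre_out = PreLNSublayer(d_model, make_ffn(d_model))(x)
```

    Note that in the pre-LN form the input x reaches the output through an
    unnormalized identity path, which is the property usually credited for
    the more stable gradients.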

    As I mentioned in the Introduction, your main entry points for experimenting
    with the transformer code in DLStudio are the following two scripts in the
    ExamplesTransformers directory of the distribution:

        seq2seq_with_transformerFG.py
        seq2seq_with_transformerPreLN.py

    However, before you can run these scripts, you would need to download the
    training dataset used in these examples.  See the "For Transformers" part of
    the "The Datasets Included" section of this doc page for that.

Examples DIRECTORY

    The Examples subdirectory in the distribution contains the following scripts:

    (1)  playing_with_reconfig.py

         Shows how you can specify a convolution network with a configuration
         string.  The DLStudio module parses the string and constructs the network.

    (2)  playing_with_sequential.py

         Shows you how you can call on a custom inner class of the 'DLStudio'
         module that is meant to experiment with your own network.  The name of
         the inner class in this example script is ExperimentsWithSequential.

    (3)  playing_with_cifar10.py

         This is very similar to the previous example script but is based on the
         inner class ExperimentsWithCIFAR which uses more common examples of
         networks for playing with the CIFAR-10 dataset.

    (5)  playing_with_skip_connections.py

         This script illustrates how to use the inner class BMEnet of the module
         for experimenting with skip connections in a CNN. As the script shows,
         the constructor of the BMEnet class comes with two options:
         skip_connections and depth.  By turning the first on and off, you can
         directly illustrate in a classroom setting the improvement you can get
         with skip connections.  And by giving an appropriate value to the "depth"
         option, you can show results for networks of different depths.
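
         What turning skip connections on and off amounts to can be sketched
         as follows.  This is not the BMEnet code: the block SkipBlock, the
         helper make_net, and the channel count are assumptions; only the
         option names skip_connections and depth are taken from the
         description above.

```python
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """Hypothetical building block whose skip connection can be switched off."""
    def __init__(self, channels, skip_connections=True):
        super().__init__()
        self.skip_connections = skip_connections
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.skip_connections:
            out = out + x            # the input bypasses the two convolutions
        return torch.relu(out)

def make_net(depth, skip_connections):
    """Stack `depth` blocks, all with or all without skip connections."""
    return nn.Sequential(*[SkipBlock(8, skip_connections) for _ in range(depth)])

torch.manual_seed(0)
x = torch.randn(2, 8, 16, 16)
y_with = make_net(depth=4, skip_connections=True)(x)
y_without = make_net(depth=4, skip_connections=False)(x)
```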

    (6)  custom_data_loading.py

         This script shows how to use the custom dataloader in the inner class
         CustomDataLoading of the DLStudio module.  That custom dataloader is
         meant specifically for the PurdueShapes5 dataset that is used in object
         detection and localization experiments in DLStudio.

    (7)  object_detection_and_localization.py

         This script shows how you can use the functionality provided by the inner
         class DetectAndLocalize of the DLStudio module for experimenting with
         object detection and localization.  Detecting and localizing (D&L)
         objects in images is a more difficult problem than just classifying the
         objects.  D&L requires that your CNN make two different types of
         inferences simultaneously, one for classification and the other for
         localization.  For the localization part, the CNN must carry out what is
         known as regression. What that means is that the CNN must output the
         numerical values for the bounding box that encloses the object that was
         detected.  Generating these two types of inferences requires two
         different loss functions, one for classification and the other for
         regression.

    (8)  noisy_object_detection_and_localization.py

         This script in the Examples directory is exactly the same as the one
         described above, the only difference is that it calls on the
         noise-corrupted training and testing dataset files.  I thought it would
         be best to create a separate script for studying the effects of noise,
         just to allow for the possibility that the noise-related studies with
         DLStudio may evolve differently in the future.

    (9)  object_detection_and_localization_iou.py

         This script in the Examples directory is for experimenting with the
         variants of the IoU (Intersection over Union) loss functions provided by
         the DIoULoss class that is a part of DLStudio's inner class
         DetectAndLocalize.  This script uses the same datasets as the script
         mentioned in item 7 above.
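
         For reference, the basic IoU measure and its distance-based variant
         can be written out in plain Python.  This sketch is not the DIoULoss
         class of the module; the function names and the (x1, y1, x2, y2) box
         convention are assumptions, and the distance-based variant follows
         the commonly published form 1 - IoU + d^2 / c^2, where d is the
         distance between the box centers and c is the diagonal of the
         smallest box enclosing both.

```python
def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def diou_loss(pred, target):
    """Distance-IoU loss: 1 - IoU + (center distance)^2 / (enclosing diagonal)^2."""
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    center_dist_sq = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    # Smallest axis-aligned box that encloses both boxes.
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - iou(pred, target) + center_dist_sq / diag_sq

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))            # intersection 1, union 7
loss_same = diou_loss((0, 0, 2, 2), (0, 0, 2, 2))    # identical boxes
loss_shifted = diou_loss((0, 0, 2, 2), (1, 1, 3, 3))
```

         Unlike plain IoU, the distance term still provides a useful signal
         when the predicted and the true boxes do not overlap at all.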

    (10) semantic_segmentation.py

         This script should be your starting point if you wish to learn how to use
         the mUnet neural network for semantic segmentation of images.  As
         mentioned elsewhere in this documentation page, mUnet assigns an output
         channel to each different type of object that you wish to segment out
         from an image. So, given a test image at the input to the network, all
         you have to do is to examine each channel at the output for segmenting
         out the objects that correspond to that output channel.

    (11) text_classification_with_TEXTnet.py

         This script is your first introduction in DLStudio to a Recurrent Neural
         Network, meaning a neural-network with feedback.  Such networks are
         needed for solving problems related to variable length input data in
         applications such as text classification, sentiment analysis, machine
         translation, etc.  Unfortunately, unless care is taken, the feedback in
         such networks results in long chains of dependencies and thus exacerbates
         the vanishing gradients problem.  The specific goal of this script is
         neural learning for automatic classification of product reviews.

    (12) text_classification_with_TEXTnet_word2vec.py

         This script uses the same learning network as in the previous script, but
         there is a big difference between the two.  The previous network uses
         one-hot vectors for representing the words. On the other hand, this
         script uses pre-trained word2vec embeddings.  These are fixed-sized
         numerical vectors that are learned on the basis of contextual
         similarities.

    (13) text_classification_with_TEXTnetOrder2.py

         As mentioned earlier for the script in item 11 above, the vanishing
         gradients problem becomes worse in neural networks with feedback.  One
         way to get around this problem is to use what's known as "gated
         recurrence".  This script uses the TEXTnetOrder2 network as a stepping
         stone to a full-blown implementation of gating as provided by the nn.GRU
         class in item 15 below.

    (14) text_classification_with_TEXTnetOrder2_word2vec.py

         This script uses the same network as the previous script, but now we use
         the word2vec embeddings for representing the words.

    (15) text_classification_with_GRU.py

         This script demonstrates how one can use a GRU (Gated Recurrent Unit) to
         remediate one of the main problems associated with recurrence --
         vanishing gradients in the long chains of dependencies created by
         feedback.

    (16) text_classification_with_GRU_word2vec.py

         While this script uses the same learning network as the previous one, the
         words are now represented by fixed-sized word2vec embeddings.

ExamplesAdversarialLearning DIRECTORY

    The ExamplesAdversarialLearning directory of the distribution contains the
    following scripts for demonstrating adversarial learning for data modeling:

        1.  dcgan_DG1.py

        2.  dcgan_DG2.py

        3.  wgan_CG1.py

        4.  wgan_with_gp_CG2.py

    The first script demonstrates the DCGAN logic on the PurdueShapes5GAN dataset.
    In order to show the sensitivity of the basic DCGAN logic to any variations in
    the network or the weight initializations, the second script introduces a
    small change in the network.  The third script is a demonstration of using the
    Wasserstein distance for data modeling through adversarial learning.  The
    fourth script adds a Gradient Penalty term to the Wasserstein Distance based
    logic of the third script.  The PurdueShapes5GAN dataset consists of 64x64
    images with randomly shaped, randomly positioned, and randomly colored shapes.

    The results produced by these scripts (for the constructor options shown in
    the scripts) are included in a subdirectory named RVLCloud_based_results.  If
    you are just becoming familiar with the AdversarialLearning class of DLStudio,
    I'd urge you to run the script with the constructor options as shown and to
    compare your results with those that are in the RVLCloud_based_results
    directory.

ExamplesSeq2SeqLearning DIRECTORY

    The ExamplesSeq2SeqLearning directory of the distribution contains the
    following scripts for demonstrating sequence-to-sequence learning:

    (1) seq2seq_with_learnable_embeddings.py

         This script demonstrates the basic PyTorch structures and idioms to use
         for seq2seq learning.  The application example addressed in the script is
         English-to-Spanish translation.  And the attention mechanism used for
         seq2seq is the one proposed by Bahdanau, Cho, and Bengio.  The network
         used in this example calls on the nn.Embedding layer in the encoder to
         learn the embeddings for the words in the source language and a similar
         layer in the decoder to learn the embeddings to use for the target
         language.

    (2) seq2seq_with_pretrained_embeddings.py

         This script, also for seq2seq learning, differs from the previous one in
         only one respect: it uses Google's word2vec embeddings for representing
         the words in the source-language sentences (English).  Why I have not
         used pre-trained embeddings for the target language at this time is
         explained in the main comment doc associated with the class
         Seq2SeqWithPretrainedEmbeddings.

ExamplesDataPrediction DIRECTORY

    The ExamplesDataPrediction directory of the distribution contains the
    following script for demonstrating data prediction for time-series data:

        power_load_prediction_with_pmGRU.py

    This script uses a subset of the dataset provided by Kaggle for one of their
    machine learning competitions.  The dataset consists of over 10 years' worth of
    hourly electric load recordings made available by several utilities in the
    east and the midwest of the United States.  You can download this dataset from
    a link at the top of the main DLStudio doc page.

ExamplesTransformers DIRECTORY

    The ExamplesTransformers directory of the distribution contains the following
    two scripts for experimenting with transformers:

        seq2seq_with_transformerFG.py
        seq2seq_with_transformerPreLN.py

    Both these scripts deal with English-to-Spanish translation in a manner
    similar to what's demonstrated by the code in the Seq2SeqLearning co-class and
    the example scripts associated with that co-class.

    The directory also contains the following two scripts

        test_checkpointFG.py
        test_checkpointPreLN.py

    to address the problems of training a transformer network.  As I have
    mentioned elsewhere in this documentation, transformer training can be
    frustrating, to say the least, and can take a very long time.  What
    exacerbates the frustrations is that, with a wrong choice for the
    hyperparameters, you could end up with model divergence in the middle of training
    and not know about it until the end.  [Model divergence is akin to mode
    collapse in training a GAN.]  To deal with these problems, starting with
    Version 2.2.7, the transformer training routines now create a checkpoint of
    the model every 5 epochs. While the training is going on, you can evaluate the
    checkpoints in the manner illustrated by the above two scripts.
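
    The general pattern of periodic checkpointing can be sketched as follows.
    This is not the code of the training routines or of the two test scripts:
    the tiny stand-in model, the checkpoint file names, and the contents of
    the saved dictionary are assumptions; only the interval of 5 epochs is
    taken from the description above.

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # stand-in for a transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
checkpoint_dir = tempfile.mkdtemp()          # hypothetical checkpoint location
checkpoint_every = 5

for epoch in range(1, 11):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()    # dummy training objective
    loss.backward()
    optimizer.step()
    if epoch % checkpoint_every == 0:
        path = os.path.join(checkpoint_dir, f"checkpoint_{epoch}.pt")
        torch.save({"epoch": epoch, "model_state": model.state_dict()}, path)

# Evaluation side: load a saved checkpoint into a fresh copy of the model.
restored = nn.Linear(4, 2)
state = torch.load(os.path.join(checkpoint_dir, "checkpoint_10.pt"))
restored.load_state_dict(state["model_state"])
restored.eval()
```

    Because each checkpoint is an ordinary file, a separate process can load
    and evaluate it while the training loop is still running.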

THE DATASETS INCLUDED

    [must be downloaded separately]

   FOR THE MAIN DLStudio MODULE

        Download the dataset archive 'datasets_for_DLStudio.tar.gz' through the
        link "Download the image datasets for the main DLStudio module" provided at
        the top of this documentation page and store it in the 'Examples' directory
        of the distribution.  Subsequently, execute the following command in the
        'Examples' directory:

            cd Examples
            tar zxvf datasets_for_DLStudio.tar.gz

        This command will create a 'data' subdirectory in the 'Examples' directory
        and deposit the datasets mentioned below in that subdirectory.

         FOR OBJECT DETECTION AND LOCALIZATION

        Training a CNN for object detection and localization requires training and
        testing datasets that come with bounding-box annotations. This module
        comes with the PurdueShapes5 dataset for that purpose.  I created this
        small-image-format dataset out of my admiration for the CIFAR-10 dataset
        as an educational tool for demonstrating classification networks in a
        classroom setting. You will find the following dataset archive files in
        the "data" subdirectory of the "Examples" directory of the distro:

            (1)  PurdueShapes5-10000-train.gz
                 PurdueShapes5-1000-test.gz

            (2)  PurdueShapes5-20-train.gz
                 PurdueShapes5-20-test.gz

        The number that follows the main name string "PurdueShapes5-" is for the
        number of images in the dataset.  You will find the last two datasets,
        with 20 images each, useful for debugging your logic for object detection
        and bounding-box regression.

        As to how the image data is stored in the archives, please see the main
        comment block for the inner class CustomDataLoading in this file.

         FOR DETECTING OBJECTS IN NOISE-CORRUPTED IMAGES

        In terms of how the image data is stored in the dataset files, this
        dataset is no different from the PurdueShapes5 dataset described above.
        The only difference is that we now add varying degrees of noise to the
        images to make it more challenging for both classification and regression.

        The archive files you will find in the 'data' subdirectory of the
        'Examples' directory for this dataset are:

            (3)  PurdueShapes5-10000-train-noise-20.gz
                 PurdueShapes5-1000-test-noise-20.gz

            (4)  PurdueShapes5-10000-train-noise-50.gz
                 PurdueShapes5-1000-test-noise-50.gz

            (5)  PurdueShapes5-10000-train-noise-80.gz
                 PurdueShapes5-1000-test-noise-80.gz

        In the names of these six archive files, the numbers 20, 50, and 80 stand
        for the level of noise in the images.  For example, 20 means 20% noise.
        The percentage level indicates the fraction of the color value range that
        is added as randomly generated noise to the images.  The first integer in
        the name of each archive carries the same meaning as mentioned above for
        the regular PurdueShapes5 dataset: It stands for the number of images in
        the dataset.

         FOR SEMANTIC SEGMENTATION

        Showing interesting results with semantic segmentation requires images
        that contain multiple objects of different types.  A good semantic
        segmenter would then allow for each object type to be segmented out
        separately from an image.  A network that can carry out such segmentation
        needs training and testing datasets in which the images come with
        multiple objects of different types in them. Towards that end, I have
        created the following datasets:

            (6) PurdueShapes5MultiObject-10000-train.gz
                PurdueShapes5MultiObject-1000-test.gz

            (7) PurdueShapes5MultiObject-20-train.gz
                PurdueShapes5MultiObject-20-test.gz

        The number that follows the main name string "PurdueShapes5MultiObject-"
        is for the number of images in the dataset.  You will find the last two
        datasets, with 20 images each, useful for debugging your logic for
        semantic segmentation.

        As to how the image data is stored in the archive files listed above,
        please see the main comment block for the class

            PurdueShapes5MultiObjectDataset

        As explained there, in addition to the RGB values at the pixels that are
        stored in the form of three separate lists called R, G, and B, the shapes
        themselves are stored in the form of an array of masks, each of size 64x64,
        with each mask array representing a particular shape. For illustration,
        the rectangle shape is represented by the first such array. And so on.

         FOR TEXT CLASSIFICATION

        My experiments tell me that, when using gated RNNs, the size of the
        vocabulary can significantly impact the time it takes to train a neural
        network for text modeling and classification.  My goal was to provide
        curated datasets extracted from the Amazon user-feedback archive that would
        lend themselves to experimentation on, say, your personal laptop with a
        rudimentary GPU like the Quadro.  Here are the new datasets you can now
        download from the main documentation page for this module:


                 sentiment_dataset_train_200.tar.gz        vocab_size = 43,285
                 sentiment_dataset_test_200.tar.gz

                 sentiment_dataset_train_40.tar.gz         vocab_size = 17,001
                 sentiment_dataset_test_40.tar.gz

                 sentiment_dataset_train_400.tar.gz        vocab_size = 64,350
                 sentiment_dataset_test_400.tar.gz

        As with the other datasets, the integer in the name of each dataset is the
        number of reviews collected from the 'positive.reviews' and the
        'negative.reviews' files for each product category.  Therefore, the
        dataset with 200 in its name has a total of 400 reviews for each product
        category.  Also provided are two datasets named
        "sentiment_dataset_train_3.tar.gz" and "sentiment_dataset_test_3.tar.gz"
        just for the purpose of debugging your code.

        The last dataset, the one with 400 in its name, was added in Version 1.1.3
        of the module.

   FOR Seq2Seq LEARNING

        For sequence-to-sequence learning with DLStudio, you can download an
        English-Spanish translation corpus through the following archive:

            en_es_corpus_for_seq2sq_learning_with_DLStudio.tar.gz

        This data archive is a lightly curated version of the main dataset posted
        at "http://www.manythings.org/anki/" by the folks at "tatoeba.org".  My
        alterations to the original dataset consist mainly of expanding the
        contractions like "it's", "I'm", "don't", "didn't", "you'll", etc., into
        their full forms "it is", "i am", "do not", "did not", "you will", etc.
        The original form of the dataset contains 417 such unique contractions.
        Another alteration I made to the original data archive is to surround each
        sentence in both English and Spanish by the "SOS" and "EOS" tokens, with
        the former standing for "Start of Sentence" and the latter for "End of
        Sentence".
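
        The two alterations can be illustrated with a short sketch.  This is
        not the script that was used to curate the corpus: the table below
        holds only the five contractions quoted above rather than the full
        set of 417, and the function names are assumptions.

```python
import re

# Only the five examples mentioned in the text; the real table would be larger.
CONTRACTIONS = {
    "it's": "it is",
    "i'm": "i am",
    "don't": "do not",
    "didn't": "did not",
    "you'll": "you will",
}
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(sentence):
    """Replace each known contraction by its expanded lowercase form."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], sentence)

def wrap_sentence(sentence):
    """Expand contractions, then surround the sentence with SOS and EOS tokens."""
    return "SOS " + expand_contractions(sentence).strip() + " EOS"

example = wrap_sentence("I'm sure it's fine")
```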

        Download the above archive in the ExamplesSeq2SeqLearning directory and
        execute the following command in that directory:

            tar zxvf en_es_corpus_for_seq2sq_learning_with_DLStudio.tar.gz

        This command will create a 'data' subdirectory in the directory
        ExamplesSeq2SeqLearning and deposit the following dataset archive in that
        subdirectory:

            en_es_8_98988.tar.gz

        Now execute the following in the 'data' directory:

            tar zxvf en_es_8_98988.tar.gz

        With that, you should be able to execute the Seq2SeqLearning based scripts
        in the 'ExamplesSeq2SeqLearning' directory.
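
        If you would rather do the two-step extraction from Python than from
        the shell, the standard-library tarfile module can be used.  The helper
        below is a hypothetical sketch (its name and arguments are not part of
        DLStudio); it unpacks an outer archive and then unpacks one inner
        archive next to where it landed.

```python
import os
import tarfile

def extract_nested(outer_archive, inner_archive_relpath, dest_dir):
    """Extract outer_archive into dest_dir, then extract the inner archive
    into the directory in which it was deposited.  Returns that directory."""
    with tarfile.open(outer_archive, "r:gz") as tar:
        tar.extractall(dest_dir)
    inner_path = os.path.join(dest_dir, inner_archive_relpath)
    inner_dir = os.path.dirname(inner_path)
    with tarfile.open(inner_path, "r:gz") as tar:
        tar.extractall(inner_dir)
    return inner_dir

# Hypothetical usage, run from the ExamplesSeq2SeqLearning directory:
# extract_nested("en_es_corpus_for_seq2sq_learning_with_DLStudio.tar.gz",
#                os.path.join("data", "en_es_8_98988.tar.gz"), ".")
```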

   FOR ADVERSARIAL LEARNING

        Download the dataset archive

            datasets_for_AdversarialLearning.tar.gz

        through the link "Download the image datasets for adversarial learning"
        provided at the top of the HTML version of this doc page and store it in
        the 'ExamplesAdversarialLearning' directory of the distribution.
        Subsequently, execute the following command in the directory
        'ExamplesAdversarialLearning':

            tar zxvf datasets_for_AdversarialLearning.tar.gz

        This command will create a 'dataGAN' subdirectory and deposit the
        following dataset archive in that subdirectory:

            PurdueShapes5GAN-20000.tar.gz

        Now execute the following in the "dataGAN" directory:

            tar zxvf PurdueShapes5GAN-20000.tar.gz

        With that, you should be able to execute the adversarial learning based
        scripts in the 'ExamplesAdversarialLearning' directory.

   FOR DATA PREDICTION

        Download the dataset archive

            dataset_for_DataPrediction.tar.gz

        into the ExamplesDataPrediction directory of the DLStudio distribution.
        Next, execute the following command in that directory:

            tar zxvf dataset_for_DataPrediction.tar.gz

        That will create a data directory named "dataPred" in the
        ExamplesDataPrediction directory.  With that you should be able to execute
        the data prediction script in that directory.

   FOR TRANSFORMERS

        Download the dataset archive

            en_es_corpus_for_learning_with_Transformers.tar.gz

        into the ExamplesTransformers directory of the DLStudio distribution.
        Next, execute the following command in that directory:

            tar zxvf en_es_corpus_for_learning_with_Transformers.tar.gz

        That will create a 'data' subdirectory in the ExamplesTransformers
        directory and deposit in that subdirectory the following archives

            en_es_xformer_8_10000.tar.gz
            en_es_xformer_8_90000.tar.gz

        These are both derived from the same data source as the dataset for the
        examples associated with the Seq2SeqLearning co-class.  The first has only
        10,000 pairs of English-Spanish sentences and is meant primarily for
        debugging purposes.  The second contains 90,000 pairs of such sentences.
        The number '8' in the dataset names means that no sentence contains more
        than 8 real words.  With the "SOS" and "EOS" tokens used as sentence
        delimiters, the maximum number of words in each sentence in either
        language is 10.
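
        The length constraint can be checked with a few lines of Python.  The
        sketch below is illustrative only; the function name and the toy
        sentence pairs are hypothetical, and it assumes that words are
        separated by whitespace.

```python
MAX_REAL_WORDS = 8   # the '8' in the dataset names

def keep_pair(en, es, max_words=MAX_REAL_WORDS):
    """True if neither sentence has more than max_words words, not counting
    the SOS and EOS delimiters."""
    def real_words(sentence):
        return [w for w in sentence.split() if w not in ("SOS", "EOS")]
    return len(real_words(en)) <= max_words and len(real_words(es)) <= max_words

pairs = [
    ("SOS i am here EOS", "SOS estoy bien EOS"),
    ("SOS " + " ".join(["word"] * 9) + " EOS", "SOS palabra EOS"),  # too long
]
kept = [pair for pair in pairs if keep_pair(*pair)]

# A sentence with exactly 8 real words has 10 tokens once delimited.
longest_allowed = "SOS " + " ".join(["w"] * MAX_REAL_WORDS) + " EOS"
print(len(kept), len(longest_allowed.split()))
```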

BUGS

    Please notify the author if you encounter any bugs.  When sending email,
    please place the string 'DLStudio' in the subject line to get past the
    author's spam filter.

ACKNOWLEDGMENTS

    Thanks to Praneet Singh and Noureldin Hendy for their comments related to the
    buggy behavior of the module when using the 'depth' parameter to change the
    size of a network. Thanks also go to Christina Eberhardt for reminding me that
    I needed to change the value of the 'dataroot' parameter in my Examples
    scripts prior to packaging a new distribution.  Their feedback led to Version
    1.1.1 of this module.  Regarding the changes made in Version 1.1.4, one of
    them is a fix for the bug found by Serdar Ozguc in Version 1.1.3. Thanks
    Serdar.

    Version 2.0.3: I owe thanks to Ankit Manerikar for many wonderful
    conversations related to the rapidly evolving area of generative adversarial
    networks in deep learning.  It is obviously important to read research papers
    to become familiar with the goings-on in an area.  However, if you wish to
    also develop deep intuitions in those concepts, nothing can beat having great
    conversations with a strong researcher like Ankit.  Ankit is finishing his
    Ph.D. in the Robot Vision Lab at Purdue.

    Version 2.2.2: My laboratory's (RVL) journey into the world of transformers
    began with a series of lab seminars by Constantine Roros and Rahul Deshmukh.
    Several subsequent conversations with them were instrumental in helping me
    improve the understanding I had gained from the seminars.  Additional
    conversations with Rahul about the issue of masking were important to how I
    eventually implemented those ideas in my code.

ABOUT THE AUTHOR

    The author, Avinash Kak, is a professor of Electrical and Computer Engineering
    at Purdue University.  For all issues related to this module, contact the
    author at kak@purdue.edu.  If you send email, please place the string "DLStudio"
    in your subject line to get past the author's spam filter.

COPYRIGHT

    Python Software Foundation License

    Copyright 2023 Avinash Kak

@endofdocs

Imported Modules

torch.nn.functional
PIL.ImageFilter
copy
gzip
logging
math

torch.nn
numpy
numbers
torch.optim
os
pickle

matplotlib.pyplot
pymsgbox
random
re
sys
time

torch
torchvision
torchvision.transforms

Classes

builtins.object

DLStudio

class DLStudio(builtins.object)

DLStudio(*args, **kwargs)

Methods defined here:

__init__(self, *args, **kwargs): Initialize self. See help(type(self)) for accurate signature.

build_convo_layers(self, configs_for_all_convo_layers)

build_fc_layers(self)

check_a_sampling_of_images(self): Displays the first batch_size number of images in your dataset.

display_tensor_as_image(self, tensor, title=''): This method converts the argument tensor into a photo image that you can display in your terminal screen. It can convert tensors of three different shapes into images: (3,H,W), (1,H,W), and (H,W), where H, for height, stands for the number of pixels in the vertical direction and W, for width, for the same along the horizontal direction. When the first element of the shape is 3, that means that the tensor represents a color image in which each pixel in the (H,W) plane has three values for the three color channels. On the other hand, when the first element is 1, that stands for a tensor that will be shown as a grayscale image. And when the shape is just (H,W), that is automatically taken to be for a grayscale image.
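
The three accepted shapes can be told apart from the shape tuple alone. The helper below is a hypothetical sketch of that case analysis (it is not the method's actual code) and works on any tuple, such as the value returned by a tensor's shape attribute.

```python
def image_kind(shape):
    """Classify a tensor shape as 'color' or 'grayscale' per the three cases
    described above: (3,H,W), (1,H,W), and (H,W)."""
    shape = tuple(shape)
    if len(shape) == 3 and shape[0] == 3:
        return "color"
    if len(shape) == 3 and shape[0] == 1:
        return "grayscale"
    if len(shape) == 2:
        return "grayscale"
    raise ValueError(f"unsupported shape: {shape}")

print(image_kind((3, 32, 32)), image_kind((1, 32, 32)), image_kind((32, 32)))
```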

imshow(self, img): called by display_tensor_as_image() for displaying the image

load_cifar_10_dataset(self): In the code shown below, the call to "ToTensor()" converts the usual int range 0-255 for pixel values to 0-1.0 float vals and then the call to "Normalize()" changes the range to -1.0-1.0 float vals. For additional explanation of the call to "tvt.ToTensor()", see Slide 31 of my Week 2 slides at the DL course website. And see Slides 32 and 33 for the syntax "tvt.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))". In this call, the three numbers in the first tuple are the means for the three color channels and the three numbers in the second tuple are the corresponding standard deviations, which are applied according to the formula:

    image_channel_val = (image_channel_val - mean) / std

The end result is that the values in the image tensor will be normalized to fall between -1.0 and +1.0. If needed, we can do inverse normalization by

    image_channel_val = (image_channel_val * std) + mean
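
To see the arithmetic on a single channel value, here is a minimal pure-Python sketch of the two formulas with mean = std = 0.5. The function names are hypothetical, and the division by 255 stands in for the scaling that "ToTensor()" applies to 8-bit pixel values.

```python
def normalize(val, mean=0.5, std=0.5):
    """Forward formula: (val - mean) / std."""
    return (val - mean) / std

def denormalize(val, mean=0.5, std=0.5):
    """Inverse formula: (val * std) + mean."""
    return val * std + mean

for pixel in (0, 255):
    scaled = pixel / 255.0          # ToTensor()-style scaling: 0-255 -> 0.0-1.0
    normed = normalize(scaled)      # 0.0 -> -1.0 and 1.0 -> +1.0
    print(pixel, scaled, normed, denormalize(normed))
```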

load_cifar_10_dataset_with_augmentation(self): In general, we want to do data augmentation for training:

parse_config_string_for_convo_layers(self): Each collection of 'n' otherwise identical layers in a convolutional network is specified by a string that looks like:

    "nx[a,b,c,d]-MaxPool(k)"

where

    n   = num of this type of convo layer
    a   = number of out_channels [in_channels determined by prev layer]
    b,c = kernel for this layer is of size (b,c) [b along height, c along width]
    d   = stride for convolutions
    k   = maxpooling over kxk patches with stride of k

Example:

    "n1x[a1,b1,c1,d1]-MaxPool(k1)  n2x[a2,b2,c2,d2]-MaxPool(k2)"
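
A string in this format can be taken apart with a regular expression. The sketch below is an independent illustration, not the method's actual implementation; the function name, the dictionary keys, and the assumption that groups are separated by whitespace are all mine.

```python
import re

# Hypothetical pattern for one group of the form "nx[a,b,c,d]-MaxPool(k)".
GROUP_PATTERN = re.compile(r"(\d+)x\[(\d+),(\d+),(\d+),(\d+)\]-MaxPool\((\d+)\)")

def parse_convo_config(config_string):
    """Return one dict per layer group, in the order the groups appear."""
    groups = []
    for token in config_string.split():
        match = GROUP_PATTERN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse layer group: {token!r}")
        n, a, b, c, d, k = (int(x) for x in match.groups())
        groups.append({
            "num_layers": n,         # n identical layers of this type
            "out_channels": a,
            "kernel_size": (b, c),   # b along height, c along width
            "stride": d,
            "maxpool": k,            # pooling over kxk patches, stride k
        })
    return groups

layers = parse_convo_config("1x[128,7,7,1]-MaxPool(2) 1x[16,5,5,1]-MaxPool(2)")
print(layers[0]["out_channels"], layers[1]["kernel_size"])
```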

run_code_for_testing(self, net, display_images=False)

run_code_for_training(self, net, display_images=False)

save_model(self, model): Save the trained model to a disk file

Data descriptors defined here:

__dict__: dictionary for instance variables (if defined)

__weakref__: list of weak references to the object (if defined)

Data and other attributes defined here:

CustomDataLoading = <class 'DLStudio.DLStudio.CustomDataLoading'>: This is a testbed for experimenting with a completely ground-up attempt at designing a custom data loader. Ordinarily, if the basic format of how the dataset is stored is similar to one of the datasets that the Torchvision module knows about, you can go ahead and use that for your own dataset. At worst, you may need to carry out some light customizations depending on the number of classes involved, etc. However, if the underlying dataset is stored in a manner that does not look like anything in Torchvision, you have no choice but to supply all of the data loading infrastructure yourself. That is what this inner class of the DLStudio module is all about.

The custom data loading exercise here is related to a dataset called PurdueShapes5 that contains 32x32 images of binary shapes belonging to the following five classes:

    1. rectangle
    2. triangle
    3. disk
    4. oval
    5. star

The dataset was generated by randomizing the sizes and the orientations of these five patterns. Since the patterns are rotated with a very simple non-interpolating transform, just the act of random rotations can introduce boundary and even interior noise in the patterns. Each 32x32 image is stored in the dataset as the following list:

    [R, G, B, Bbox, Label]

where

    R     : a 1024-element list of the values for the red component of the color at all the pixels
    G     : the same as above but for the green component of the color
    B     : the same as above but for the blue component of the color
    Bbox  : a list like [x1,y1,x2,y2] that defines the bounding box for the object in the image
    Label : the shape of the object

I serialize the dataset with Python's pickle module and then compress it with the gzip module.

You will find the following dataset files in the "data" subdirectory of Examples in the DLStudio distro:

    PurdueShapes5-10000-train.gz
    PurdueShapes5-1000-test.gz
    PurdueShapes5-20-train.gz
    PurdueShapes5-20-test.gz

The number that follows the main name string "PurdueShapes5-" is for the number of images in the dataset. You will find the last two datasets, with 20 images each, useful for debugging your logic for object detection and bounding-box regression.

Class Path: DLStudio -> CustomDataLoading
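
The pickle-then-gzip storage scheme can be tried out on a toy record. The sketch below is an assumption-laden illustration: the record follows the [R, G, B, Bbox, Label] layout described for PurdueShapes5, but the pixel values, the string label, the dictionary container, and the file name are all hypothetical and may differ from what the actual dataset files hold.

```python
import gzip
import os
import pickle
import tempfile

NUM_PIXELS = 32 * 32   # 1024 values per color channel

# One made-up record: a solid red image with an arbitrary bounding box.
record = [
    [255] * NUM_PIXELS,    # R
    [0] * NUM_PIXELS,      # G
    [0] * NUM_PIXELS,      # B
    [4, 6, 20, 25],        # Bbox as [x1, y1, x2, y2]
    "rectangle",           # Label (the real label encoding is an assumption)
]
dataset = {0: record}      # container type is an assumption

# Serialize with pickle, compress with gzip.
path = os.path.join(tempfile.mkdtemp(), "toy-shapes-1-train.gz")
with gzip.open(path, "wb") as f:
    f.write(pickle.dumps(dataset))

# Read it back: decompress, then unpickle.
with gzip.open(path, "rb") as f:
    restored = pickle.loads(f.read())

R, G, B, bbox, label = restored[0]
print(len(R), bbox, label)
```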

DetectAndLocalize = <class 'DLStudio.DLStudio.DetectAndLocalize'>: The purpose of this inner class is to focus on object detection in images --- as opposed to image classification. Most people would say that object detection is a more challenging problem than image classification because, in general, the former also requires localization. The simplest interpretation of what is meant by localization is that the code that carries out object detection must also output a bounding-box rectangle for the object that was detected. You will find in this inner class some examples of LOADnet classes meant for solving the object detection and localization problem. The acronym "LOAD" in "LOADnet" stands for "LOcalization And Detection". The different network examples included here are LOADnet1, LOADnet2, and LOADnet3. For now, only pay attention to LOADnet2 since that's the class I have worked with the most for the 1.0.7 distribution. Class Path: DLStudio -> DetectAndLocalize

ExperimentsWithCIFAR = <class 'DLStudio.DLStudio.ExperimentsWithCIFAR'>: Class Path: DLStudio -> ExperimentsWithCIFAR

ExperimentsWithSequential = <class 'DLStudio.DLStudio.ExperimentsWithSequential'>: Demonstrates how to use the torch.nn.Sequential container class Class Path: DLStudio -> ExperimentsWithSequential

Net = <class 'DLStudio.DLStudio.Net'>

SemanticSegmentation = <class 'DLStudio.DLStudio.SemanticSegmentation'>: The purpose of this inner class is to be able to use the DLStudio module for experiments with semantic segmentation. At its simplest level, the purpose of semantic segmentation is to assign correct labels to the different objects in a scene, while localizing them at the same time. At a more sophisticated level, a system that carries out semantic segmentation should also output a symbolic expression based on the objects found in the image and their spatial relationships with one another. The workhorse of this inner class is the mUnet network that is based on the Unet network that was first proposed by Ronneberger, Fischer and Brox in the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation". Their Unet extracts binary masks for the cell pixel blobs of interest in biomedical images. The output of their Unet can therefore be treated as a pixel-wise binary classifier at each pixel position. The mUnet class, on the other hand, is intended for segmenting out multiple objects simultaneously from an image. [A weaker reason for "Multi" in the name of the class is that it uses skip connections not only across the two arms of the "U", but also along the arms. The skip connections in the original Unet are only between the two arms of the U.] In mUnet, each object type is assigned a separate channel in the output of the network. This version of DLStudio also comes with a new dataset, PurdueShapes5MultiObject, for experimenting with mUnet. Each image in this dataset contains a random number of selections from five different shapes, with the shapes being randomly scaled, oriented, and located in each image. The five different shapes are: rectangle, triangle, disk, oval, and star. Class Path: DLStudio -> SemanticSegmentation

SkipConnections = <class 'DLStudio.DLStudio.SkipConnections'>: This educational class is meant for illustrating the concepts related to the use of skip connections in neural networks. It is now well known that deep networks are difficult to train because of the vanishing gradients problem. What that means is that as the depth of a network increases, the loss gradients calculated for the early layers become more and more muted, which suppresses the learning of the parameters in those layers. An important mitigation strategy for addressing this problem consists of creating a CNN using blocks with skip connections. With the code shown in this inner class of the module, you can now experiment with skip connections in a CNN to see how a deep network with this feature might improve the classification results. As you will see in the code shown below, the network that allows you to construct a CNN with skip connections is named BMEnet. As shown in the script playing_with_skip_connections.py in the Examples directory of the distribution, you can easily create a CNN with arbitrary depth just by using the "depth" constructor option for the BMEnet class. The basic block of the network constructed by BMEnet is called SkipBlock which, very much like the BasicBlock in ResNet-18, has a couple of convolutional layers whose output is combined with the input to the block. Note that the value given to the "depth" constructor option for the BMEnet class does NOT translate directly into the actual depth of the CNN. [Again, see the script playing_with_skip_connections.py in the Examples directory for how to use this option.] The value of "depth" is translated into how many instances of SkipBlock to use for constructing the CNN. Class Path: DLStudio -> SkipConnections

TextClassification = <class 'DLStudio.DLStudio.TextClassification'>: The purpose of this inner class is to be able to use the DLStudio module for simple experiments in text classification. Consider, for example, the problem of automatic classification of variable-length user feedback: you want to create a neural network that can label an uploaded product review of arbitrary length as positive or negative. One way to solve this problem is with a recurrent neural network in which you use a hidden state for characterizing a variable-length product review with a fixed-length state vector. This inner class allows you to carry out such experiments. Class Path: DLStudio -> TextClassification

TextClassificationWithEmbeddings = <class 'DLStudio.DLStudio.TextClassificationWithEmbeddings'>: The text processing class described previously, TextClassification, was based on using one-hot vectors for representing the words. The main challenge we faced with one-hot vectors was that the larger the size of the training dataset, the larger the size of the vocabulary, and, therefore, the larger the size of the one-hot vectors. The increase in the size of the one-hot vectors led to a model with a significantly larger number of learnable parameters --- and, that, in turn, created a need for a still larger training dataset. Sounds like a classic example of a vicious circle. In this section, I use the idea of word embeddings to break out of this vicious circle. Word embeddings are fixed-sized numerical representations for words that are learned on the basis of the similarity of word contexts. The original and still the most famous of these representations are known as the word2vec embeddings. The embeddings that I use in this section consist of pre-trained 300-element word vectors for 3 million words and phrases as learned from Google News reports. I access these embeddings through the popular Gensim library. Class Path: DLStudio -> TextClassificationWithEmbeddings
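
A rough count makes the size argument concrete. The sketch below compares the number of input-to-hidden weights for one-hot inputs against 300-element embeddings, using the vocabulary sizes quoted in the dataset listing of this document. The hidden size of 512 and the single weight matrix are hypothetical simplifications, not a description of the networks in this class.

```python
# Vocabulary sizes as quoted for the sentiment datasets (40, 200, 400 reviews).
VOCAB_SIZES = {40: 17_001, 200: 43_285, 400: 64_350}
EMBEDDING_SIZE = 300    # size of the pre-trained word vectors
HIDDEN_SIZE = 512       # hypothetical hidden-state size

def input_weight_count(input_size, hidden_size=HIDDEN_SIZE):
    """Weights in a single dense input-to-hidden matrix (bias ignored)."""
    return input_size * hidden_size

for n, vocab_size in VOCAB_SIZES.items():
    one_hot = input_weight_count(vocab_size)       # grows with the vocabulary
    embedded = input_weight_count(EMBEDDING_SIZE)  # fixed, whatever the vocabulary
    print(f"train_{n}: one-hot {one_hot:,} weights vs embeddings {embedded:,}")
```

With one-hot inputs the count grows in proportion to the vocabulary, whereas with fixed-size embeddings it stays the same for all three datasets.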

Data
		__author__ = 'Avinash Kak (kak@purdue.edu)'
		__copyright__ = '(C) 2023 Avinash Kak. Python Software Foundation.'
		__date__ = '2023-April-20'
		__url__ = 'https://engineering.purdue.edu/kak/distDT/DLStudio-2.2.7.html'
		__version__ = '2.2.7'

Author
		Avinash Kak (kak@purdue.edu)