
ComputationalGraphPrimer (version 1.1.4, 2024-January-28)

ComputationalGraphPrimer.py
Version: 1.1.4 Author: Avinash Kak (kak@purdue.edu) Date: 2024-January-28


CHANGES:

Version 1.1.4: This version includes a bug fix in the implementation code for the backprop step in the one-neuron model. The bug was caused by placing the statement that averages the partial of the loss over the batch samples inside the loop that adds together the contributions from the individual instances in a batch. It needed to be outside the loop.

Version 1.1.3: For handcrafted neural networks, the new version provides a correct implementation of the batch-based averaging of the backpropagating gradients as required by stochastic gradient descent. My previous implementation was wrong.

Version 1.1.2: Fixes a couple of additional bugs. One dealt with the misalignment of variables and parameters in the backpropagation logic for the one-neuron case. The other was the example script "verify_with_torchnn.py" not working on account of the previous changes to the module code. Thanks to all who discovered these bugs.

Version 1.1.1: Includes a bug fix in the parser for the expressions that define a multi-layer neural network. The bug was causing the input nodes to be replicated when parsing the expressions at each node in the second layer of such a network. Many thanks to the student who discovered the bug.

Version 1.1.0: The network visualization code in this version figures out the network layout from the specification of the network in a user's script. Previously, I had hand-coded the visualization code for the specific networks in the Examples directory of the distribution.

Version 1.0.9: I have added to the in-code documentation to make it easier to understand the implementation code. Wherever appropriate, I have also included pointers to my Week 3 slides for the DL class at Purdue for more detailed explanations of the logic implemented in a code fragment. Additionally, I have incorporated better graph and neural-network display routines in the module.
Version 1.0.8: To make the code in the example scripts easier to understand, the random training data generator now returns the data that you can subsequently supply to the training function. (See the scripts in the Examples directory for what I mean.) I have also provided additional comments in the main class file to help understand the code better.

Version 1.0.7: I have fixed some documentation errors in this version and also cleaned up the code by deleting functions not used in the demos. The basic codebase remains the same as in 1.0.6.

Version 1.0.6: This version includes a demonstration of how to extend PyTorch's Autograd class if you wish to customize how the learnable parameters are updated during backprop on the basis of the data conditions discovered during the forward propagation. Previously this material was in the DLStudio module.

Version 1.0.5: I have been experimenting with different ideas for increasing the tutorial appeal of this module. (That is the reason for the jump in the version number from 1.0.2 to the current 1.0.5.) The previous public version provided a simple demonstration of how one could forward propagate data through a DAG (Directed Acyclic Graph) while at the same time computing the partial derivatives that would be needed subsequently during the backpropagation step for updating the values of the learnable parameters. In 1.0.2, my goal was just to illustrate what was meant by a DAG and how to use such a representation for forward data flow and backward parameter update. Since I had not incorporated any nonlinearities in such networks, there was obviously no real learning taking place. That fact was made evident by a plot of training loss versus iterations.
To remedy this shortcoming of the previous public-release version, the current version introduces two special cases of networks, a one-neuron network and a multi-neuron network. They are for experimenting with forward propagation of data while calculating the partial derivatives needed later, followed by backpropagation of the prediction errors for updating the values of the learnable parameters. In both cases, I have used the Sigmoid activation function at the nodes. The partial derivatives that are calculated in the forward direction are based on analytical formulas at both the pre-activation point for data aggregation and the post-activation point. The forward and backward calculations incorporate smoothing of the prediction errors and the derivatives over a batch as required by stochastic gradient descent.

Version 1.0.2: This version reflects the change in the name of the module that was initially released under the name CompGraphPrimer with version 1.0.1.

INTRODUCTION:

The advanced level of automation built into deep learning platforms such as PyTorch gets in the way of a student trying to gain insights into some of the most fundamental aspects of the computing involved. The goal of this Primer module is to take the student back to the roots of how neural networks do their thing. Take, for example, the working of what's at the heart of the PyTorch platform: the Autograd module. As the data forward-propagates through a network of nodes, the module computes the partial derivatives of the output of each layer with respect to the learnable parameters in that layer and with respect to the nodes in the input to the layer. All of this information becomes a part of what Autograd refers to as a Computational Graph. Subsequently, during backpropagation of the loss, the partial derivatives stored in the Computational Graph are used to update the values for the learnable parameters. For a new student of deep learning, there is a lot going on here.
The goal of my Primer module is to baby-step through this process with simple handcrafted neural networks so that a student can get a deeper understanding of how such computations actually work. More specifically, the goals of this module are:

    1) To introduce you to forward and backward dataflows in a Directed Acyclic Graph (DAG).

    2) To extend the material developed for the first goal with simple examples of neural networks for demonstrating the forward and backward dataflows for the purpose of updating the learnable parameters. This part of the module also includes a comparison of the performance of such networks with those constructed using torch.nn components.

    3) To explain how the behavior of PyTorch's Autograd class can be customized to your specific data needs by extending that class.

GOAL 1:

The first goal of this Primer is to introduce you to forward and backward dataflows in a general DAG. The acronym DAG stands for Directed Acyclic Graph. Just for the educational value of playing with dataflows in DAGs, this module allows you to create a DAG of variables with a statement like

    expressions = ['xx=xa^2',
                   'xy=ab*xx+ac*xa',
                   'xz=bc*xx+xy',
                   'xw=cd*xx+xz^3']

We assume that a symbolic name that begins with the letter 'x' is a variable and that all other symbolic names are learnable parameters. We use '^' for exponentiation. The four expressions shown above contain five variables ('xx', 'xa', 'xy', 'xz', and 'xw') and four learnable parameters: 'ab', 'ac', 'bc', and 'cd'.
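The naming convention can be made concrete with a small sketch. It is not the module's parse_expressions(); the helper name classify_names is hypothetical. It sorts every symbolic name in the expressions into variables (names starting with 'x') and learnable parameters (everything else), and it skips numeric literals such as the exponent in 'xa^2'.

```python
import re

def classify_names(expressions):
    """Split the symbolic names in CGP-style expressions into variables and
    learnable parameters (hypothetical helper, for illustration only).

    Convention: names starting with 'x' are variables; all other names are
    learnable parameters. Numeric literals such as the '2' in 'xa^2' are not
    matched by the regex because a name must start with a letter or '_'.
    """
    variables, params = set(), set()
    for exp in expressions:
        for name in re.findall(r'[A-Za-z_]\w*', exp):
            if name.startswith('x'):
                variables.add(name)
            else:
                params.add(name)
    return sorted(variables), sorted(params)

expressions = ['xx=xa^2', 'xy=ab*xx+ac*xa', 'xz=bc*xx+xy', 'xw=cd*xx+xz^3']
variables, params = classify_names(expressions)
print(variables)   # ['xa', 'xw', 'xx', 'xy', 'xz']
print(params)      # ['ab', 'ac', 'bc', 'cd']
```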
The DAG that is generated by these expressions has the following edges. Each line lists the vertices a node depends on, the node itself, and the expression that computes it:

    xa          -->  xx      xx = xa^2
    xa, xx      -->  xy      xy = ab*xx + ac*xa
    xx, xy      -->  xz      xz = bc*xx + xy
    xx, xz      -->  xw      xw = cd*xx + xz^3

By the way, you can call 'display_network2()' on an instance of ComputationalGraphPrimer to make a much better-looking plot of the network graph for any DAG created by the sort of expressions shown above.

In the DAG shown above, the variable 'xa' is an independent variable since it has no incoming arcs, and 'xw' is an output variable since it has no outgoing arcs. A DAG of this sort is represented in ComputationalGraphPrimer by two dictionaries: 'depends_on' and 'leads_to'. Here is what the 'depends_on' dictionary would look like for the DAG shown above:

    depends_on['xx'] = ['xa']
    depends_on['xy'] = ['xa', 'xx']
    depends_on['xz'] = ['xx', 'xy']
    depends_on['xw'] = ['xx', 'xz']

Something like "depends_on['xx'] = ['xa']" is best read as "the vertex 'xx' depends on the vertex 'xa'." Similarly, "depends_on['xz'] = ['xx', 'xy']" is best read aloud as "the vertex 'xz' depends on the vertices 'xx' and 'xy'." And so on.

The 'depends_on' dictionary is a complete description of a DAG. For programming convenience, however, ComputationalGraphPrimer also maintains another representation of the same graph in the 'leads_to' dictionary. For the graph shown above, this dictionary would be:

    leads_to['xa'] = ['xx', 'xy']
    leads_to['xx'] = ['xy', 'xz', 'xw']
    leads_to['xy'] = ['xz']
    leads_to['xz'] = ['xw']

The entry "leads_to['xa'] = ['xx', 'xy']" is best read as "the outgoing edges at the node 'xa' lead to the nodes 'xx' and 'xy'." Along the same lines, "leads_to['xx'] = ['xy', 'xz', 'xw']" is best read as "the outgoing edges at the vertex 'xx' lead to the vertices 'xy', 'xz', and 'xw'."
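Here is a minimal sketch of how both dictionaries can be derived from the expression strings. It assumes the conventions stated above, and the helper name build_dag is hypothetical, so the module's own parser may do this differently. Each dependent variable on the left of '=' depends on the 'x' names on its right. 'leads_to' is the same set of edges read in the opposite direction.

```python
import re

def build_dag(expressions):
    """Build 'depends_on' and 'leads_to' from CGP-style expressions
    (hypothetical helper, for illustration only).

    depends_on[lhs] lists the variables on the right side of 'lhs = ...';
    leads_to[v] lists, in expression order, the variables that use v.
    """
    depends_on, leads_to = {}, {}
    for exp in expressions:
        lhs, rhs = exp.split('=')
        # Variables are the names on the right side that start with 'x'.
        rhs_vars = sorted({name for name in re.findall(r'[A-Za-z_]\w*', rhs)
                           if name.startswith('x')})
        depends_on[lhs] = rhs_vars
        for var in rhs_vars:
            leads_to.setdefault(var, []).append(lhs)
    return depends_on, leads_to

expressions = ['xx=xa^2', 'xy=ab*xx+ac*xa', 'xz=bc*xx+xy', 'xw=cd*xx+xz^3']
depends_on, leads_to = build_dag(expressions)
print(depends_on)
print(leads_to)
```

The printed dictionaries match the 'depends_on' and 'leads_to' listings given above. The input node 'xa' never appears as a key of 'depends_on', and the output node 'xw' never appears as a key of 'leads_to'. Those absences are what identify them as the independent and output variables.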
Given a computational graph like the one shown above, we are faced with the following questions:

(1) How do we propagate the information from the independent nodes (which we can refer to as the input nodes) to the output nodes, these being the nodes with only incoming edges?

(2) As the information flows in the forward direction, meaning from the input nodes to the output nodes, is it possible to estimate the partial derivatives that apply to each link in the graph?

(3) Given a scalar value at an output node (which could be the loss estimated at that node), can the partial derivatives estimated during the forward pass be used to backpropagate the loss?

Consider, for example, the directed link between the node 'xy' and the node 'xz'. As a variable, the value of 'xz' is calculated through the formula "xz = bc*xx + xy". In the forward propagation of information, we estimate the value of 'xz' from currently known values for the learnable parameter 'bc' and the variables 'xx' and 'xy'. In addition to the value of the variable at the node 'xz', we are also interested in the partial derivative of 'xz' with respect to the other variables it depends on ('xx' and 'xy') and with respect to the parameter it depends on, 'bc'.

For the calculation of the derivatives, we have a choice. We can do a bit of computer algebra and figure out that the partial of 'xz' with respect to 'xx' is equal to the current value of 'bc'. Or we can use the small finite-difference method for doing the same. That means that we:

(1) calculate the value of 'xz' for the current value of 'xx', on the one hand, and, on the other, for 'xx' plus a delta;
(2) take the difference of the two;
(3) divide the difference by the delta.

The ComputationalGraphPrimer module uses the finite-difference method for estimating the partial derivatives.
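The three-step finite-difference recipe and question (3) can be made concrete with a small, self-contained sketch. It is not the module's code, and the helper names xy_node, xz_node, xw_node, and finite_diff are hypothetical. The sketch estimates the local partials of 'xz' with respect to 'xx' and 'bc'. It then multiplies the local partials along the path ab -> xy -> xz -> xw to get the partial of 'xw' with respect to 'ab' by the chain rule. The delta of 1e-4 plays the role of the grad_delta constructor option. Since 'ab' reaches 'xw' through only one path, a single product suffices here. With several paths, the per-path products would be summed.

```python
def xy_node(ab, ac, xx, xa):
    return ab * xx + ac * xa

def xz_node(bc, xx, xy):
    return bc * xx + xy

def xw_node(cd, xx, xz):
    return cd * xx + xz ** 3

def finite_diff(f, point, name, delta=1e-4):
    """Forward-difference estimate of df/d(name) at 'point' (a dict of args):
    (1) evaluate f at the current values and with 'name' bumped by delta,
    (2) take the difference, (3) divide the difference by delta."""
    bumped = dict(point)
    bumped[name] += delta
    return (f(**bumped) - f(**point)) / delta

# Current values: input xa = 0.5 and the parameter values used in USAGE.
ab, bc, cd, ac, xa = 1.0, 2.0, 3.0, 4.0, 0.5
xx = xa ** 2                     # 0.25
xy = xy_node(ab, ac, xx, xa)     # 2.25
xz = xz_node(bc, xx, xy)         # 2.75

at_xz = {'bc': bc, 'xx': xx, 'xy': xy}
print(finite_diff(xz_node, at_xz, 'xx'))   # approximately bc = 2.0
print(finite_diff(xz_node, at_xz, 'bc'))   # approximately xx = 0.25

# Chain rule along the only path from 'ab' to 'xw': ab -> xy -> xz -> xw
d_xy_d_ab = finite_diff(xy_node, {'ab': ab, 'ac': ac, 'xx': xx, 'xa': xa}, 'ab')
d_xz_d_xy = finite_diff(xz_node, at_xz, 'xy')
d_xw_d_xz = finite_diff(xw_node, {'cd': cd, 'xx': xx, 'xz': xz}, 'xz')
d_xw_d_ab = d_xw_d_xz * d_xz_d_xy * d_xy_d_ab
print(d_xw_d_ab)   # approximately 3 * xz**2 * xx = 5.671875
```

'xz' is linear in 'xx' and 'bc', so the forward difference recovers those partials almost exactly. The cubic term in 'xw' introduces an error proportional to delta, so the chain-rule product is only approximately equal to the analytical value.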
We have two different types of partial derivatives: the partial of a variable with respect to another variable, and the partial of a variable with respect to a learnable parameter. ComputationalGraphPrimer therefore uses two different dictionaries for storing these partials during each forward pass. Partials of variables with respect to other variables, as encountered during forward propagation, are stored in the dictionary 'partial_var_to_var'. Partials of the variables with respect to the learnable parameters are stored in the dictionary 'partial_var_to_param'. At the end of each forward pass, the relevant partials extracted from these dictionaries are used to estimate the gradients of the loss with respect to the learnable parameters, as illustrated in the implementation of the method train_on_all_data().

The exercise mentioned above is good for appreciating data flows in a general DAG. However, you've got to realize that, with today's algorithms, it would be impossible to carry out any learning in a general DAG. A general DAG with millions of learnable parameters would not lend itself to a fast calculation of the partial derivatives that are needed during the backpropagation step. Since the exercise described above is just to get you thinking about data flows in DAGs and nothing else, I have not bothered to include any activation functions in the DAG demonstration code in this Primer.

GOAL 2:

That brings us to the second major goal of this Primer module: to provide examples of simple neural structures in which the required partial derivatives are calculated during forward data propagation and subsequently used for parameter update during the backpropagation of loss.
To become familiar with how this is done in the module, your best place to start would be the following two scripts in the Examples directory of the distribution:

    one_neuron_classifier.py
    multi_neuron_classifier.py

The first script, "one_neuron_classifier.py", invokes the following function from the module:

    run_training_loop_one_neuron_model()

This function, in turn, calls the following two functions. The first is for forward propagation of the data, and the second is for the backpropagation of loss and updating of the parameter values:

    forward_prop_one_neuron_model()
    backprop_and_update_params_one_neuron_model()

The data that is forward propagated to the output node is subject to Sigmoid activation. The derivatives that are calculated during forward propagation of the data include the partial 'output vs. input' derivatives for the Sigmoid nonlinearity. The backpropagation step implemented in the second of the two functions listed above includes averaging the partial derivatives and the prediction errors over a batch of training samples, as required by SGD.

The second demo script in the Examples directory, "multi_neuron_classifier.py", creates a neural network with a hidden layer and an output layer. Each node in the hidden layer and the node in the output layer are all subject to Sigmoid activation. This script invokes the following function of the module:

    run_training_loop_multi_neuron_model()

This function, in turn, calls upon the following two functions. The first is for forward propagating the data, and the second is for the backpropagation of loss and updating of the parameters:

    forward_prop_multi_neuron_model()
    backprop_and_update_params_multi_neuron_model()

In contrast with the one-neuron demo, in this case, the batch-based data that is output by the forward function is sent directly to the backprop function. It then becomes the job of the backprop function to do the averaging needed for SGD.
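The one-neuron pattern described above can be sketched in a few lines. The forward pass computes the Sigmoid output and saves its derivative s*(1 - s). The backward pass sums per-sample gradients and divides by the batch size once, after the loop over samples, which is the placement the Version 1.1.4 change note is about. This is a hypothetical sketch, not the module's implementation. The squared-error loss 0.5*(y - s)^2, the bias term, and the names one_neuron_sgd_step and mean_loss are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_neuron_sgd_step(weights, bias, batch, labels, lr):
    """One SGD step for a single Sigmoid neuron with loss 0.5*(y - s)^2
    (hypothetical sketch). Per-sample gradients are summed inside the loop
    and divided by the batch size once, after the loop."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(batch, labels):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias   # pre-activation
        s = sigmoid(z)                                        # post-activation
        deriv_sigmoid = s * (1.0 - s)                         # ds/dz, saved in forward
        y_error = y - s
        for j, xi in enumerate(x):                            # chain rule: dL/dw_j
            grad_w[j] += -y_error * deriv_sigmoid * xi
        grad_b += -y_error * deriv_sigmoid
    n = len(batch)                                            # average OUTSIDE the loop
    new_weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    new_bias = bias - lr * grad_b / n
    return new_weights, new_bias

def mean_loss(weights, bias, batch, labels):
    """Batch-averaged 0.5*(y - s)^2."""
    total = 0.0
    for x, y in zip(batch, labels):
        s = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        total += 0.5 * (y - s) ** 2
    return total / len(batch)

batch = [[-1.0, -1.0], [-0.5, -1.0], [1.0, 1.0], [1.0, 0.5]]
labels = [0, 0, 1, 1]
w, b = [0.0, 0.0], 0.0
print(mean_loss(w, b, batch, labels))      # 0.125 at the all-zero start
for _ in range(100):
    w, b = one_neuron_sgd_step(w, b, batch, labels, lr=1.0)
print(mean_loss(w, b, batch, labels))      # smaller than at the start
```

Moving the two lines that divide by n into the sample loop would reproduce the kind of error fixed in 1.1.4: the contributions of earlier samples would be scaled down repeatedly.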
In the Examples directory, you will also find the following script:

    verify_with_torchnn.py

This script serves as a check on the performance of the main demo scripts "one_neuron_classifier.py" and "multi_neuron_classifier.py". Note that you cannot expect the performance of my one-neuron and multi-neuron scripts to match what you would get from similar networks constructed with components drawn from "torch.nn". One primary reason is that "torch.nn"-based code uses state-of-the-art optimization of the steps in the parameter hyperplane, which is not the case with my demo scripts. Nonetheless, a comparison with "torch.nn" is important for checking the general trend of how the training loss varies with the iterations. Suppose the "torch.nn"-based script showed decreasing loss (indicating that learning was taking place) while my one-neuron and multi-neuron scripts did not. That would indicate that I had perhaps made an error, either in computing the partial derivatives during the forward propagation of the data or in using those derivatives to update the parameters.

GOAL 3:

The goal here is to show how to extend PyTorch's Autograd class if you want to endow it with additional functionality. Let's say you want some data condition to be remembered during the forward propagation of the data through a network. You then want to use that condition to alter how the parameters are updated during backpropagation of the prediction errors. This can be accomplished by subclassing from Autograd and incorporating the desired behavior in the subclass. The inner class AutogradCustomization in this module demonstrates how to extend Autograd. Your starting point for understanding what this class does would be the script extending_autograd.py in the Examples directory of the distribution.

INSTALLATION:

The ComputationalGraphPrimer class was packaged using setuptools.
For installation, execute the following command in the source directory. This is the directory that contains the setup.py file after you have downloaded and uncompressed the package:

    sudo python3 setup.py install

On Linux distributions, this will install the module file at a location that looks like

    /usr/local/lib/python3.8/dist-packages/

If you do not have root access, you can work directly off the directory in which you downloaded the software. Simply place the following statements at the top of your scripts that use the ComputationalGraphPrimer class:

    import sys
    sys.path.append( "pathname_to_ComputationalGraphPrimer_directory" )

To uninstall the module, simply delete the source directory. Then locate where the ComputationalGraphPrimer module was installed with "locate ComputationalGraphPrimer" and delete those files. As mentioned above, the full pathname to the installed version is likely to look like "/usr/local/lib/python3.8/dist-packages/".

If you want to carry out a non-standard install of the ComputationalGraphPrimer module, look up the on-line information on Distutils by pointing your browser to

    http://docs.python.org/dist/dist.html

USAGE:

Construct an instance of the ComputationalGraphPrimer class as follows:

    from ComputationalGraphPrimer import *

    cgp = ComputationalGraphPrimer(
                   expressions = ['xx=xa^2',
                                  'xy=ab*xx+ac*xa',
                                  'xz=bc*xx+xy',
                                  'xw=cd*xx+xz^3'],
                   output_vars = ['xw'],
                   dataset_size = 10000,
                   learning_rate = 1e-6,
                   grad_delta = 1e-4,
                   display_loss_how_often = 1000,
          )

    cgp.parse_expressions()
    cgp.display_network2()
    cgp.gen_gt_dataset(vals_for_learnable_params = {'ab':1.0, 'bc':2.0, 'cd':3.0, 'ac':4.0})
    cgp.train_on_all_data()
    cgp.plot_loss()

CONSTRUCTOR PARAMETERS:

batch_size: Introduced in Version 1.0.5 for demonstrating forward propagation of the input data while calculating the partial derivatives needed during backpropagation of loss.
For SGD, updating the parameters involves smoothing the derivatives over the training samples in a batch. Hence the need for batch_size as a constructor parameter.

dataset_size: The networks created by an arbitrary set of expressions are not likely to allow for any true learning of the parameters. Nonetheless, ComputationalGraphPrimer allows for the computation of the loss at the output nodes and backpropagation of the loss to the other nodes. To demonstrate this, we need a ground-truth set of input/output values for given values of the learnable parameters. The constructor parameter 'dataset_size' refers to how many of these 'input/output' pairs would be generated for such experiments. For the one-neuron and multi-neuron demos introduced in Version 1.0.5, the constructor parameter dataset_size refers to how many tuples of randomly generated data should be made available for learning. The size of each data tuple is deduced from the first expression in the list made available to the module through the parameter 'expressions' described below.

display_loss_how_often: This controls how often you will see the result of the calculations being carried out in the computational graph. Let's say you are experimenting with 10,000 input/output samples for propagation in the network. If you set this constructor option to 1000, you will see the partial derivatives and the values for the learnable parameters every 1000 passes through the graph.

expressions: These expressions define the computational graph. The expressions are based on the following assumptions:
(1) any variable name must start with the letter 'x';
(2) a symbolic name that does not start with 'x' is a learnable parameter;
(3) the exponentiation operator is '^';
(4) the symbols '*', '+', and '-' carry their usual arithmetic meanings.

grad_delta: This constructor option sets the value of the delta to be used for estimating the partial derivatives with the finite-difference method.
layers_config: Introduced in Version 1.0.5 for the multi-neuron demo. Its value is a list of the number of nodes in each layer of the network. Note that I consider the input to the neural network as a layer unto itself. Therefore, if the value of the parameter num_layers is 3, the list you supply for layers_config must have three numbers in it.

learning_rate: Carries the usual meaning for updating the values of the learnable parameters based on the gradients of the output of a layer with respect to those parameters.

num_layers: Introduced in Version 1.0.5 for the multi-neuron demo. It is merely a convenience parameter that indicates the number of layers in your multi-neuron network. For the purpose of counting layers, I consider the input as a layer unto itself.

one_neuron_model: Introduced in Version 1.0.5. This boolean parameter is needed only when you are constructing a one-neuron demo. I needed this constructor parameter for some conditional evaluations in the "parse_expressions()" method of the module. I use that expression parser for both the older demos and the new demo based on the one-neuron model.

output_vars: The parser can figure out which nodes in the computational graph represent the output variables (these being nodes with no outgoing arcs). You can, however, designate the specific output variables you are interested in through this constructor parameter.

training_iterations: Carries the expected meaning.

PUBLIC METHODS:

(1) backprop_and_update_params_one_neuron_model(): Introduced in Version 1.0.5. This method is called by run_training_loop_one_neuron_model() for backpropagating the loss and updating the values of the learnable parameters.

(2) backprop_and_update_params_multi_neuron_model(): Introduced in Version 1.0.5. This method is called by run_training_loop_multi_neuron_model() for backpropagating the loss and updating the values of the learnable parameters.
(3) display_network2(): This method calls on the networkx module to construct a visual display of the computational graph.

(4) forward_propagate_one_input_sample_with_partial_deriv_calc(): This method pushes the input data forward through a general DAG. At the same time, it computes the partial derivatives that would be needed during backpropagation for updating the values of the learnable parameters.

(5) forward_prop_one_neuron_model(): Introduced in Version 1.0.5. This function propagates the input data through a one-neuron network. The data aggregated at the neuron is subject to a Sigmoid activation. The function also calculates the partial derivatives needed during backprop.

(6) forward_prop_multi_neuron_model(): Introduced in Version 1.0.5. This function does the same thing as the previous function, except that it is intended for a multi-layer neural network. The pre-activation values at each neuron are subject to the Sigmoid nonlinearity. At the same time, the partial derivatives are calculated and stored away for use during backprop.

(7) gen_gt_dataset(): This method generates the training data for a general graph of nodes in a DAG. For random values at the input nodes, it calculates the values at the output nodes assuming certain given values for the learnable parameters in the network. If it were possible to carry out learning in such a network, the goal would be to see if the values of those parameters would be learned automatically, as in a neural network.

(8) gen_training_data(): Introduced in Version 1.0.5. This function generates training data for the scripts "one_neuron_classifier.py", "multi_neuron_classifier.py", and "verify_with_torchnn.py" in the Examples directory of the distribution. The data corresponds to two classes defined by two different multivariate distributions. The dimensionality of the data is determined entirely by how many nodes the expression parser finds in the list of expressions that define the network.
(9) parse_expressions(): This method parses the expressions provided and constructs a DAG from them for the variables and the parameters in the expressions. It is based on the convention that the names of all variables begin with the character 'x', with all other symbolic names being treated as learnable parameters.

(10) parse_multi_layer_expressions(): Introduced in Version 1.0.5. The previous method, parse_expressions(), works well for creating a general DAG and for the one-neuron model. However, it is not meant to capture the layer-based structure of a neural network. Hence this method.

(11) run_training_loop_one_neuron_model(): Introduced in Version 1.0.5. This is the main function in the module for the demo based on the one-neuron model. The demo consists of propagating the input values forward, aggregating them at the neuron, and subjecting the result to Sigmoid activation. All the partial derivatives needed for updating the link weights are calculated during the forward propagation. This includes the derivatives of the output vis-a-vis the input at the Sigmoid activation. Subsequently, during backpropagation of the loss, the parameter values are updated using the derivatives stored away during forward propagation.

(12) run_training_loop_multi_neuron_model(): Introduced in Version 1.0.5. This is the main function for the demo based on a multi-layer neural network. As each batch of training data is pushed through the network, the partial derivatives of the output at each layer are computed with respect to the parameters. This calculation includes computing the partial derivatives at the output of the activation function with respect to its input. During backpropagation, batch-based smoothing is first applied to the derivatives and the prediction errors stored away during forward propagation, as SGD requires. The values of the learnable parameters are then updated.

(13) run_training_with_torchnn(): Introduced in Version 1.0.5.
The purpose of this function is to use comparable network components from the torch.nn module in order to "authenticate" the performance of the handcrafted one-neuron and multi-neuron models in this module. All that is meant by "authentication" here is that if the torch.nn-based networks show the training loss decreasing with iterations, you would expect the one-neuron and the multi-neuron models to show similar results. This function contains the following inner classes:

    class OneNeuronNet( torch.nn.Module )
    class MultiNeuronNet( torch.nn.Module )

These define networks similar to the handcrafted one-neuron and multi-neuron networks of this module.

(14) train_on_all_data(): The purpose of this function is to call the forward propagation and backpropagation functions of the module for the demo based on arbitrary DAGs.

(15) plot_loss(): This is used only by the DAG-based demonstration code in the module. The training functions introduced in Version 1.0.5 have embedded code for plotting the loss as a function of iterations.

THE Examples DIRECTORY:

The Examples directory of the distribution contains the following scripts:

1. graph_based_dataflow.py
This demonstrates forward propagation of input data and backpropagation in a general DAG (Directed Acyclic Graph). The forward propagation involves estimating the partial derivatives that would subsequently be used for "updating" the learnable parameters during backpropagation. Since I have not incorporated any activations in the DAG, you cannot really expect any real learning to take place in this demo. The purpose of this demo is just to illustrate what is meant by a DAG and how information can flow forwards and backwards in such a network.

2. one_neuron_classifier.py
This script demonstrates the one-neuron model in the module.
The goal is to show forward propagation of data through the neuron (which includes the Sigmoid activation), while calculating the partial derivatives needed during the backpropagation step for updating the parameters.

3. multi_neuron_classifier.py
This script generalizes what is demonstrated by the one-neuron model to a multi-layer neural network. It demonstrates saving the partial-derivative information calculated during the forward propagation through a multi-layer neural network. It then uses that information for backpropagating the loss and for updating the values of the learnable parameters.

4. verify_with_torchnn.py
The purpose of this script is just to verify that the results obtained with the scripts "one_neuron_classifier.py" and "multi_neuron_classifier.py" are along the expected lines. That is, if similar networks constructed with the torch.nn module show the training loss decreasing with iterations, you would expect similar learning behavior from the scripts "one_neuron_classifier.py" and "multi_neuron_classifier.py".

5. extending_autograd.py
This provides a demo example of the recommended approach for giving additional functionality to Autograd. See the explanation in the doc section associated with the inner class AutogradCustomization of this module for further info.

BUGS:

Please notify the author if you encounter any bugs. When sending email, please place the string 'Computational Graph Primer' in the subject line to get past the author's spam filter.

ACKNOWLEDGMENTS:

Akshita Kamsali's help with improving the quality of the network graph visualization code is much appreciated. Akshita is working on her Ph.D. in the Robot Vision Lab at Purdue.

Version 1.1.3 was prompted by Moiz Rasheed discovering that my previous implementation of the SGD-mandated averaging of the backpropagating gradients of loss was incorrect for the handcrafted neural networks in CGP. Moiz Rasheed was a student in my Deep Learning class at Purdue during Spring 2023.
Thanks, Moiz!

For version 1.1.4, I owe many thanks to Karl Weisenburger for discovering the error in batch averaging for the one-neuron model. At the time of this release in Spring 2024, Karl was a student in my Deep Learning class at Purdue.

ABOUT THE AUTHOR:

The author, Avinash Kak, is a professor of Electrical and Computer Engineering at Purdue University. For all issues related to this module, contact the author at kak@purdue.edu. If you send email, please place the string "ComputationalGraphPrimer" in your subject line to get past the author's spam filter.

COPYRIGHT:

Python Software Foundation License

Copyright 2024 Avinash Kak

Imported Modules

copy
itertools
math

numpy
networkx
operator

os
matplotlib.pyplot
random

re
sys
torch

Classes

builtins.object

ComputationalGraphPrimer
Exp

class ComputationalGraphPrimer(builtins.object)

ComputationalGraphPrimer(*args, **kwargs)

Methods defined here:

__init__(self, *args, **kwargs): Initialize self. See help(type(self)) for accurate signature.

backprop_and_update_params_multi_neuron_model(self, predictions, y_errors): First note that the loop index variable 'back_layer_index' starts with the index of the last layer. For the 3-layer example shown for 'forward', back_layer_index starts with a value of 2, its next value is 1, and that's it. In the code below, the outermost loop is over the data samples in a batch. As shown on Slide 73 of my Week 3 lecture, to calculate the partials of Loss with respect to the learnable params, we need to backprop the prediction errors and the gradients of the Sigmoid. To satisfy the requirements of SGD, the backprop of the prediction errors and the gradients must be carried out separately for each training data sample in a batch. That's what the outer loop is for. After we exit the outermost loop, we average over the results obtained from each training data sample in a batch. Pay attention to the variable 'vars_in_layer'. It stores the node variables in the current layer during backpropagation.

backprop_and_update_params_one_neuron_model(self, data_tuples_in_batch, predictions, y_errors_in_batch, deriv_sigmoids): This function implements the equations shown on Slide 61 of my Week 3 presentation in our DL class at Purdue. All four parameters (other than self) are lists that hold, for each training data sample in a batch, what was either supplied to the forward-propagation function or calculated by it.
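A minimal sketch of the one-neuron update, assuming the neuron computes y_hat = sigmoid(sum_i p_i * x_i + bias) and the loss is 0.5*(y - y_hat)**2 averaged over the batch. The function backprop_one_neuron() and its argument layout are hypothetical; they echo the parameter names in the docstring above but are not the module's code. Note where the division by the batch size sits: after the inner loop over samples, which is the placement that the version 1.1.4 bug fix concerns.

```python
def backprop_one_neuron(params, bias, data_tuples_in_batch, y_errors_in_batch,
                        deriv_sigmoids, learning_rate=0.1):
    """One SGD step for y_hat = sigmoid(sum_i params[i] * x[i] + bias).

    y_errors_in_batch[k] is (y_true - y_hat) for sample k, and deriv_sigmoids[k]
    is sigmoid'(z_k) = y_hat * (1 - y_hat). The loss is assumed to be
    0.5 * (y_true - y_hat)**2, averaged over the batch.
    """
    batch_size = len(data_tuples_in_batch)
    new_params = []
    for i, p in enumerate(params):
        partial = 0.0
        # Sum the per-sample contributions to dLoss/dp_i ...
        for x, err, dsig in zip(data_tuples_in_batch, y_errors_in_batch, deriv_sigmoids):
            partial += -err * dsig * x[i]
        # ... and average only once, outside the loop over samples
        partial /= batch_size
        new_params.append(p - learning_rate * partial)
    bias_partial = sum(-err * dsig for err, dsig in zip(y_errors_in_batch, deriv_sigmoids)) / batch_size
    return new_params, bias - learning_rate * bias_partial
```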

calculate_loss(self, predicted_val, true_val)

display_DAG(self): The network visualization code in this script should work for any general DAG defined in an instance of CGP. For an example, see the script graph_based_dataflow.py in the Examples directory of the module.

display_multi_neuron_network(self): In version 1.1.0, I made this network visualization more general and (if it has no bugs) it should work with any multi-layer network graph, such as the one shown in multi_neuron_classifier.py in the Examples directory of the module.

display_network1(self)

display_network2(self): Provides a fancier display of the network graph.

display_one_neuron_network(self): In version 1.1.0, I generalized this code to work on any one-neuron network as defined in the example: one_neuron_classifier.py in the Examples directory of the module.

eval_expression(self, exp, vals_for_vars, vals_for_learnable_params, ind_vars=None)

forward_prop_multi_neuron_model(self, data_tuples_in_batch): During forward propagation, we push each batch of the input data through the network. To explain the logic of forward propagation, consider the following network layout with 4 nodes in the input layer, 2 nodes in the hidden layer, and 1 node in the output layer:

        input
          x                            x = node
          x        x|                  | = sigmoid activation
                            x|
          x        x|
          x
       layer_0  layer_1  layer_2

In the implementation code, the expressions to evaluate for computing the pre-activation values at a node are stored at the layer in which the nodes reside. That is, the dictionary look-up "self.layer_exp_objects[layer_index]" returns the Expression objects for which the left-side dependent variable is in the layer pointed to by layer_index. So, for the example shown above, "self.layer_exp_objects[1]" will return two Expression objects, one for each of the two nodes in the second layer of the network (that is, the layer indexed 1). The pre-activation values obtained by evaluating the expressions at each node are then subjected to Sigmoid activation, followed by the calculation of the partial derivative of the output of the Sigmoid function with respect to its input. During forward propagation, the values calculated for the nodes in each layer are stored in the dictionary self.forw_prop_vals_at_layers[ layer_index ], and the gradient values calculated at the same nodes are stored in the dictionary self.gradient_vals_for_layers[ layer_index ].
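The layer-by-layer bookkeeping described above can be sketched as follows. The sketch assumes a hypothetical representation in which each non-input layer is a (weights, biases) pair. The two returned dictionaries are keyed by layer index in the spirit of self.forw_prop_vals_at_layers and self.gradient_vals_for_layers, but forward_multi_layer() is an illustration, not the module's code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_multi_layer(layers, x):
    """Push one input sample through the layers.

    layers[k] = (weights, biases) for layer k+1; weights[j] holds the incoming
    weights of node j in that layer. Returns (forw_vals, grad_vals), both keyed
    by layer index; layer 0 holds the raw inputs.
    """
    forw_vals = {0: list(x)}
    grad_vals = {}
    for layer_index, (weights, biases) in enumerate(layers, start=1):
        prev = forw_vals[layer_index - 1]
        outs, grads = [], []
        for w_row, b in zip(weights, biases):
            z = sum(w * v for w, v in zip(w_row, prev)) + b  # pre-activation value
            s = sigmoid(z)
            outs.append(s)
            grads.append(s * (1.0 - s))  # d sigmoid / dz at this node
        forw_vals[layer_index] = outs
        grad_vals[layer_index] = grads
    return forw_vals, grad_vals


# The 4-2-1 layout: two hidden nodes with 4 inputs each, one output node with 2 inputs
layers = [([[0.1, -0.2, 0.3, 0.0], [0.0, 0.2, -0.1, 0.4]], [0.0, 0.1]),
          ([[0.5, -0.5]], [0.0])]
forw_vals, grad_vals = forward_multi_layer(layers, [1.0, 2.0, 3.0, 4.0])
print(forw_vals[2], grad_vals[2])
```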

forward_prop_one_neuron_model(self, data_tuples_in_batch): Forward propagates the batch data through the neural network according to the equations on Slide 50 of my Week 3 slides. As the one-neuron model is characterized by a single expression, the main job of this function is to evaluate that expression for each data tuple in the incoming batch. The resulting output is fed into the sigmoid activation function and the partial derivative of the sigmoid with respect to its input calculated.
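For the one-neuron case, the forward pass reduces to evaluating one expression per sample. The sketch below assumes the expression has the form 'xw=ab*xa+bc*xb+...' (a sum of parameter*variable products), that the values in each data tuple follow the order in which the variables appear in the expression, and that a separate bias is added; forward_one_neuron() is a hypothetical name.

```python
import math
import re


def forward_one_neuron(expression, params, bias, data_tuples_in_batch):
    """Return (predictions, deriv_sigmoids) for every sample in the batch."""
    _, body = expression.replace(' ', '').split('=', 1)
    terms = []
    for term in body.split('+'):
        # Each term is assumed to be <param>*<var>, where var starts with 'x'
        m = re.fullmatch(r'([a-wyz]\w*)\*(x\w*)', term)
        if m is None:
            raise ValueError(f"unsupported term: {term}")
        terms.append(m.groups())
    input_vars = [var for _, var in terms]
    predictions, deriv_sigmoids = [], []
    for sample in data_tuples_in_batch:
        vals = dict(zip(input_vars, sample))
        z = sum(params[p] * vals[v] for p, v in terms) + bias
        s = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
        predictions.append(s)
        deriv_sigmoids.append(s * (1.0 - s))  # derivative of sigmoid w.r.t. its input
    return predictions, deriv_sigmoids


preds, dsigs = forward_one_neuron('xw=ab*xa+bc*xb', {'ab': 1.0, 'bc': -1.0}, 0.0,
                                  [(2.0, 2.0), (1.0, 0.0)])
print(preds, dsigs)
```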

forward_propagate_one_input_sample_with_partial_deriv_calc(self, sample_index, input_vals_for_ind_vars): If you want to look at how the information flows in the DAG when you don't have to worry about estimating the partial derivatives, see the method gen_gt_dataset(). As you will notice in the implementation code for that method, there is nothing much to pushing the input values through the nodes and the arcs of a computational graph if we are not concerned about estimating the partial derivatives. On the other hand, if you want to see how one might also estimate the partial derivatives during the forward flow of information in a computational graph, the forward_propagate...() method presented here is the one to examine. We first split the expression that the node variable depends on into its constituent parts on the basis of the '+' and '-' operators, and subsequently, for each part, we estimate the partial of the node variable with respect to the variables and the learnable parameters in that part. The needed partial derivatives are all calculated using the finite difference method, in which you add a small grad_delta value to the value of the variable with respect to which you are calculating the partial and then estimate the resulting change at the node in question. The change divided by grad_delta is the partial derivative you are looking for.
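The finite-difference estimate described above fits in a few lines. In this sketch, finite_difference_partial() is a hypothetical helper that takes a node's expression as a Python function of a dict of values; it bumps one name by grad_delta and divides the resulting change by grad_delta (a forward difference).

```python
def finite_difference_partial(f, values, name, grad_delta=1e-6):
    """Estimate d f / d values[name]; the caller's dict is left unchanged."""
    bumped = dict(values)
    bumped[name] += grad_delta
    return (f(bumped) - f(values)) / grad_delta


vals = {'xx': 3.0, 'xa': 2.0, 'ab': 0.5, 'ac': 1.5}
# Stands in for the node expression 'xy=ab*xx+ac*xa'
node_xy = lambda v: v['ab'] * v['xx'] + v['ac'] * v['xa']
# Stands in for the node expression 'xx=xa^2'
node_xx = lambda v: v['xa'] ** 2
print(finite_difference_partial(node_xy, vals, 'ab'))  # close to 3.0, the value of xx
print(finite_difference_partial(node_xx, vals, 'xa'))  # close to 4.0, i.e. 2*xa
```

The lambdas play the role of the node expressions 'xy=ab*xx+ac*xa' and 'xx=xa^2' used in the Exp class documentation; a smaller grad_delta reduces truncation error but eventually runs into floating-point cancellation.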

gen_gt_dataset(self, vals_for_learnable_params={}): This method illustrates that it is trivial to forward-propagate the information through the computational graph if you are not concerned about estimating the partial derivatives at the same time. This method is used to generate 'dataset_size' number of input/output values for the computational graph for given values for the learnable parameters.

gen_gt_dataset_with_activations(self, vals_for_learnable_params={}): This method illustrates that it is trivial to forward-propagate the information through the computational graph if you are not concerned about estimating the partial derivatives at the same time. This method is used to generate 'dataset_size' number of input/output values for the computational graph for given values for the learnable parameters.
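Forward propagation without partial derivatives really is short. A minimal sketch, assuming the expressions are listed so that every node appears after the nodes it depends on (true of the example expressions in the Exp class documentation) and that '^' means exponentiation. forward_dag() and gen_gt_dataset_sketch() are hypothetical names, and the inputs are drawn uniformly from [0, 1) purely for illustration.

```python
import random


def forward_dag(expressions, vals_for_params, input_vals):
    """Evaluate every node of the DAG; expressions must be in dependency order."""
    vals = {**vals_for_params, **input_vals}
    for exp in expressions:
        lhs, rhs = (s.strip() for s in exp.split('=', 1))
        # '^' in CGP-style expressions means exponentiation; eval() is for trusted strings only
        vals[lhs] = eval(rhs.replace('^', '**'), {"__builtins__": {}}, vals)
    return vals


def gen_gt_dataset_sketch(expressions, vals_for_params, input_var, output_var,
                          dataset_size, seed=0):
    """Return dataset_size (input, output) pairs for fixed learnable-parameter values."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(dataset_size):
        x = rng.uniform(0.0, 1.0)
        vals = forward_dag(expressions, vals_for_params, {input_var: x})
        dataset.append((x, vals[output_var]))
    return dataset


expressions = ['xx=xa^2', 'xy=ab*xx+ac*xa', 'xz=bc*xx+xy', 'xw=cd*xx+xz^3']
params = {'ab': 1.0, 'ac': 1.0, 'bc': 1.0, 'cd': 1.0}
print(forward_dag(expressions, params, {'xa': 1.0})['xw'])  # 28.0
```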

gen_training_data(self): This 2-class dataset is used for the demos in the following scripts in the Examples directory: one_neuron_classifier.py and multi_neuron_classifier.py. The classes are labeled 0 and 1. All of the data for class 0 is simply a list of numbers associated with the key 0. Similarly, all the data for class 1 is another list of numbers associated with the key 1. For each class, the data starts out as standard normal (zero mean and unit variance). For class 0, we then add a mean value of 2.0 to these numbers; for class 1, we square the original numbers and add a mean value of 4.0.
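The recipe described above can be sketched as follows, under the assumption (not stated in the docstring) that each sample is a tuple with one value per input variable. gen_training_data_sketch() and its parameters are hypothetical.

```python
import random


def gen_training_data_sketch(num_input_vars, samples_per_class, seed=0):
    """Hypothetical re-creation of the 2-class recipe: returns {0: [...], 1: [...]}."""
    rng = random.Random(seed)
    data = {0: [], 1: []}
    for _ in range(samples_per_class):
        base0 = [rng.gauss(0.0, 1.0) for _ in range(num_input_vars)]  # standard normal
        base1 = [rng.gauss(0.0, 1.0) for _ in range(num_input_vars)]
        data[0].append(tuple(v + 2.0 for v in base0))      # class 0: shift mean to 2.0
        data[1].append(tuple(v * v + 4.0 for v in base1))  # class 1: square, then add 4.0
    return data


data = gen_training_data_sketch(num_input_vars=4, samples_per_class=3, seed=0)
print(data[0][0], data[1][0])
```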

parse_expressions(self): This method creates a DAG from a set of expressions that involve variables and learnable parameters. The expressions are based on the assumption that a symbolic name that starts with the letter 'x' is a variable, with all other symbolic names being learnable parameters. The computational graph is represented by two dictionaries, 'depends_on' and 'leads_to'. To illustrate the meaning of the dictionaries, something like "depends_on['xz']" would be set to a list of all other variables whose outgoing arcs end in the node 'xz'. So something like "depends_on['xz']" is best read as "node 'xz' depends on ...." where the dots stand for the array of nodes that is the value of "depends_on['xz']". On the other hand, the 'leads_to' dictionary has the opposite meaning. That is, something like "leads_to['xz']" is set to the array of nodes at the ends of all the arcs that emanate from 'xz'.
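A minimal sketch of building the two dictionaries, assuming that every name made of letters, digits and underscores is either a variable (if it starts with 'x') or a learnable parameter. build_graph() is a hypothetical helper, not the module's parser.

```python
import re
from collections import defaultdict


def build_graph(expressions):
    """Return (depends_on, leads_to, learnable_params) for CGP-style expressions."""
    depends_on = {}
    leads_to = defaultdict(list)
    learnable_params = set()
    for exp in expressions:
        lhs, rhs = exp.replace(' ', '').split('=', 1)
        names = re.findall(r'[A-Za-z_]\w*', rhs)
        # Unique variables on the right-hand side, in order of first appearance
        right_vars = [n for n in dict.fromkeys(names) if n.startswith('x')]
        learnable_params.update(n for n in names if not n.startswith('x'))
        depends_on[lhs] = right_vars       # "node lhs depends on ..."
        for v in right_vars:
            leads_to[v].append(lhs)        # arc from v ends in node lhs
    return depends_on, dict(leads_to), learnable_params


expressions = ['xx=xa^2', 'xy=ab*xx+ac*xa', 'xz=bc*xx+xy', 'xw=cd*xx+xz^3']
depends_on, leads_to, learnable = build_graph(expressions)
print(depends_on['xw'], leads_to['xx'])  # ['xx', 'xz'] ['xy', 'xz', 'xw']
```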

parse_general_dag_expressions(self): This method is a modification of the previous expression parser and is meant specifically for the case when a given set of expressions is supposed to define a general DAG. The naming conventions for the variables, which designate the nodes of the graph, and for the learnable parameters remain the same as in the previous function.

parse_multi_layer_expressions(self): This method is a modification of the previous expression parser and is meant specifically for the case when a given set of expressions is supposed to define a multi-layer neural network. The naming conventions for the variables, which designate the nodes in the layers of the network, and for the learnable parameters remain the same as in the previous function.
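One possible way to recover layer membership from such expressions is to place each node one layer after the deepest node it depends on, with the input nodes in layer 0. This is an assumption for illustration only: assign_layers() is hypothetical, and the module may determine the layers differently. The recursion also assumes the expressions form a DAG (no cycles).

```python
import re


def assign_layers(expressions):
    """Return {layer_index: [nodes]} with input nodes in layer 0."""
    depends_on = {}
    for exp in expressions:
        lhs, rhs = exp.replace(' ', '').split('=', 1)
        names = re.findall(r'[A-Za-z_]\w*', rhs)
        depends_on[lhs] = [n for n in dict.fromkeys(names) if n.startswith('x')]

    layer_of = {}

    def layer(node):
        # Nodes with no incoming arcs are inputs (layer 0); others sit one
        # layer past their deepest predecessor.
        if node not in layer_of:
            deps = depends_on.get(node, [])
            layer_of[node] = 0 if not deps else 1 + max(layer(d) for d in deps)
        return layer_of[node]

    for node in depends_on:
        layer(node)
    layers = {}
    for node, idx in layer_of.items():
        layers.setdefault(idx, []).append(node)
    return layers


layers = assign_layers(['xw=ap*xp+aq*xq+ar*xr+at*xt',
                        'xz=bp*xp+bq*xq+br*xr+bt*xt',
                        'xo=cp*xw+cq*xz'])
print({k: sorted(v) for k, v in sorted(layers.items())})
# {0: ['xp', 'xq', 'xr', 'xt'], 1: ['xw', 'xz'], 2: ['xo']}
```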

plot_loss(self)

run_training_loop_multi_neuron_model(self, training_data)

run_training_loop_one_neuron_model(self, training_data): The training loop must first initialize the learnable parameters. Remember, these are the symbolic names in your input expressions for the neural layer that do not begin with the letter 'x'. In this case, we are initializing with random numbers from a uniform distribution over the interval (0,1).
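Collecting the learnable parameters and drawing their initial values can be sketched as below. init_learnable_params() is a hypothetical helper, and Python's random.uniform(0, 1) stands in for whatever random source the module actually uses.

```python
import random
import re


def init_learnable_params(expressions, seed=None):
    """Map each learnable parameter (any name not starting with 'x') to a random value in [0, 1)."""
    rng = random.Random(seed)
    names = []
    for exp in expressions:
        rhs = exp.split('=', 1)[1]
        for name in re.findall(r'[A-Za-z_]\w*', rhs):
            if not name.startswith('x') and name not in names:
                names.append(name)
    return {name: rng.uniform(0.0, 1.0) for name in names}


print(init_learnable_params(['xw=ab*xa+bc*xb+cd*xc+ac*xd'], seed=0))
```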

run_training_with_torchnn(self, option, training_data): The value of the parameter 'option' must be either 'one_neuron' or 'multi_neuron'. For either option, the number of input nodes is determined by the expressions supplied to the constructor of the class ComputationalGraphPrimer. When the option value is 'one_neuron', we use OneNeuronNet as the learning network, and when the option is 'multi_neuron', we use MultiNeuronNet. Assuming that the number of input nodes specified by the expressions is 4, the MultiNeuronNet class creates the following network layout, in which we have 2 nodes in the hidden layer and one node for the final output:

        input
          x                            x = node
          x        x|                  | = ReLU activation
                            x|
          x        x|
          x
       layer_0  layer_1  layer_2

train_on_all_data(self): The purpose of this method is to call forward_propagate_one_input_sample_with_partial_deriv_calc() repeatedly on all input/output ground-truth training data pairs generated by the method gen_gt_dataset(). The call to the forward_propagate...() method returns the predicted value at the output nodes from the supplied values at the input nodes, and train_on_all_data() calculates the error associated with that predicted value. The call to forward_propagate...() also returns the partial derivatives estimated with the finite difference method in the computational graph. Using these partial derivatives, train_on_all_data() backpropagates the loss to the interior nodes of the computational graph and updates the values of the learnable parameters.
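A much simplified sketch of the same idea: instead of backpropagating node by node through the graph, it estimates the partial of the output with respect to each learnable parameter by a forward difference over the whole graph and applies the chain rule only at the loss. train_with_finite_differences() and evaluate() are hypothetical names; the loss is assumed to be 0.5*(prediction - target)**2, and the expressions must be listed in dependency order.

```python
def evaluate(expressions, vals):
    """Evaluate all nodes in order; '^' is treated as exponentiation (trusted strings only)."""
    vals = dict(vals)
    for exp in expressions:
        lhs, rhs = (s.strip() for s in exp.split('=', 1))
        vals[lhs] = eval(rhs.replace('^', '**'), {"__builtins__": {}}, vals)
    return vals


def train_with_finite_differences(expressions, output_var, data, params,
                                  learning_rate=0.05, epochs=200, grad_delta=1e-6):
    """Per-sample gradient descent using finite-difference partials of the output."""
    params = dict(params)
    for _ in range(epochs):
        for inputs, target in data:
            base = {**params, **inputs}
            out = evaluate(expressions, base)[output_var]
            err = out - target
            grads = {}
            for p in params:
                bumped = dict(base)
                bumped[p] += grad_delta
                d_out = (evaluate(expressions, bumped)[output_var] - out) / grad_delta
                grads[p] = err * d_out  # d(0.5 * err**2)/dp via the chain rule
            for p, g in grads.items():
                params[p] -= learning_rate * g
    return params


# Ground truth generated with ab = 2, ac = -1; training starts from zeros
data = [({'xa': xa}, 2.0 * xa - 1.0) for xa in (0.0, 0.5, 1.0, 1.5)]
learned = train_with_finite_differences(['xy=ab*xa+ac'], 'xy', data, {'ab': 0.0, 'ac': 0.0})
print(learned)  # should approach {'ab': 2.0, 'ac': -1.0}
```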

Data descriptors defined here:

__dict__: dictionary for instance variables (if defined)

__weakref__: list of weak references to the object (if defined)

Data and other attributes defined here:

AutogradCustomization = <class 'ComputationalGraphPrimer.ComputationalGraphPrimer.AutogradCustomization'>: This class illustrates how you can extend Autograd with additional functionality by following the instructions posted at https://pytorch.org/docs/stable/notes/extending.html

class Exp(builtins.object)

Exp(exp, body, dependent_var, right_vars, right_params): With CGP, you can handcraft a neural network (actually, you can handcraft any DAG) by designating the nodes and the links between them with expressions like

        expressions = [ 'xx=xa^2',
                        'xy=ab*xx+ac*xa',
                        'xz=bc*xx+xy',
                        'xw=cd*xx+xz^3' ]

In these expressions, names beginning with 'x' denote the nodes in the DAG, and names beginning with other lowercase letters like 'a', 'b', 'c', etc., designate the learnable parameters. The variable on the left of the '=' symbol is considered to be the dependent_var, and those on the right are, as you might guess, the right_vars. Since the learnable parameters are always on the right of the equality sign, we refer to them as right_params in the constructor signature. The expressions shown above are parsed by the parser function in CGP, which outputs an instance of the Exp class for each expression. What is shown above has 4 expressions for creating a DAG; of course, you can have any number of them.
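How a single expression maps onto the five constructor arguments can be sketched with a small dataclass. ExpSketch and parse_exp() are hypothetical stand-ins for Exp and CGP's parser, assuming only the naming convention described above.

```python
import re
from dataclasses import dataclass


@dataclass
class ExpSketch:
    """Hypothetical stand-in mirroring Exp's constructor arguments."""
    exp: str
    body: str
    dependent_var: str
    right_vars: list
    right_params: list


def parse_exp(exp):
    dependent_var, body = (s.strip() for s in exp.split('=', 1))
    # Unique names on the right-hand side, in order of first appearance
    names = list(dict.fromkeys(re.findall(r'[A-Za-z_]\w*', body)))
    right_vars = [n for n in names if n.startswith('x')]
    right_params = [n for n in names if not n.startswith('x')]
    return ExpSketch(exp, body, dependent_var, right_vars, right_params)


e = parse_exp('xw=cd*xx+xz^3')
print(e.dependent_var, e.right_vars, e.right_params)  # xw ['xx', 'xz'] ['cd']
```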

Methods defined here:

__init__(self, exp, body, dependent_var, right_vars, right_params): Initialize self. See help(type(self)) for accurate signature.

Data descriptors defined here:

__dict__: dictionary for instance variables (if defined)

__weakref__: list of weak references to the object (if defined)


__author__ = 'Avinash Kak (kak@purdue.edu)'
__copyright__ = '(C) 2024 Avinash Kak. Python Software Foundation.'
__date__ = '2024-January-28'
__url__ = 'https://engineering.purdue.edu/kak/distCGP/ComputationalGraphPrimer-1.1.4.html'
__version__ = '1.1.4'

Author
		Avinash Kak (kak@purdue.edu)