ComputationalGraphPrimer-1.0.2.html

ComputationalGraphPrimer (version 1.0.2, 2020-January-12)

ComputationalGraphPrimer.py
Version: 1.0.2 Author: Avinash Kak (kak@purdue.edu) Date: 2020-January-12

`Download Version 1.0.2: gztar`	`Total number of downloads (all versions): 2000` `This count is automatically updated at every rotation of the weblogs (normally once every two to four days) Last updated: Sat Feb 14 06:02:01 EST 2026`

View the main module code file in your browser Switch to Version 1.0.5 CHANGES: Version 1.0.2: This version reflects the change in the name of the module that was initially released under the name CompGraphPrimer with version 1.0.1 INTRODUCTION: This module was created with a modest goal in mind: its purpose is merely to serve as a prelude to discussing automatic calculation of the loss gradients in modern Python based platforms for deep learning. Most students taking classes on deep learning focus on just using the tools provided by platforms such as PyTorch without any understanding of how the tools really work. Consider, for example, Autograd --- a module that is at the heart of PyTorch --- for automatic differentiation of tensors. With no effort on the part of the programmer, and through the functionality built into the torch.Tensor class, the Autograd module keeps track of a tensor through all calculations involving the tensor and computes its partial derivatives with respect to the other entities involved in the calculations. These derivatives are subsequently used to estimate the gradient of the loss with respect to the learnable parameters and for backpropagating the loss. Now imagine a beginning student trying to make sense of the following excerpts from the documentation related to Autograd: "Every operation performed on Tensors creates a new function object, that performs the computation, and records that it happened. The history is retained in the form of a DAG of functions, with edges denoting data dependencies (input <- output). Then, when backward is called, the graph is processed in the topological ordering, by calling backward() methods of each Function object, and passing returned gradients on to next Functions." and "Check gradients computed via small finite differences against analytical gradients w.r.t. tensors in inputs that are of floating point type and with requires_grad=True." There is a lot going on here: Why do we need to record the history of the operations carried out on a tensor? What is a DAG? What are the returned gradients? Gradients of what? What are the small finite differences? Analytical gradients of what? etc. etc. The goal of the ComputationalGraphPrimer is serve as a first step to understanding the concepts involved in the questions listed above. This module allows you to create a DAG (Directed Acyclic Graph) of variables with a statement like expressions = ['xx=xa^2', 'xy=ab*xx+ac*xa', 'xz=bc*xx+xy', 'xw=cd*xx+xz^3'] where we assume that a symbolic name that beings with the letter 'x' is a variable, all other symbolic names being learnable parameters, and where we use '^' for exponentiation. The four expressions shown above contain five variables --- 'xx', 'xa', 'xy', 'xz', and 'xw' --- and four learnable parameters: 'ab', 'ac', 'bc', and 'cd'. The DAG that is generated by these expressions looks like: ________________________________ / \ / \ / xx=xa**2 v xa --------------> xx -----------------> xy xy = ab*xx + ac*xa | \ | | \ | | \ | | \ | | \_____________ | | | | | V V \ xz \ / xz = bc*xx + xy \ / -----> xw <---- xw = cd*xx + xz By the way, you can call 'display_network2()' on an instance of ComputationalGraphPrimer to make a much better looking plot of the network for any DAG created by the sort of expressions shown above. For the educational example shown above, the nodes in the DAG correspond to the variables. For a more sophisticated DAG, each node would correspond to each operation between the tensors. In the DAG shown above, the variable 'xa' is an independent variable since it has no incoming arcs, and 'xw' is an output variable since it has no outgoing arcs. A DAG of the sort shown above is represented in ComputationalGraphPrimer by two dictionaries: 'depends_on' and 'leads_to'. Here is what the 'depends_on' dictionary would look like for the DAG shown above: depends_on['xx'] = ['xa'] depends_on['xy'] = ['xa', 'xx'] depends_on['xz'] = ['xx', 'xy'] depends_on['xw'] = ['xx', 'xz'] Something like "depends_on['xx'] = ['xa']" is best read as "the vertex 'xx' depends on the vertex 'xa'." Similarly, the "depends_on['xz'] = ['xx', 'xy']" is best read aloud as "the vertex 'xz' depends on the vertices 'xx' and 'xy'." And so on. Whereas the 'depends_on' dictionary is a complete description of a DAG, for programming convenience, ComputationalGraphPrimer also maintains another representation for the same graph, as provide by the 'leads_to' dictionary. This dictionary for the same graph as shown above would be: leads_to['xa'] = ['xx', 'xy'] leads_to['xx'] = ['xy', 'xz', 'xw'] leads_to['xy'] = ['xz'] leads_to['xz'] = ['xw'] The "leads_to[xa] = [xx]" is best read as "the outgoing edge at the node 'xa' leads to the node 'xx'." Along the same lines, the "leads_to['xx'] = ['xy', 'xz', 'xw']" is best read as "the outgoing edges at the vertex 'xx' lead to the vertices 'xy', 'xz', and 'xw'. Given a computational graph like the one shown above, we are faced with the following questions: (1) How to propagate the information from the independent nodes --- that we can refer to as the input nodes --- to the output nodes, these being the nodes with only incoming edges? (2) As the information flows in the forward direction, meaning from the input nodes to the output nodes, is it possible to estimate the partial derivatives that apply to each link in the graph? And, finally, (3) Given a scalar value at an output node (which could be the loss estimated at that node), can the partial derivatives estimated during the forward pass be used to backpropagate the loss? Consider, for example, the directed link between the node 'xy' and node 'xz'. As a variable, the value of 'xz' is calculated through the formula "xz = bc*xx + xy". In the forward propagation of information, we estimate the value of 'xz' from currently known values for the learnable parameter 'bc' and the variables 'xx' and 'xy'. In addition to the value of the variable at the node 'xz', we are also interested in the value of the partial derivative of 'xz' with respect to the other variables that it depends on --- 'xx' and 'xy' --- and also with respect to the parameter it depends on, 'bc'. For the calculation of the derivatives, we have a choice: We can either do a bit of computer algebra and figure out that the partial of 'xz' with respect to 'xx' is equal to the current value for 'bc'. Or, we can use the small finite difference method for doing the same, which means that (1) we calculate the value of 'xz' for the current value of 'xx', on the one hand, and, on the other, for 'xx' plus a delta; (2) take the difference of the two; and (3) divide the difference by the delta. ComputationalGraphPrimer module uses the finite differences method for estimating the partial derivatives. Since we have two different types of partial derivatives, partial of a variable with respect to another variable, and the partial of a variable with respect a learnable parameter, ComputationalGraphPrimer uses to different dictionaries for storing this partials during each forward pass. Partials of variables with respect to other variables as encountered during forward propagation are stored in the dictionary "partial_var_to_var" and the partials of the variables with respect to the learnable parameters are stored in the dictionary partial_var_to_param. At the end of each forward pass, the relevant partials extracted from these dictionaries are used to estimate the gradients of the loss with respect to the learnable parameters, as illustrated in the implementation of the method train_on_all_data(). INSTALLATION: The ComputationalGraphPrimer class was packaged using setuptools. For installation, execute the following command in the source directory (this is the directory that contains the setup.py file after you have downloaded and uncompressed the package): sudo python setup.py install and/or, for the case of Python3, sudo python3 setup.py install On Linux distributions, this will install the module file at a location that looks like /usr/local/lib/python2.7/dist-packages/ and, for the case of Python3, at a location that looks like /usr/local/lib/python3.6/dist-packages/ If you do not have root access, you have the option of working directly off the directory in which you downloaded the software by simply placing the following statements at the top of your scripts that use the ComputationalGraphPrimer class: import sys sys.path.append( "pathname_to_ComputationalGraphPrimer_directory" ) To uninstall the module, simply delete the source directory, locate where the ComputationalGraphPrimer module was installed with "locate ComputationalGraphPrimer" and delete those files. As mentioned above, the full pathname to the installed version is likely to look like /usr/local/lib/python2.7/dist-packages/ComputationalGraphPrimer* If you want to carry out a non-standard install of the ComputationalGraphPrimer module, look up the on-line information on Disutils by pointing your browser to http://docs.python.org/dist/dist.html USAGE: Construct an instance of the ComputationalGraphPrimer class as follows: from ComputationalGraphPrimer import * cgp = ComputationalGraphPrimer( expressions = ['xx=xa^2', 'xy=ab*xx+ac*xa', 'xz=bc*xx+xy', 'xw=cd*xx+xz^3'], output_vars = ['xw'], dataset_size = 10000, learning_rate = 1e-6, grad_delta = 1e-4, display_vals_how_often = 1000, ) cgp.parse_expressions() cgp.display_network2() cgp.gen_gt_dataset(vals_for_learnable_params = {'ab':1.0, 'bc':2.0, 'cd':3.0, 'ac':4.0}) cgp.train_on_all_data() cgp.plot_loss() CONSTRUCTOR PARAMETERS: expressions: These expressions define the computational graph. The expressions are based on the following assumptions: (1) any variable name must start with the letter 'x'; (2) a symbolic name that does not start with 'x' is a learnable parameter; (3) exponentiation operator is '^'; (4) the symbols '*', '+', and '-' carry their usual arithmetic meanings. output_vars: Although the parser has the ability to figure out which nodes in the computational graph represent the output variables --- these being nodes with no outgoing arcs --- you are allowed to designate the specific output variables you are interested in through this constructor parameter. dataset_size: Although the networks created by an arbitrary set of expressions are not likely to allow for any true learning of the parameters, nonetheless the ComputationalGraphPrimer allows for the computation of the loss at the output nodes and backpropagation of the loss to the other nodes. To demonstrate this, we need a ground-truth set of input/output values for given value for the learnable parameters. The constructor parameter 'dataset_size' refers to how may of these 'input/output' pairs would be generated for such experiments. learning_rate: Carries the usual meaning for updating the values of the learnable parameters based on the gradients of the loss with respect to those parameters. grad_delta: This constructor option sets the value of the delta to be used for estimating the partial derivatives with the finite difference method. display_vals_how_often: This controls how often you will see the result of the calculations being carried out in the computational graph. Let's say you are experimenting with 10,000 input/output samples for propagation in the network, if you set this constructor option to 1000, you will see the partial derivatives and the values for the learnable parameters every 1000 passes through the graph. PUBLIC METHODS: (1) parse_expressions() This method parses the expressions provided and constructs a DAG from them for the variables and the parameters in the expressions. It is based on the convention that the names of all variables begin with the character 'x', with all other symbolic names being treated as learnable parameters. (2) display_network2() This method calls on the networkx module to construct a visual of the computational graph. (3) gen_gt_dataset() This method illustrates that it is trivial to forward-propagate the information through the computational graph if you are not concerned about estimating the partial derivatives at the same time. This method is used to generate 'dataset_size' number of input/output values for the computational graph for given values for the learnable parameters. (4) train_on_all_data() The purpose of this method is to call forward_propagate_one_input_sample_with_partial_deriv_calc() repeatedly on all input/output ground-truth training data pairs generated by the method gen_gt_dataset(). The call to the forward_propagate...() method returns the predicted value at the output nodes from the supplied values at the input nodes. The "train_on_all_data()" method calculates the error associated with the predicted value. The call to forward_propagate...() also returns the partial derivatives estimated by using the finite difference method in the computational graph. Using the partial derivatives, the "train_on_all_data()" backpropagates the loss to the interior nodes in the computational graph and updates the values for the learnable parameters. (5) forward_propagate_one_input_sample_with_partial_deriv_calc() If you want to look at how the information flows in the DAG when you don't have to worry about estimating the partial derivatives, see the method gen_gt_dataset(). As you will notice in the implementation code for that method, there is nothing much to pushing the input values through the nodes and the arcs of a computational graph if we are not concerned about estimating the partial derivatives. On the other hand, if you want to see how one might also estimate the partial derivatives as during the forward flow of information in a computational graph, the forward_propagate...() presented here is the method to examine. We first split the expression that the node variable depends on into its constituent parts on the basis of '+' and '-' operators and subsequently, for each part, we estimate the partial of the node variable with respect to the variables and the learnable parameters in that part. (6) plot_loss() Constructs a plot of losses during each iteration of the forward pass of the data. Do not forget that you are highly unlikely to see a loss plot as you would for a neural network. Arbitrary computational graphs, as used for illustrating concepts in this module, are not likely to possess learning capability. THE Examples DIRECTORY: The Examples subdirectory in the distribution contains a script named "demo.py" that illustrates how you can use this module. In order to illustrate in a classroom setting the various concepts that this module is meant for, you may need to insert print statements in the various module functions. BUGS: Please notify the author if you encounter any bugs. When sending email, please place the string 'ComputationalGraphPrimer' in the subject line to get past the author's spam filter. ABOUT THE AUTHOR: The author, Avinash Kak, is a professor of Electrical and Computer Engineering at Purdue University. For all issues related to this module, contact the author at kak@purdue.edu If you send email, please place the string "ComputationalGraphPrimer" in your subject line to get past the author's spam filter. COPYRIGHT: Python Software Foundation License Copyright 2020 Avinash Kak @endofdocs

Modules

copy
math
numpy

networkx
os
matplotlib.pyplot

random
re
sys

Classes

__builtin__.object

ComputationalGraphPrimer

class ComputationalGraphPrimer(__builtin__.object)

Methods defined here:

__init__(self, *args, **kwargs)

calculate_loss(self, predicted_val, true_val)

display_network1(self)

display_network2(self): Provides a fancier display of the network graph

forward_propagate_one_input_sample_with_partial_deriv_calc(self, sample_index, input_vals_for_ind_vars): If you want to look at how the information flows in the DAG when you don't have to worry about estimating the partial derivatives, see the method gen_gt_dataset(). As you will notice in the implementation code for that method, there is nothing much to pushing the input values through the nodes and the arcs of a computational graph if we are not concerned about estimating the partial derivatives. On the other hand, if you want to see how one might also estimate the partial derivatives as during the forward flow of information in a computational graph, the forward_propagate...() presented here is the method to examine. We first split the expression that the node variable depends on into its constituent parts on the basis of '+' and '-' operators and subsequently, for each part, we estimate the partial of the node variable with respect to the variables and the learnable parameters in that part.

gen_gt_dataset(self, vals_for_learnable_params={}): This method illustrates that it is trivial to forward-propagate the information through the computational graph if you are not concerned about estimating the partial derivatives at the same time. This method is used to generate 'dataset_size' number of input/output values for the computational graph for given values for the learnable parameters.

parse_expressions(self): This method creates a DAG from a set of expressions that involve variables and learnable parameters. The expressions are based on the assumption that a symbolic name that starts with the letter 'x' is a variable, with all other symbolic names being learnable parameters. The computational graph is represented by two dictionaries, 'depends_on' and 'leads_to'. To illustrate the meaning of the dictionaries, something like "depends_on['xz']" would be set to a list of all other variables whose outgoing arcs end in the node 'xz'. So something like "depends_on['xz']" is best read as "node 'xz' depends on ...." where the dots stand for the array of nodes that is the value of "depends_on['xz']". On the other hand, the 'leads_to' dictionary has the opposite meaning. That is, something like "leads_to['xz']" is set to the array of nodes at the ends of all the arcs that emanate from 'xz'.

plot_loss(self)

train_on_all_data(self): The purpose of this method is to call forward_propagate_one_input_sample_with_partial_deriv_calc() repeatedly on all input/output ground-truth training data pairs generated by the method gen_gt_dataset(). The call to the forward_propagate...() method returns the predicted value at the output nodes from the supplied values at the input nodes. The "train_on_all_data()" method calculates the error associated with the predicted value. The call to forward_propagate...() also returns the partial derivatives estimated by using the finite difference method in the computational graph. Using the partial derivatives, the "train_on_all_data()" backpropagates the loss to the interior nodes in the computational graph and updates the values for the learnable parameters.

Data descriptors defined here:

__dict__: dictionary for instance variables (if defined)

__weakref__: list of weak references to the object (if defined)

p
		__author__ = 'Avinash Kak (kak@purdue.edu)' __copyright__ = '(C) 2020 Avinash Kak. Python Software Foundation.' __date__ = '2020-January-12' __url__ = 'https://engineering.purdue.edu/kak/distCGP/ComputationalGraphPrimer-1.0.2.html' __version__ = '1.0.2'

Author
		Avinash Kak (kak@purdue.edu)