B.2 Probability and Statistics
The exponential family and Generalized Linear Models (GLiMs)
Change of variables in probability densities
The score function
The score is defined as the gradient of the log-likelihood with respect to the parameters $\boldsymbol{\theta}$, $\boldsymbol{s}(\boldsymbol{\theta}) := \nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{x};\boldsymbol{\theta})$. The mean of the score is zero:

$$\mathbb{E}\left[\boldsymbol{s}\right] = \int \nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{x};\boldsymbol{\theta})\, p(\boldsymbol{x};\boldsymbol{\theta})\, \mathrm{d}\boldsymbol{x} = \int \nabla_{\boldsymbol{\theta}}\, p(\boldsymbol{x};\boldsymbol{\theta})\, \mathrm{d}\boldsymbol{x} = \nabla_{\boldsymbol{\theta}} \int p(\boldsymbol{x};\boldsymbol{\theta})\, \mathrm{d}\boldsymbol{x} = \nabla_{\boldsymbol{\theta}}\, 1 = \boldsymbol{0},$$

assuming the order of differentiation and integration can be exchanged.
The variance of the score is known as the Fisher information, $\mathcal{I}(\boldsymbol{\theta}) := \mathrm{Var}\left[\boldsymbol{s}\right]$. Because its mean is zero, the Fisher information is also the expected outer product of the score, $\mathcal{I}(\boldsymbol{\theta}) = \mathbb{E}\left[\boldsymbol{s}\boldsymbol{s}^\mathrm{T}\right]$.
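Both properties can be checked numerically. The sketch below (a hypothetical illustration, not from the text) uses a scalar Gaussian with unknown mean, for which the score is $s(x) = (x - \theta)/\sigma^2$ and the Fisher information is $1/\sigma^2$:

```python
# Monte Carlo check that the score has zero mean and that its variance
# equals the Fisher information, for x ~ N(theta, sigma^2).
# The score w.r.t. theta is s(x) = (x - theta) / sigma^2; I(theta) = 1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.5, 2.0
x = rng.normal(theta, sigma, size=1_000_000)

score = (x - theta) / sigma**2   # gradient of log N(x; theta, sigma^2) w.r.t. theta
print(score.mean())              # should be close to 0
print(score.var())               # should be close to 1/sigma^2 = 0.25
```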
The Fisher information for exponential-family random variables
This turns out to take a simple form. For a (vector) random variable $\boldsymbol{x}$ and "parameters" $\boldsymbol{\theta}$ (that may themselves be random variables), with

$$p(\boldsymbol{x}|\boldsymbol{\theta}) = h(\boldsymbol{x}) \exp\left\{ \boldsymbol{\theta}^\mathrm{T} \boldsymbol{T}(\boldsymbol{x}) - A(\boldsymbol{\theta}) \right\},$$

the Fisher information is:

$$\mathcal{I}(\boldsymbol{\theta}) = -\mathbb{E}\left[ \nabla^2_{\boldsymbol{\theta}} \log p(\boldsymbol{x}|\boldsymbol{\theta}) \right] = \nabla^2_{\boldsymbol{\theta}} A(\boldsymbol{\theta}) = \mathrm{Cov}\left[ \boldsymbol{T}(\boldsymbol{x}) \right],$$

where in the last equality we have used the fact that the derivatives of the log-normalizer $A(\boldsymbol{\theta})$ are the cumulants of the sufficient statistics $\boldsymbol{T}(\boldsymbol{x})$ under the distribution. A perhaps more interesting equivalent can be derived by noting that the same fact gives $\nabla_{\boldsymbol{\theta}} A(\boldsymbol{\theta}) = \mathbb{E}\left[ \boldsymbol{T}(\boldsymbol{x}) \right]$. Therefore,

$$\mathcal{I}(\boldsymbol{\theta}) = \nabla_{\boldsymbol{\theta}}\, \mathbb{E}\left[ \boldsymbol{T}(\boldsymbol{x}) \right],$$

the Jacobian of the mean of the sufficient statistics with respect to the natural parameters.
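As a sanity check (a hypothetical numerical sketch, not from the text), consider the Poisson family in natural form, $p(x;\theta) \propto \exp\{\theta x - e^\theta\}$ with $\theta = \log\lambda$, $T(x) = x$, and $A(\theta) = e^\theta$. Then the Fisher information $A''(\theta) = e^\theta = \lambda$ should equal $\mathrm{Cov}[T(x)] = \mathrm{Var}[x]$:

```python
# Check I(theta) = Cov[T(x)] for the Poisson family written in natural form:
#   theta = log(lambda), T(x) = x, A(theta) = exp(theta),
# so A''(theta) = lambda, which should match Var[x].
import numpy as np

rng = np.random.default_rng(1)
lam = 3.0                           # rate; natural parameter theta = log(lam)
x = rng.poisson(lam, size=1_000_000)

print(x.var())                      # should be close to lam = 3.0
```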
Markov chains
Discrete random variables
[[[table]]]
Useful identities
Expectations of quadratic forms.
Consider a vector random variable $\boldsymbol{x}$ with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. We are interested in the expectation of a certain function of $\boldsymbol{x}$, namely the quadratic form $(\boldsymbol{x} - \boldsymbol{a})^\mathrm{T} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{a})$. This term can occur, for example, in the log probability of a Gaussian distribution about $\boldsymbol{a}$. To calculate the expectation, we define a new variable

$$\boldsymbol{z} := \boldsymbol{x} - \boldsymbol{a},$$

with mean $\boldsymbol{\mu} - \boldsymbol{a}$ and covariance $\boldsymbol{\Sigma}$, and then employ the cyclic-permutation property of the matrix-trace operator:

$$\begin{aligned}
\mathbb{E}\left[\boldsymbol{z}^\mathrm{T} \boldsymbol{A} \boldsymbol{z}\right]
&= \mathbb{E}\left[\mathrm{tr}\left(\boldsymbol{A}\, \boldsymbol{z} \boldsymbol{z}^\mathrm{T}\right)\right]
 = \mathrm{tr}\left(\boldsymbol{A}\, \mathbb{E}\left[\boldsymbol{z} \boldsymbol{z}^\mathrm{T}\right]\right) \\
&= \mathrm{tr}\left(\boldsymbol{A}\left(\boldsymbol{\Sigma} + (\boldsymbol{\mu} - \boldsymbol{a})(\boldsymbol{\mu} - \boldsymbol{a})^\mathrm{T}\right)\right) \\
&= (\boldsymbol{\mu} - \boldsymbol{a})^\mathrm{T} \boldsymbol{A}\, (\boldsymbol{\mu} - \boldsymbol{a}) + \mathrm{tr}\left(\boldsymbol{A} \boldsymbol{\Sigma}\right).
\end{aligned}$$
Hence, the expected value of the quadratic function of $\boldsymbol{x}$ is the quadratic function evaluated at the expected value of $\boldsymbol{x}$, plus a "correction" term arising from the covariance of $\boldsymbol{x}$.
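This identity is easy to confirm by Monte Carlo. The sketch below (the particular matrices and vectors are illustrative assumptions) compares a sample average of the quadratic form against the closed form:

```python
# Monte Carlo check of
#   E[(x - a)^T A (x - a)] = (mu - a)^T A (mu - a) + tr(A @ Sigma)
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 0.2], [0.2, 3.0]])
a = np.array([0.5, 0.5])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
z = x - a
mc = np.einsum('ni,ij,nj->n', z, A, z).mean()           # sample average of z^T A z
exact = (mu - a) @ A @ (mu - a) + np.trace(A @ Sigma)   # closed form
print(mc, exact)                                        # should agree closely
```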
Simulating Poisson random variates with mean less than 1.
motivation…
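One standard approach, sketched here as an assumption rather than taken from the text, is Knuth's multiplicative algorithm: inter-arrival times of a rate-$\lambda$ Poisson process are $\mathrm{Exponential}(\lambda)$, so we count how many independent uniforms can be multiplied together before the running product drops below $e^{-\lambda}$. For $\lambda < 1$ this consumes only about $1 + \lambda$ uniforms per variate on average.

```python
# Knuth's multiplicative method for Poisson sampling: count uniforms whose
# running product stays above exp(-lam). Efficient when lam is small.
import math
import random

def poisson_knuth(lam, rng=random):
    """Draw one Poisson(lam) variate by the multiplication method."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()      # multiply in one more Uniform(0, 1) draw
        if p <= L:
            return k           # number of arrivals before the threshold
        k += 1

random.seed(0)
draws = [poisson_knuth(0.5) for _ in range(100_000)]
print(sum(draws) / len(draws))   # sample mean should be close to 0.5
```

The expected number of loop iterations is $1 + \lambda$, which is why the method is attractive precisely for means less than 1; for large $\lambda$ it becomes slow and other methods are preferred.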