B.2 Probability and Statistics

The exponential family and Generalized Linear Models (GLiMs)

Change of variables in probability densities

The score function

The score is defined as the gradient of the log-likelihood with respect to the parameters, $\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\log\hat{p}(\hat{\bm{y}};\bm{\theta})$. The mean of the score is zero:

\[
\begin{split}
\mathbb{E}_{\bm{Y}}\left[\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\log\hat{p}(\bm{Y};\bm{\theta})\right]
&= \int_{\bm{y}} \hat{p}(\bm{y};\bm{\theta})\,\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\log\hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \int_{\bm{y}} \hat{p}(\bm{y};\bm{\theta})\,\frac{1}{\hat{p}(\bm{y};\bm{\theta})}\,\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \int_{\bm{y}} \frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \frac{\mathrm{d}}{\mathrm{d}\bm{\theta}} \int_{\bm{y}} \hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}(1)\\
&= 0.
\end{split}
\]

The variance of the score is known as the Fisher information. Because the score has zero mean, the Fisher information is also the expected square (outer product) of the score.
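These two facts are easy to check numerically. Below is a minimal sketch, assuming a univariate Gaussian model with known variance (an illustrative choice, not one made in the text): by Monte Carlo, the score with respect to the mean averages to zero, and its variance matches the Gaussian's Fisher information, $1/\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0                       # true parameters (illustrative)
y = rng.normal(mu, sigma, size=1_000_000)

# Score w.r.t. the mean: d/dmu log N(y; mu, sigma^2) = (y - mu) / sigma^2.
score = (y - mu) / sigma**2

print(np.mean(score))                      # ~0: the score has zero mean
print(np.var(score), 1 / sigma**2)         # variance of the score ~ Fisher information
```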

The Fisher information for exponential-family random variables

This turns out to take a simple form. For a (vector) random variable $\bm{Y}$ and “parameters” $\bm{\theta}$ (which may themselves be random variables) with the exponential-family density

\[
p(\bm{y}|\bm{\theta}) = p(\bm{y}|\bm{\eta}) = h(\bm{y})\exp\left\{\bm{\eta}(\bm{\theta})^{\mathrm{T}}\bm{t}(\bm{y}) - A(\bm{\eta}(\bm{\theta}))\right\},
\]

the Fisher information is:

\[
\begin{split}
I(\bm{\theta}) &= -\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\frac{\partial^{2}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\mathrm{T}}}\log p(\bm{Y}|\bm{\theta})\,\middle|\,\bm{\theta}\right]\\
&= -\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\frac{\partial^{2}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\mathrm{T}}}\left[\bm{\eta}(\bm{\theta})^{\mathrm{T}}\bm{t}(\bm{Y}) - A(\bm{\eta}(\bm{\theta}))\right]\,\middle|\,\bm{\theta}\right]\\
&= -\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\sum_{i}\frac{\partial^{2}\eta_{i}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\mathrm{T}}}t_{i}(\bm{Y}) - \frac{\partial\bm{\eta}^{\mathrm{T}}}{\partial\bm{\theta}}\frac{\partial^{2}A}{\partial\bm{\eta}\,\partial\bm{\eta}^{\mathrm{T}}}\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\mathrm{T}}} - \sum_{i}\frac{\partial^{2}\eta_{i}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\mathrm{T}}}\frac{\partial A}{\partial\eta_{i}}\,\middle|\,\bm{\theta}\right]\\
&= \frac{\partial\bm{\eta}^{\mathrm{T}}}{\partial\bm{\theta}}\,\mathrm{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\mathrm{T}}},
\end{split}
\]

where in the last line we have used the fact that the derivatives of the log-normalizer are the cumulants of the sufficient statistics under the distribution: in particular, $\partial A/\partial\eta_i = \mathbb{E}\left[t_i(\bm{Y})\middle|\bm{\theta}\right]$, so the first and third sums cancel in expectation, and $\partial^2 A/\partial\bm{\eta}\,\partial\bm{\eta}^{\mathrm{T}} = \mathrm{Cov}\left[\bm{t}(\bm{Y})\middle|\bm{\theta}\right]$. A perhaps more interesting equivalent expression can be derived by noting that:

\[
\frac{\partial}{\partial\bm{\theta}}\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]
= \frac{\partial}{\partial\bm{\theta}}\frac{\partial A}{\partial\bm{\eta}^{\mathrm{T}}}
= \frac{\partial^{2}A}{\partial\bm{\eta}\,\partial\bm{\eta}^{\mathrm{T}}}\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\mathrm{T}}}
= \mathrm{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\mathrm{T}}}.
\]

Therefore,

\[
\left(\frac{\partial}{\partial\bm{\theta}}\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\right)^{\mathrm{T}}
\mathrm{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]^{-1}
\left(\frac{\partial}{\partial\bm{\theta}}\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\right)
= \frac{\partial\bm{\eta}^{\mathrm{T}}}{\partial\bm{\theta}}\,\mathrm{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\mathrm{T}}}
= I(\bm{\theta}).
\tag{B.12}
\]
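As a concrete check of these identities, consider the standard Bernoulli example (a sketch not worked in the text): for a Bernoulli variable with mean $\theta$, the natural parameter is $\eta(\theta) = \log\frac{\theta}{1-\theta}$, the sufficient statistic is $t(y) = y$, with $\mathbb{E}\left[t\middle|\theta\right] = \theta$ and $\mathrm{Cov}\left[t\middle|\theta\right] = \theta(1-\theta)$. The snippet below computes the Fisher information three ways: directly from the negative expected second derivative of the log-likelihood, from the last line of the derivation above, and from Equation B.12; all three give $1/(\theta(1-\theta))$.

```python
theta = 0.3                      # Bernoulli mean parameter (illustrative choice)
var_t = theta * (1 - theta)      # Cov[t(Y)|theta] for t(y) = y

# Route 1: -E[d^2/dtheta^2 log p(Y; theta)], with
# d^2/dtheta^2 log p(y; theta) = -y/theta^2 - (1-y)/(1-theta)^2.
fisher_direct = theta / theta**2 + (1 - theta) / (1 - theta)**2

# Route 2: (d eta / d theta)^2 * Cov[t], with eta(theta) = log(theta/(1-theta)).
deta = 1.0 / (theta * (1 - theta))
fisher_eta = deta**2 * var_t

# Route 3 (Eq. B.12): (d E[t]/d theta)^2 / Cov[t]; here E[t|theta] = theta.
fisher_b12 = 1.0**2 / var_t

print(fisher_direct, fisher_eta, fisher_b12)   # all equal 1/(theta*(1-theta))
```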

Markov chains

Discrete random variables


Useful identities

Expectations of quadratic forms.

Consider a vector random variable $\bm{X}$ with mean $\bm{\mu}$ and covariance $\bm{\Sigma}$. We are interested in the expectation of a certain function of $\bm{X}$, namely $(\bm{b}-\mathbf{C}\bm{X})^{\mathrm{T}}\mathbf{A}(\bm{b}-\mathbf{C}\bm{X})$. This term can occur, for example, in the log probability of a Gaussian distribution about $\mathbf{C}\bm{X}$. To calculate the expectation, we define a new variable

\[
\bm{Z} := \mathbf{A}^{1/2}\left(\bm{b}-\mathbf{C}\bm{X}\right)
\]

and then employ the cyclic-permutation property of the matrix-trace operator:

\[
\begin{split}
\mathbb{E}_{\bm{X}}\left[(\bm{b}-\mathbf{C}\bm{X})^{\mathrm{T}}\mathbf{A}(\bm{b}-\mathbf{C}\bm{X})\right]
&= \mathbb{E}_{\bm{Z}}\left[\bm{Z}^{\mathrm{T}}\bm{Z}\right]\\
&= \mathbb{E}_{\bm{Z}}\left[\mathrm{tr}\left[\bm{Z}^{\mathrm{T}}\bm{Z}\right]\right]\\
&= \mathbb{E}_{\bm{Z}}\left[\mathrm{tr}\left[\bm{Z}\bm{Z}^{\mathrm{T}}\right]\right]\\
&= \mathrm{tr}\left[\mathbb{E}_{\bm{Z}}\left[\bm{Z}\bm{Z}^{\mathrm{T}}\right]\right]\\
&= \mathrm{tr}\left[\mathrm{Cov}_{\bm{Z}}\left[\bm{Z}\right] + \mathbb{E}_{\bm{Z}}\left[\bm{Z}\right]\mathbb{E}_{\bm{Z}}\left[\bm{Z}\right]^{\mathrm{T}}\right]\\
&= \mathrm{tr}\left[\mathbf{A}^{1/2}\mathbf{C}\bm{\Sigma}\mathbf{C}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}/2} + \mathbf{A}^{1/2}(\bm{b}-\mathbf{C}\bm{\mu})(\bm{b}-\mathbf{C}\bm{\mu})^{\mathrm{T}}\mathbf{A}^{\mathrm{T}/2}\right]\\
&= \mathrm{tr}\left[\mathbf{A}\mathbf{C}\bm{\Sigma}\mathbf{C}^{\mathrm{T}}\right] + \mathrm{tr}\left[(\bm{b}-\mathbf{C}\bm{\mu})^{\mathrm{T}}\mathbf{A}(\bm{b}-\mathbf{C}\bm{\mu})\right]\\
&= \mathrm{tr}\left[\mathbf{A}\mathbf{C}\bm{\Sigma}\mathbf{C}^{\mathrm{T}}\right] + (\bm{b}-\mathbf{C}\bm{\mu})^{\mathrm{T}}\mathbf{A}(\bm{b}-\mathbf{C}\bm{\mu}).
\end{split}
\tag{B.13}
\]

Hence, the expected value of the quadratic function of $\bm{X}$ is the quadratic function evaluated at the expected value of $\bm{X}$, plus a “correction” term arising from the covariance of $\bm{X}$.
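A quick Monte Carlo sketch of Equation B.13 follows. The matrices are randomly generated, with $\mathbf{A}$ built symmetric positive semidefinite (as the derivation's use of $\mathbf{A}^{1/2}$ assumes); the sizes and the Gaussian sampling distribution for $\bm{X}$ are illustrative assumptions, since the identity depends only on the first two moments of $\bm{X}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4                                   # dims of X and b (illustrative)

mu = rng.normal(size=n)
L = rng.normal(size=(n, n))
Sigma = L @ L.T                               # covariance of X, PSD by construction
b = rng.normal(size=m)
C = rng.normal(size=(m, n))
M = rng.normal(size=(m, m))
A = M @ M.T                                   # symmetric PSD, so A^{1/2} exists

# Sample X; any distribution with these moments would do, Gaussian is convenient.
X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
R = b - X @ C.T                               # each row is (b - C x)^T
mc = np.einsum('ij,jk,ik->i', R, A, R).mean() # Monte Carlo E[(b-CX)^T A (b-CX)]

resid = b - C @ mu
closed = np.trace(A @ C @ Sigma @ C.T) + resid @ A @ resid
print(mc, closed)                             # agree up to Monte Carlo error
```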

Simulating Poisson random variates with mean less than 1.

motivation…

Consider the graphical model shown below. We want to show that the marginal probability of $\hat{Y}$ is that of a Poisson random variable with mean $\mu$—as long as $\mu < 1$. The derivation below carries out this marginalization. The third line follows because the probability of $\hat{Y}$ (the number of “successes”) is zero for any $\hat{y} > \hat{x}$, since $\hat{X}$ is the number of Bernoulli trials (it is impossible to have more successes than trials).

[Graphical model: node $\hat{X}$ with $\hat{p}(\hat{x};\bm{\theta}) = \text{Pois}(1)$, node $\hat{Y}$ with $\hat{p}(\hat{y}|\hat{x};\bm{\theta}) = \text{Bino}(\hat{x},\mu)$, enclosed in a plate over $N$ samples.]

\[
\begin{split}
\hat{p}(\hat{y};\bm{\theta}) &= \sum_{\hat{x}=0}^{\infty}\hat{p}(\hat{x};\bm{\theta})\,\hat{p}(\hat{y}|\hat{x};\bm{\theta})\\
&= \sum_{\hat{x}=0}^{\infty}\text{Pois}(\hat{x};1)\,\text{Bino}(\hat{y};\hat{x},\mu)\\
&= \sum_{\hat{x}=\hat{y}}^{\infty}\text{Pois}(\hat{x};1)\,\text{Bino}(\hat{y};\hat{x},\mu)\\
&= \sum_{\hat{x}=\hat{y}}^{\infty}\frac{e^{-1}}{\hat{x}!}\binom{\hat{x}}{\hat{y}}\mu^{\hat{y}}(1-\mu)^{\hat{x}-\hat{y}}\\
&= \sum_{\hat{x}=\hat{y}}^{\infty}\frac{e^{-1}}{\hat{x}!}\frac{\hat{x}!}{\hat{y}!\,(\hat{x}-\hat{y})!}\mu^{\hat{y}}(1-\mu)^{\hat{x}-\hat{y}}\\
&= \frac{e^{-1}\mu^{\hat{y}}}{\hat{y}!}\sum_{\hat{x}=\hat{y}}^{\infty}\frac{1}{(\hat{x}-\hat{y})!}(1-\mu)^{\hat{x}-\hat{y}}\\
&= \frac{e^{-1}\mu^{\hat{y}}}{\hat{y}!}\sum_{m=0}^{\infty}\frac{1}{m!}(1-\mu)^{m}\\
&= \frac{e^{-1}\mu^{\hat{y}}}{\hat{y}!}\,e^{1-\mu}
= \frac{e^{-\mu}\mu^{\hat{y}}}{\hat{y}!}
= \text{Pois}(\hat{y};\mu).
\end{split}
\]
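This result licenses a simple recipe for simulating $\text{Pois}(\mu)$ variates when $\mu < 1$: draw $\hat{X} \sim \text{Pois}(1)$ and then thin it with a binomial. A minimal sketch (the sample size and the value of $\mu$ are illustrative choices):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
mu = 0.7                                  # target Poisson mean; requires mu < 1
n = 1_000_000

x = rng.poisson(1.0, size=n)              # X ~ Pois(1): number of Bernoulli trials
y = rng.binomial(x, mu)                   # Y | X ~ Bino(X, mu): thinned counts

# Compare empirical frequencies of Y with the Pois(mu) pmf.
for k in range(5):
    pmf = math.exp(-mu) * mu**k / math.factorial(k)
    print(k, np.mean(y == k), pmf)
```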