B.2 Probability and Statistics

The exponential family and Generalized Linear Models (GLiMs)

Change of variables in probability densities

The score function

The score is defined as the gradient of the log-likelihood with respect to the parameters $\bm{\theta}$, i.e.\ $\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\log\hat{p}(\hat{\bm{y}};\bm{\theta})$. The mean of the score is zero:

\[
\begin{split}
\mathbb{E}_{\bm{Y}}\left[\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\log\hat{p}(\bm{Y};\bm{\theta})\right]
&= \int_{\bm{y}} \hat{p}(\bm{y};\bm{\theta})\,\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\log\hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \int_{\bm{y}} \hat{p}(\bm{y};\bm{\theta})\,\frac{1}{\hat{p}(\bm{y};\bm{\theta})}\frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \int_{\bm{y}} \frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}\int_{\bm{y}} \hat{p}(\bm{y};\bm{\theta})\,\mathrm{d}\bm{y}\\
&= \frac{\mathrm{d}}{\mathrm{d}\bm{\theta}}(1)\\
&= 0.
\end{split}
\]

The variance of the score is known as the Fisher information. Because the mean of the score is zero, the Fisher information is also the expected outer product of the score with itself.
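Both facts can be checked numerically. The sketch below uses a scalar Poisson model with rate $\theta$ (an assumed example, chosen only for illustration), whose score is $y/\theta - 1$ and whose Fisher information is $1/\theta$:

```python
import numpy as np

# Monte Carlo check of the two score identities for an (assumed) Poisson
# model with rate theta: the score is d/dtheta log p(y; theta) = y/theta - 1,
# its mean is zero, and its variance (the Fisher information) is 1/theta.
rng = np.random.default_rng(0)
theta = 2.5
y = rng.poisson(theta, size=1_000_000)

score = y / theta - 1.0  # d/dtheta [y log(theta) - theta - log(y!)]
print(score.mean())      # close to 0
print(score.var())       # close to 1/theta
```

With $10^6$ samples, both estimates agree with the theory to about three decimal places.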

The Fisher information for exponential-family random variables

This turns out to take a simple form. For a (vector) random variable $\bm{Y}$ and ``parameters'' $\bm{\theta}$ (which may themselves be random variables) with exponential-family density

\[
p(\bm{y}|\bm{\theta}) = p(\bm{y}|\bm{\eta}) = h(\bm{y})\exp\Big\{\bm{\eta}(\bm{\theta})^{\text{T}}\bm{t}(\bm{y}) - A(\bm{\eta}(\bm{\theta}))\Big\},
\]

the Fisher information is:

\[
\begin{split}
I(\bm{\theta}) &= -\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\frac{\partial^{2}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\text{T}}}\log p(\bm{Y}|\bm{\theta})\,\middle|\,\bm{\theta}\right]\\
&= -\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\frac{\partial^{2}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\text{T}}}\left[\bm{\eta}(\bm{\theta})^{\text{T}}\bm{t}(\bm{Y}) - A(\bm{\eta}(\bm{\theta}))\right]\,\middle|\,\bm{\theta}\right]\\
&= -\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\sum_{i}\frac{\partial^{2}\eta_{i}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\text{T}}}t_{i}(\bm{Y}) - \frac{\partial\bm{\eta}^{\text{T}}}{\partial\bm{\theta}}\frac{\partial^{2}A}{\partial\bm{\eta}\,\partial\bm{\eta}^{\text{T}}}\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\text{T}}} - \sum_{i}\frac{\partial^{2}\eta_{i}}{\partial\bm{\theta}\,\partial\bm{\theta}^{\text{T}}}\frac{\partial A}{\partial\eta_{i}}\,\middle|\,\bm{\theta}\right]\\
&= \frac{\partial\bm{\eta}^{\text{T}}}{\partial\bm{\theta}}\,\text{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\text{T}}},
\end{split}
\]

where in the last line we have used the fact that the derivatives of the log-normalizer are the cumulants of the sufficient statistics $\bm{t}(\bm{Y})$ under the distribution: in particular, $\partial A/\partial\eta_{i} = \mathbb{E}[t_{i}(\bm{Y})|\bm{\theta}]$, so the first and third terms cancel in expectation, and $\partial^{2}A/\partial\bm{\eta}\,\partial\bm{\eta}^{\text{T}}$ is the covariance of $\bm{t}(\bm{Y})$. A perhaps more interesting equivalent can be derived by noting that:

\[
\frac{\partial}{\partial\bm{\theta}}\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]
= \frac{\partial}{\partial\bm{\theta}}\frac{\partial A}{\partial\bm{\eta}^{\text{T}}}
= \frac{\partial^{2}A}{\partial\bm{\eta}\,\partial\bm{\eta}^{\text{T}}}\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\text{T}}}
= \text{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\text{T}}}.
\]

Therefore,

\begin{equation}
\left(\frac{\partial}{\partial\bm{\theta}}\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\right)^{\text{T}}
\text{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]^{-1}
\left(\frac{\partial}{\partial\bm{\theta}}\mathbb{E}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\right)
= \frac{\partial\bm{\eta}^{\text{T}}}{\partial\bm{\theta}}\,\text{Cov}_{\bm{Y}|\bm{\theta}}\left[\bm{t}(\bm{Y})\,\middle|\,\bm{\theta}\right]\frac{\partial\bm{\eta}}{\partial\bm{\theta}^{\text{T}}}
= I(\bm{\theta}).
\tag{B.12}
\end{equation}
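As a concrete check (an assumed Bernoulli example, not from the text): for a Bernoulli variable with mean $\theta$, $t(y) = y$, $\mathbb{E}[t|\theta] = \theta$, and $\text{Cov}[t|\theta] = \theta(1-\theta)$, so the left-hand side above gives $I(\theta) = 1/(\theta(1-\theta))$. The sketch below compares this with a Monte Carlo estimate of the variance of the score:

```python
import numpy as np

# Fisher information of an (assumed) Bernoulli(theta) variable via the
# mean/covariance form: t(y) = y, dE[t]/dtheta = 1, Cov[t] = theta*(1-theta),
# so I(theta) = 1^T Cov^{-1} 1 = 1 / (theta*(1 - theta)).
theta = 0.3
fisher = 1.0 / (theta * (1 - theta))

# Compare with the variance of the score, estimated by Monte Carlo:
# d/dtheta log p(y; theta) = y/theta - (1 - y)/(1 - theta).
rng = np.random.default_rng(1)
y = (rng.random(1_000_000) < theta).astype(float)
score = y / theta - (1 - y) / (1 - theta)
print(fisher, score.var())  # the two should agree
```

The agreement reflects the classical fact that the Fisher information equals both the variance of the score and the quadratic form in Eq.\ B.12.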

Markov chains

Discrete random variables

[[[table]]]

Useful identities

Expectations of quadratic forms.

Consider a vector random variable $\bm{X}$ with mean $\bm{\mu}$ and covariance $\mathbf{\Sigma}$. We are interested in the expectation of a certain function of $\bm{X}$, namely $\left(\bm{b}-\mathbf{C}\bm{X}\right)^{\text{T}}\mathbf{A}\left(\bm{b}-\mathbf{C}\bm{X}\right)$. This term can occur, for example, in the log probability of a Gaussian distribution about $\mathbf{C}\bm{X}$. To calculate the expectation, we define a new variable

\[
\bm{Z} := \mathbf{A}^{1/2}\left(\bm{b}-\mathbf{C}\bm{X}\right)
\]

and then employ the cyclic-permutation property of the matrix-trace operator:

\begin{equation}
\begin{split}
\mathbb{E}_{\bm{X}}\left[\left(\bm{b}-\mathbf{C}\bm{X}\right)^{\text{T}}\mathbf{A}\left(\bm{b}-\mathbf{C}\bm{X}\right)\right]
&= \mathbb{E}_{\bm{Z}}\left[\bm{Z}^{\text{T}}\bm{Z}\right]\\
&= \mathbb{E}_{\bm{Z}}\left[\text{tr}\left[\bm{Z}^{\text{T}}\bm{Z}\right]\right]\\
&= \mathbb{E}_{\bm{Z}}\left[\text{tr}\left[\bm{Z}\bm{Z}^{\text{T}}\right]\right]\\
&= \text{tr}\left[\mathbb{E}_{\bm{Z}}\left[\bm{Z}\bm{Z}^{\text{T}}\right]\right]\\
&= \text{tr}\left[\text{Cov}_{\bm{Z}}\left[\bm{Z}\right] + \mathbb{E}_{\bm{Z}}\left[\bm{Z}\right]\mathbb{E}_{\bm{Z}}\left[\bm{Z}^{\text{T}}\right]\right]\\
&= \text{tr}\left[\mathbf{A}^{1/2}\mathbf{C}\mathbf{\Sigma}\mathbf{C}^{\text{T}}\mathbf{A}^{\text{T}/2} + \mathbf{A}^{1/2}(\bm{b}-\mathbf{C}\bm{\mu})(\bm{b}-\mathbf{C}\bm{\mu})^{\text{T}}\mathbf{A}^{\text{T}/2}\right]\\
&= \text{tr}\left[\mathbf{A}\mathbf{C}\mathbf{\Sigma}\mathbf{C}^{\text{T}}\right] + \text{tr}\left[(\bm{b}-\mathbf{C}\bm{\mu})^{\text{T}}\mathbf{A}(\bm{b}-\mathbf{C}\bm{\mu})\right]\\
&= \text{tr}\left[\mathbf{A}\mathbf{C}\mathbf{\Sigma}\mathbf{C}^{\text{T}}\right] + (\bm{b}-\mathbf{C}\bm{\mu})^{\text{T}}\mathbf{A}(\bm{b}-\mathbf{C}\bm{\mu}).
\end{split}
\tag{B.13}
\end{equation}

Hence, the expected value of the quadratic function of $\bm{X}$ is the quadratic function evaluated at the expected value of $\bm{X}$, plus a ``correction'' term arising from the covariance of $\bm{X}$.
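The identity can be sanity-checked by Monte Carlo. The sketch below uses arbitrary (assumed) choices of $\mathbf{A}$, $\mathbf{C}$, $\bm{b}$, $\bm{\mu}$, and $\mathbf{\Sigma}$, and draws $\bm{X}$ from a Gaussian for convenience; note that the identity only uses the mean and covariance of $\bm{X}$, not its full distribution:

```python
import numpy as np

# Monte Carlo check of Eq. (B.13) with arbitrary (assumed) A, C, b, mu, Sigma.
rng = np.random.default_rng(2)
d, k = 3, 4                      # dimensions of X and of b
mu = rng.normal(size=d)
L = rng.normal(size=(d, d))
Sigma = L @ L.T                  # a valid covariance for X
A = 2.0 * np.eye(k)              # any symmetric PSD matrix
C = rng.normal(size=(k, d))
b = rng.normal(size=k)

# Closed form: tr[A C Sigma C^T] + (b - C mu)^T A (b - C mu).
r = b - C @ mu
closed = np.trace(A @ C @ Sigma @ C.T) + r @ A @ r

# Monte Carlo estimate of E[(b - C X)^T A (b - C X)].
X = rng.multivariate_normal(mu, Sigma, size=200_000)
R = b - X @ C.T
mc = np.einsum('ni,ij,nj->n', R, A, R).mean()
print(closed, mc)                # the two should be close
```

The `einsum` computes the quadratic form for each sample at once; averaging it recovers the closed form to within Monte Carlo error.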

Simulating Poisson random variates with mean less than 1.

motivation…

Consider the graphical model shown below. We want to show that $\hat{Y}$ is marginally distributed as a Poisson random variable with mean $\mu$, as long as $\mu<1$. The derivation below carries out this marginalization. The third line follows because the probability of $\hat{Y}$ (the number of ``successes'') is zero for any $\hat{y}>\hat{x}$, since $\hat{X}$ is the number of Bernoulli trials (it is impossible to have more successes than trials).

[Graphical model: $\hat{X}$ with $\hat{p}(\hat{x};\bm{\theta}) = \text{Pois}(1)$; $\hat{Y}$ with $\hat{p}(\hat{y}|\hat{x};\bm{\theta}) = \text{Bino}(\hat{x},\mu)$; plate over $N$ samples.]
\[
\begin{split}
\hat{p}(\hat{y};\bm{\theta}) &= \sum_{\hat{x}=0}^{\infty}\hat{p}(\hat{x};\bm{\theta})\,\hat{p}(\hat{y}|\hat{x};\bm{\theta})\\
&= \sum_{\hat{x}=0}^{\infty}\text{Pois}\left(\hat{x};1\right)\text{Bino}\left(\hat{y};\hat{x},\mu\right)\\
&= \sum_{\hat{x}=\hat{y}}^{\infty}\text{Pois}\left(\hat{x};1\right)\text{Bino}\left(\hat{y};\hat{x},\mu\right)\\
&= \sum_{\hat{x}=\hat{y}}^{\infty}\frac{e^{-1}}{\hat{x}!}\binom{\hat{x}}{\hat{y}}\mu^{\hat{y}}(1-\mu)^{\hat{x}-\hat{y}}\\
&= \sum_{\hat{x}=\hat{y}}^{\infty}\frac{e^{-1}}{\hat{x}!}\frac{\hat{x}!}{\hat{y}!\,(\hat{x}-\hat{y})!}\mu^{\hat{y}}(1-\mu)^{\hat{x}-\hat{y}}\\
&= \frac{e^{-1}\mu^{\hat{y}}}{\hat{y}!}\sum_{\hat{x}=\hat{y}}^{\infty}\frac{1}{(\hat{x}-\hat{y})!}(1-\mu)^{\hat{x}-\hat{y}}\\
&= \frac{e^{-1}\mu^{\hat{y}}}{\hat{y}!}\sum_{m=0}^{\infty}\frac{1}{m!}(1-\mu)^{m}\\
&= \frac{e^{-1}\mu^{\hat{y}}}{\hat{y}!}\,e^{1-\mu}
= \frac{e^{-\mu}\mu^{\hat{y}}}{\hat{y}!}
= \text{Pois}\left(\hat{y};\mu\right).
\end{split}
\]
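The marginalization can also be checked by simulation. The sketch below (variable names are illustrative) draws $\hat{X}\sim\text{Pois}(1)$ and then $\hat{Y}\,|\,\hat{x}\sim\text{Bino}(\hat{x},\mu)$, and checks that $\hat{Y}$ has the mean, variance, and $P(\hat{Y}=0)$ of a $\text{Pois}(\mu)$ variable:

```python
import numpy as np

# Simulate the two-stage model: X ~ Pois(1), then Y | X ~ Bino(X, mu),
# and check that Y is marginally Pois(mu). Requires mu < 1, since mu is
# used as a Bernoulli success probability.
rng = np.random.default_rng(3)
mu = 0.7
x = rng.poisson(1.0, size=1_000_000)
y = rng.binomial(x, mu)          # thin each of the x trials with prob. mu

print(y.mean(), y.var())         # both close to mu, as for a Poisson
print((y == 0).mean())           # close to exp(-mu)
```

Matching the mean, the variance, and the probability of zero is of course not a full proof, but all three agree with $\text{Pois}(\mu)$ to Monte Carlo precision.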