3.1 The exponential-family harmonium
In Chapter 2, we encountered a direct trade-off between the expressivity of the model emission distribution,
Suppose instead that we simply declare at the outset our two desiderata (so far): easily computable emission and posterior distributions. Of course, not every pair of such distributions will be compatible, but if we start with some very general form for these distributions, we can subsequently determine what restrictions are required for their consistency. In so doing, we shall have derived a rather general undirected graphical model known as the exponential-family harmonium (EFH) [53]. Historically, the EFH was derived as a generalization of the famous restricted Boltzmann machine (RBM) [45], but we shall approach it from the other end and present the RBM as a special case of the EFH.
Deriving the joint from two coupled, exponential-family conditionals.
We shall not take the emission and posterior distributions to be fully general, but assume instead that each belongs to an exponential family.
Note that this need not be the same exponential family; indeed, the several elements of (e.g.)
Thus, (functions of)
Now, the ratio of the conditionals is also the ratio of the marginals,
but we know an additional fact about this ratio: it must factor entirely into pieces that refer to at most one of
for some functions
with a shared, albeit transposed, linear transformation
and the conditional distributions are
Multiplying a conditional by the appropriate marginal yields the joint distribution:
Thus the joint takes the form of a Boltzmann distribution with negative energy
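As a concrete check of the argument above, the following is a minimal numerical sketch of the binary special case (an RBM), not the general derivation. The layer sizes, the names `W`, `b_x`, `b_y`, and the convention that `x` is latent and `y` is observed are assumptions made for illustration. The sketch builds the joint from a bilinear negative energy by brute-force enumeration, then confirms that both conditionals are products of Bernoullis driven by the same coupling matrix (transposed in one direction), and that the ratio of the conditionals equals the ratio of the marginals.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y = 3, 4                      # hypothetical sizes: latent x, observed y
W = rng.normal(size=(n_x, n_y))      # shared coupling matrix
b_x = rng.normal(size=n_x)           # latent biases
b_y = rng.normal(size=n_y)           # observed biases

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bernoulli_prob(state, mean):
    """Probability of a binary vector under independent Bernoullis."""
    return np.prod(np.where(state == 1, mean, 1.0 - mean))

# All binary configurations of each layer.
X = np.array(list(itertools.product([0, 1], repeat=n_x)))
Y = np.array(list(itertools.product([0, 1], repeat=n_y)))

# Negative energy b_x.x + b_y.y + x.W.y for every (x, y) pair,
# then the normalized joint (tractable only because the model is tiny).
neg_energy = (X @ b_x)[:, None] + (Y @ b_y)[None, :] + X @ W @ Y.T
joint = np.exp(neg_energy)
joint /= joint.sum()

p_x = joint.sum(axis=1)              # marginal over latents
p_y = joint.sum(axis=0)              # marginal over observations
post = joint / p_y[None, :]          # p(x | y), indexed [x, y]
emis = joint / p_x[:, None]          # p(y | x), indexed [x, y]

# Closed-form conditionals: W appears in one, its transpose in the other.
post_closed = np.array(
    [[bernoulli_prob(x, sigmoid(b_x + W @ y)) for y in Y] for x in X]
)
emis_closed = np.array(
    [[bernoulli_prob(y, sigmoid(b_y + W.T @ x)) for y in Y] for x in X]
)

conditionals_match = np.allclose(post, post_closed) and np.allclose(emis, emis_closed)
ratio_matches = np.allclose(emis / post, p_y[None, :] / p_x[:, None])
print(conditionals_match, ratio_matches)
```

Note that the closed-form conditionals never touch the joint normalizer; only the brute-force construction of `joint` does.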
The price of trivial inference.
We can now reckon the cost at which our closed-form posterior distribution was bought.
We have traded an intractable posterior-distribution normalizer for an intractable joint-distribution normalizer.
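To make this cost tangible, here is a small sketch, again for the binary (RBM) special case and under assumed names (`log_z_brute`, `log_z_half`, `W`, `b_x`, `b_y` are hypothetical, with `x` latent and `y` observed). It computes the log normalizer of the joint in two ways: by enumerating every joint configuration, and by summing out one layer analytically and enumerating only the other. The two agree, but even the cheaper route still requires a number of terms exponential in the size of the remaining layer.

```python
import itertools
import numpy as np

def binary_states(n):
    """All 2**n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)))

def log_z_brute(W, b_x, b_y):
    """Log normalizer by enumerating all 2**(n_x + n_y) joint states."""
    n_x, n_y = W.shape
    X, Y = binary_states(n_x), binary_states(n_y)
    neg_energy = (X @ b_x)[:, None] + (Y @ b_y)[None, :] + X @ W @ Y.T
    return np.logaddexp.reduce(neg_energy.ravel()), neg_energy.size

def log_z_half(W, b_x, b_y):
    """Sum out y analytically; still enumerate all 2**n_x latent states."""
    n_x, _ = W.shape
    X = binary_states(n_x)
    # sum over y of exp(y.(b_y + W^T x)) = prod_j (1 + exp(b_y[j] + (W^T x)[j]))
    log_unnorm_px = X @ b_x + np.logaddexp(0.0, b_y + X @ W).sum(axis=1)
    return np.logaddexp.reduce(log_unnorm_px), len(X)

rng = np.random.default_rng(1)
term_counts = []
for n in (2, 4, 6):
    W = rng.normal(size=(n, n))
    b_x = rng.normal(size=n)
    b_y = rng.normal(size=n)
    full, n_full = log_z_brute(W, b_x, b_y)
    half, n_half = log_z_half(W, b_x, b_y)
    assert np.isclose(full, half)
    term_counts.append((n, n_full, n_half))

for n, n_full, n_half in term_counts:
    print(f"n={n}: {n_full} terms (full enumeration), {n_half} terms (one layer summed out)")
```

Both term counts grow exponentially with the layer size, which is why exact evaluation is feasible only for toy models like these.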
The normalizer for the marginal distribution
…
Enforcing consistency between exponential-family emission and posterior distributions.
We saw above that when the emission and posterior distributions are both in exponential families, the natural parameters are constrained by Eq. 3.1. To simplify the presentation, we repeat the constraint here (with the vector-valued functions named alphabetically):
It is intuitive that this equation constrains the natural parameters (here,
Let all the functions be polynomials in
(Notice that we have omitted the constants from these bases.)
For appropriately shaped matrices (
holding for all values of
(3.3)
We shall only make use of the last of these, Eq. 3.3.
Now assume
where on the second line we have defined a new matrix
In a word,