3.5 General inference algorithms
So far, inference has looked like a (possibly iterative) application of Bayes’s theorem. The most complicated structure we’ve considered is the binary tree underlying the HMM and state-space models. Now we’d like to generalize to more complicated directed and undirected graphs.
The basic intuition is that we can construct efficient algorithms for trees and tree-like graphs. Therefore, the basis of our derivation of the very general junction-tree algorithm will be to convert non-tree graphs into something tree-like. We may have to bite the bullet and accept large cliques, which are bad from a computational perspective, since the cost of working with a clique’s potential table grows exponentially with the size of the clique. The alternatives are the approximate-inference methods: loopy belief propagation, variational inference, and sampling.
At the heart of inference is normalization. This is the “hard” part of Bayes’s rule: it may involve an intractable integral, or it may involve summing over all configurations of the random variables, the number of configurations being exponential in the number of variables.
$$
p(\boldsymbol{z}\mid\boldsymbol{x})
= \frac{p(\boldsymbol{x}\mid\boldsymbol{z})\,p(\boldsymbol{z})}
       {\sum_{\boldsymbol{z}'} p(\boldsymbol{x}\mid\boldsymbol{z}')\,p(\boldsymbol{z}')}.
$$
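To make the cost concrete, here is a minimal sketch in Python; the model, with independent Bernoulli latents and a made-up likelihood, is purely illustrative. The numerator of Bayes’s rule costs one evaluation, but the normalizer sums over all $2^N$ configurations of the latent variables.

```python
import itertools
import numpy as np

# Hypothetical toy model: N binary latent variables z, one observed x.
N = 10
rng = np.random.default_rng(0)
prior = rng.uniform(0.1, 0.9, size=N)  # p(z_i = 1), chosen arbitrarily

def p_z(z):
    """Prior probability of a configuration z (a tuple of 0s and 1s)."""
    z = np.asarray(z)
    return float(np.prod(np.where(z == 1, prior, 1 - prior)))

def p_x_given_z(z):
    """A made-up likelihood: the observation is likelier the more z_i are on."""
    return (1 + sum(z)) / (1 + N)

# Numerator of Bayes's rule for one query configuration: one evaluation.
z_query = (1,) * N
numerator = p_x_given_z(z_query) * p_z(z_query)

# Denominator (the normalizer): a sum over all 2^N configurations.
Z = sum(p_x_given_z(z) * p_z(z)
        for z in itertools.product([0, 1], repeat=N))

posterior = numerator / Z
print(f"summed {2**N} terms for the normalizer; posterior = {posterior:.3e}")
```

Doubling $N$ squares the number of terms in the sum, which is why exact inference leans so heavily on graph structure.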
The other setting in which we had to compute normalizers was undirected graphical models, since the product of the potential functions on an undirected graph is an unnormalized distribution. But this, too, can be assimilated to inference with Bayes’s theorem. To see this, consider the undirected graphical model in Fig. 3.1A, for which the joint distribution is
$$
p(x_1,\dots,x_N) = \frac{1}{Z} \prod_{c\,\in\,\mathcal{C}} \psi_c(\boldsymbol{x}_c)
$$
for some normalizer
$$
Z = \sum_{x_1,\dots,x_N} \prod_{c\,\in\,\mathcal{C}} \psi_c(\boldsymbol{x}_c).
$$
Here $\mathcal{C}$ is the set of cliques of the graph and $\boldsymbol{x}_c$ denotes the subset of variables in clique $c$.
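As a concrete sketch (the three-node chain below is an assumption; Fig. 3.1A is not reproduced here), the normalizer of a small undirected model can be computed by brute force, summing the product of potentials over every joint configuration:

```python
import itertools
import numpy as np

# A hypothetical chain x1 - x2 - x3 of binary variables with two
# pairwise potentials, standing in for the cliques of Fig. 3.1A.
rng = np.random.default_rng(1)
psi_12 = rng.uniform(0.5, 2.0, size=(2, 2))  # potential on clique (x1, x2)
psi_23 = rng.uniform(0.5, 2.0, size=(2, 2))  # potential on clique (x2, x3)

def unnormalized(x1, x2, x3):
    """Product of potentials: an unnormalized joint distribution."""
    return psi_12[x1, x2] * psi_23[x2, x3]

# The normalizer Z sums the product over all 2^3 configurations.
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=3))

def p(x1, x2, x3):
    """The properly normalized joint distribution."""
    return unnormalized(x1, x2, x3) / Z

# Sanity check: the normalized joint sums to one.
total = sum(p(*x) for x in itertools.product([0, 1], repeat=3))
assert np.isclose(total, 1.0)
print(f"Z = {Z:.4f}")
```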
Now consider the graphical model in Fig. 3.1B, which we assert to be normalized. Thus
$$
\sum_{x_1,\dots,x_N} \prod_{c\,\in\,\mathcal{C}} \bar{\psi}_c(\boldsymbol{x}_c) = 1.
$$
In this graph, suppose the potentials agree with those of Fig. 3.1A up to multiplicative constants, $\psi_c = z_c\,\bar{\psi}_c$, so that
$$
\prod_{c\,\in\,\mathcal{C}} \psi_c(\boldsymbol{x}_c)
= \Bigl(\prod_{c\,\in\,\mathcal{C}} z_c\Bigr) \prod_{c\,\in\,\mathcal{C}} \bar{\psi}_c(\boldsymbol{x}_c);
$$
then
$$
Z = \sum_{x_1,\dots,x_N} \prod_{c\,\in\,\mathcal{C}} \psi_c(\boldsymbol{x}_c)
= \prod_{c\,\in\,\mathcal{C}} z_c.
$$
The last line follows from summing both sides over all configurations of $x_1,\dots,x_N$ and applying the normalization of the potentials $\bar{\psi}_c$.
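Continuing the hypothetical chain from the sketch above, one can verify numerically that dividing each potential by a constant $z_c$ divides $Z$ by $\prod_c z_c$, so that if the rescaled model is normalized, the constants recover the normalizer:

```python
# Split Z across the cliques however we like; here it all goes on psi_12.
z_12, z_23 = Z, 1.0
psi_bar_12 = psi_12 / z_12
psi_bar_23 = psi_23 / z_23

# The rescaled product of potentials now sums to one...
total = sum(psi_bar_12[x1, x2] * psi_bar_23[x2, x3]
            for x1, x2, x3 in itertools.product([0, 1], repeat=3))
assert np.isclose(total, 1.0)

# ...and the product of the constants equals the original normalizer.
assert np.isclose(z_12 * z_23, Z)
```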
More generally, the product of potentials for any undirected graphical model with nodes $x_1,\dots,x_N$ can be treated as an unnormalized joint distribution, and computing its normalizer amounts to summing that product over all configurations of the nodes, which is exactly the expensive marginalization at the heart of Bayes’s rule.
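In that spirit, here is a generic brute-force normalizer for an arbitrary discrete undirected model (a sketch; the `potentials` dictionary mapping cliques to tables is an invented interface). Its cost, exponential in the number of nodes, is exactly what tree-structured algorithms like the junction tree are designed to avoid:

```python
import itertools
import numpy as np

def normalizer(n_nodes, card, potentials):
    """Brute-force Z: sum the product of potentials over all
    card**n_nodes joint configurations.

    potentials: dict mapping a clique (tuple of node indices) to a
    numpy array of shape (card,) * len(clique).
    """
    Z = 0.0
    for x in itertools.product(range(card), repeat=n_nodes):
        prod = 1.0
        for clique, table in potentials.items():
            prod *= table[tuple(x[i] for i in clique)]
        Z += prod
    return Z

# Hypothetical usage: a four-node loop of binary variables, the kind of
# non-tree graph the junction-tree algorithm handles by forming cliques.
rng = np.random.default_rng(2)
pots = {c: rng.uniform(0.5, 2.0, size=(2, 2))
        for c in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(f"Z = {normalizer(4, 2, pots):.4f}")
```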