Part I: Representation and Inference

We usually begin our investigations with a set of variables and (assumed) statistical dependencies among them. Note that this set may include “latent” variables, of which we have made no observations. Inference, one of the two major operations at the center of this book (the other being learning), consists of estimating the unobserved or latent variables, using a model of some sort and the observations of the non-latent—“observed” or perhaps “patent”—variables. More precisely, one infers a probability distribution over some or all of these variables, conditioned on the observed variables.
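
As a schematic illustration (the notation here is generic, not drawn from any particular model in this book): suppose the model has a single latent variable $z$ and a single observed variable $x$. Then inference amounts to an application of Bayes' rule,
$$ p(z \mid x) = \frac{p(x \mid z)\, p(z)}{\sum_{z'} p(x \mid z')\, p(z')}, $$
with the sum replaced by an integral when $z$ is continuous. In richer models, the difficulty of inference lies almost entirely in computing, or approximating, the normalizer in the denominator.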

In contrast, learning consists of finding the optimal numerical values for parameters of the model. Thus inference and learning are, respectively, questions in the domains of probability and statistics [22]. Having said that, learning can be assimilated to inference (Bayesian inference), and inference can be assimilated to learning (variational inference).
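
For concreteness, the prototypical learning problem (a standard formulation, stated here as an illustration rather than a claim about the chapters to come) is maximum-likelihood estimation: given independent observations $x_1, \ldots, x_N$ and a model $p(x; \theta)$ with parameters $\theta$, one seeks
$$ \theta^* = \arg\max_\theta \sum_{i=1}^{N} \log p(x_i; \theta). $$
Bayesian inference assimilates this problem to inference by treating $\theta$ itself as a latent variable, inferring the distribution $p(\theta \mid x_1, \ldots, x_N)$ rather than a point estimate.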

In the next two chapters, we’ll assume that our models of the data are “perfect”—although we’ll still maintain the distinction between model and true generative process by using respectively $p$ and $\hat{p}$ for their corresponding probability distributions.