Chapter 10 Learning Energy-Based Models
One of the basic problems we have been grappling with in fitting generative models to data is how to make the model sufficiently expressive. For example, some of the complexity or “lumpiness” of the data distribution can be explained as the effect of marginalizing out some latent variables—as in a mixture of Gaussians. As we have seen, GMMs are not sufficient to model (e.g.) natural images, so we need to introduce more complexity. Latent-variable models like the VAE attempt to push the remaining complexity into the mean (or other parameters) of the emission distribution, by making it (or them) a deep neural-network function of the latent variables. Normalizing flows likewise map simply distributed latent variables into variables with more complicated distributions, although they treat the output of the neural network itself as the random variable of interest (we don’t bother to add a little Gaussian noise)—but at the price that the network must be invertible.
An alternative to all of these is to model the unnormalized distribution of observed variables, or equivalently, the energy:
$$
p(\boldsymbol{x};\boldsymbol{\theta}) \;=\; \frac{\exp\{-E(\boldsymbol{x};\boldsymbol{\theta})\}}{Z(\boldsymbol{\theta})},
\qquad
Z(\boldsymbol{\theta}) \;=\; \int \exp\{-E(\boldsymbol{x};\boldsymbol{\theta})\}\,\mathrm{d}\boldsymbol{x}.
$$
The advantage is that the energy can be essentially any scalar-valued function of the observed variables (in particular, an arbitrary deep neural network), since it need not be invertible, and the distribution it induces need not be normalized in closed form.
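To make this concrete, here is a minimal sketch (not from the text; it assumes PyTorch, and the class name `EnergyModel` and the two-hidden-layer architecture are arbitrary choices): any scalar-valued network defines an energy, and hence an unnormalized log probability, with no invertibility or latent-variable structure required.

```python
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """An arbitrary scalar-valued network E(x; theta); exp(-E) is the
    unnormalized density. No invertibility constraint is imposed."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def energy(self, x: torch.Tensor) -> torch.Tensor:
        # scalar energy per example: shape (batch,)
        return self.net(x).squeeze(-1)

    def unnormalized_log_prob(self, x: torch.Tensor) -> torch.Tensor:
        # equals log p(x; theta) + log Z(theta); the normalizer is not computed
        return -self.energy(x)

# usage: evaluate unnormalized log probabilities for a batch of observations
model = EnergyModel(dim=10)
x = torch.randn(32, 10)
print(model.unnormalized_log_prob(x).shape)  # torch.Size([32])
```

Note that nothing in this sketch touches the partition function $Z(\boldsymbol{\theta})$: the network supplies only the energy, which is precisely what makes the parameterization so flexible.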