6.1 Introduction
We are concerned with learning in generative models under two circumstances, “supervised” and “unsupervised.”
The paradigm case of each is illustrated in Fig. LABEL:fig:clustering.
In Fig. LABEL:subfig:labeledClusters, each datum is paired with a class label, indicated here by its color; in Fig. LABEL:subfig:unlabeledClusters, the labels are absent.
Density estimation.
Let us begin with an even simpler dataset to model, Fig. LABEL:subfig:cluster.
The data look to be distributed normally, so it would be sensible simply to let the model be a single Gaussian,
\[
\hat{p}(\bm{y}; \bm{\theta}) = \mathcal{N}\!\left(\bm{\mu},\, \bm{\Sigma}\right),
\qquad \bm{\theta} = \{\bm{\mu}, \bm{\Sigma}\},
\]
and fit the parameters by minimizing the cross entropy,
\[
\mathcal{L}(\bm{\theta}) = \mathbb{E}_{\bm{Y}}\!\left[-\log \hat{p}(\bm{Y}; \bm{\theta})\right].
\]
Differentiating with respect to $\bm{\mu}$, setting the result to zero, and solving, we find
\[
\bm{\mu}^* = \mathbb{E}\!\left[\bm{Y}\right] \approx \frac{1}{N} \sum_{n=1}^{N} \bm{y}_n,
\]
where in the final equality we approximate the expectation under the (unavailable) data distribution with an average under (available) samples from it.
Likewise, differentiating with respect to $\bm{\Sigma}$ and setting the result to zero yields
\[
\bm{\Sigma}^* = \mathbb{E}\!\left[(\bm{Y} - \bm{\mu}^*)(\bm{Y} - \bm{\mu}^*)^{\mathrm{T}}\right] \approx \frac{1}{N} \sum_{n=1}^{N} (\bm{y}_n - \bm{\mu}^*)(\bm{y}_n - \bm{\mu}^*)^{\mathrm{T}},
\]
the sample covariance.
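To check the recipe numerically, here is a minimal sketch (all names and the synthetic data are illustrative assumptions, not from the text): draw samples from a known two-dimensional Gaussian and recover its parameters with the sample mean and maximum-likelihood sample covariance just derived.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the data of the single-cluster figure:
# N draws from a 2-D Gaussian with known parameters.
N = 500
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[1.0, 0.6],
                       [0.6, 2.0]])
Y = rng.multivariate_normal(true_mu, true_Sigma, size=N)

# ML estimates: the sample average stands in for the expectation under
# the (unavailable) data distribution.
mu_hat = Y.mean(axis=0)                 # sample mean
resid = Y - mu_hat
Sigma_hat = resid.T @ resid / N         # ML sample covariance (divides by N, not N-1)

print("mu_hat:", mu_hat)
print("Sigma_hat:\n", Sigma_hat)
```

Note the divisor $N$ rather than $N-1$: maximizing likelihood yields the biased covariance estimator.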
So far, so good. We now proceed to the dataset shown in Fig. LABEL:subfig:unlabeledClusters. Here, by all appearances, is a mixture of Gaussians. In Section 2.1.1 we derived the marginal distribution for the GMM, Eq. 2.10, so it seems that perhaps we can use the same procedure as for the single Gaussian. The cross-entropy loss is
\[
\mathcal{L}(\bm{\theta}) = \mathbb{E}_{\bm{Y}}\!\left[-\log \hat{p}(\bm{Y}; \bm{\theta})\right] \approx -\frac{1}{N} \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(\bm{y}_n;\, \bm{\mu}_k, \bm{\Sigma}_k\right).
\]
We have encountered a problem.
The summation (across classes) inside the logarithm couples the parameters of the $K$ classes: the logarithm no longer acts directly on any single normal density, so differentiating with respect to one class's parameters, say $\bm{\mu}_k$, leaves the parameters of all the other classes in the gradient, and setting it to zero admits no closed-form solution.
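A minimal numerical sketch of the difficulty, under assumed toy parameters (nothing here is from the text; `scipy` supplies only the normal density): the class sum is taken before the logarithm, and the "responsibilities" that appear in the gradient with respect to any one class's mean depend on every class's parameters at once.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_loss(Y, pis, mus, Sigmas):
    # N x K matrix of weighted class likelihoods, pi_k * N(y_n; mu_k, Sigma_k).
    comp = np.stack([pi * multivariate_normal.pdf(Y, mu, Sig)
                     for pi, mu, Sig in zip(pis, mus, Sigmas)], axis=1)
    # The class sum sits inside the log, so the log cannot distribute
    # over the mixture components as it did for the single Gaussian.
    return -np.log(comp.sum(axis=1)).mean()

def responsibilities(Y, pis, mus, Sigmas):
    # Posterior over classes for each datum. Changing *any* class's
    # parameters changes the normalizer, and with it every column --
    # this is the coupling that blocks a closed-form solution.
    comp = np.stack([pi * multivariate_normal.pdf(Y, mu, Sig)
                     for pi, mu, Sig in zip(pis, mus, Sigmas)], axis=1)
    return comp / comp.sum(axis=1, keepdims=True)

# Toy example with two classes and two data.
Y = np.array([[0.0, 0.0], [3.0, 1.0]])
pis = [0.5, 0.5]
mus = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
Sigmas = [np.eye(2), np.eye(2)]
print(gmm_loss(Y, pis, mus, Sigmas))
print(responsibilities(Y, pis, mus, Sigmas))
```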