7.1 The Gaussian mixture model and $K$-means
We return once again to the GMM, but this time, armed with the EM algorithm, we finally derive the learning rules. Since the entropy of the posterior over the class labels does not depend on the model parameters, minimizing the free energy in the M step amounts to minimizing the joint cross entropy. The joint cross entropy for a Gaussian mixture model is
$$
\mathcal{L}(\boldsymbol{\theta}) = -\sum_{n=1}^{N}\sum_{k=1}^{K} \gamma_{nk}\Big[\log \pi_k - \tfrac{1}{2}\log\big\lvert 2\pi\boldsymbol{\Sigma}_k\big\rvert - \tfrac{1}{2}(\boldsymbol{x}_n-\boldsymbol{\mu}_k)^{\mathsf T}\boldsymbol{\Sigma}_k^{-1}(\boldsymbol{x}_n-\boldsymbol{\mu}_k)\Big],
$$
where $\gamma_{nk} := p(z_n = k \mid \boldsymbol{x}_n)$ is the posterior responsibility of class $k$ for observation $\boldsymbol{x}_n$, computed in the E step under the current parameters.
To enforce the constraint that the prior probabilities sum to one, we can augment the loss with a Lagrangian term:
$$
\tilde{\mathcal{L}}(\boldsymbol{\theta}, \lambda) = \mathcal{L}(\boldsymbol{\theta}) + \lambda\Big(\sum_{k=1}^{K}\pi_k - 1\Big).
$$
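To make the quantity being minimized concrete, here is a minimal NumPy/SciPy sketch (mine, not the text's) of the joint cross entropy above; the names `X`, `resp`, `pis`, `mus`, and `covs` are placeholders for the data, the responsibilities $\gamma_{nk}$, and the mixture parameters.

```python
# Sketch only: evaluates the responsibility-weighted negative log of the joint,
# summed over data points n and classes k (the joint cross entropy above).
import numpy as np
from scipy.stats import multivariate_normal


def joint_cross_entropy(X, resp, pis, mus, covs):
    """X: (N, D) data; resp: (N, K) responsibilities; pis, mus, covs: mixture params."""
    total = 0.0
    for k in range(resp.shape[1]):
        # log p(x_n, z_n = k) = log pi_k + log N(x_n; mu_k, Sigma_k)
        log_joint_k = np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], covs[k])
        total -= np.sum(resp[:, k] * log_joint_k)
    return total
```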
The M step.
We take the derivatives in turn. First the mixing proportions: setting $\partial\tilde{\mathcal{L}}/\partial\pi_k = -\sum_n \gamma_{nk}/\pi_k + \lambda = 0$ and using the sum-to-one constraint to eliminate the multiplier (which comes out to $\lambda = N$) gives
$$
\hat{\pi}_k = \frac{1}{N}\sum_{n=1}^{N}\gamma_{nk};
$$
then the class-conditional means:
$$
\hat{\boldsymbol{\mu}}_k = \frac{\sum_{n}\gamma_{nk}\,\boldsymbol{x}_n}{\sum_{n}\gamma_{nk}};
$$
and the class-conditional covariances:
$$
\hat{\boldsymbol{\Sigma}}_k = \frac{\sum_{n}\gamma_{nk}\,(\boldsymbol{x}_n-\hat{\boldsymbol{\mu}}_k)(\boldsymbol{x}_n-\hat{\boldsymbol{\mu}}_k)^{\mathsf T}}{\sum_{n}\gamma_{nk}}.
$$
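For concreteness, the following sketch (my own, not the text's) implements one EM iteration using these updates: an E step that computes the responsibilities by Bayes' rule, and an M step that applies the three formulas above. The variable names are assumptions of the sketch.

```python
# Sketch only: one EM iteration for a GMM. Iterate e_step and m_step until the
# parameters (or the cross entropy) stop changing.
import numpy as np
from scipy.stats import multivariate_normal


def e_step(X, pis, mus, covs):
    """Responsibilities gamma_{nk} = p(z_n = k | x_n), via Bayes' rule."""
    K = len(pis)
    log_joint = np.stack(
        [np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], covs[k]) for k in range(K)],
        axis=1,
    )                                                            # (N, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)            # numerical stability
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


def m_step(X, resp):
    """Optimal mixing proportions, means, and covariances given the responsibilities."""
    N, D = X.shape
    Nk = resp.sum(axis=0)                                        # sum_n gamma_{nk}
    pis = Nk / N                                                 # hat{pi}_k
    mus = (resp.T @ X) / Nk[:, None]                             # hat{mu}_k
    covs = np.empty((len(Nk), D, D))
    for k in range(len(Nk)):
        diff = X - mus[k]
        covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k]     # hat{Sigma}_k
    return pis, mus, covs
```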
Whether in EM or under a fully observed model, the optimal parameters are intuitive. The optimal mixing proportion, emission mean, and emission covariance for class $k$ are, respectively, the (responsibility-weighted) fraction of the data assigned to class $k$, and the (responsibility-weighted) sample mean and sample covariance of those data. For example, when the class labels are observed, the responsibilities become indicators, so the denominator in the equation for the optimal mean becomes just the number of times class $k$ occurs in the data, and $\hat{\boldsymbol{\mu}}_k$ is simply the average of the observations labeled $k$.
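As a small illustration of this reduction (my own, with observed integer labels `z` as an assumed encoding), the updates collapse to class counts and per-class averages:

```python
# Sketch only: fully observed case, i.e. one-hot responsibilities.
import numpy as np


def supervised_mle(X, z, K):
    """Class priors and means when the labels z in {0, ..., K-1} are observed."""
    pis = np.array([np.mean(z == k) for k in range(K)])          # N_k / N
    mus = np.array([X[z == k].mean(axis=0) for k in range(K)])   # average of class-k points
    return pis, mus
```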
7.1.1 $K$-means
In Section 2.1.1, we saw what happens to the posterior of the GMM when all classes use the same isotropic covariance matrix, $\boldsymbol{\Sigma}_k = \sigma^2\mathbf{I}$: the responsibility $\gamma_{nk}$ becomes a softmax over the (scaled, shifted) squared distances $\lVert\boldsymbol{x}_n - \boldsymbol{\mu}_k\rVert^2$. It is not hard to see that in the limit of infinite precision, $\sigma^2 \to 0$, this quantity goes to zero unless $\boldsymbol{\mu}_k$ is the mean closest to $\boldsymbol{x}_n$, in which case it goes to one.
That is, in this limit the algorithm alternates between two steps:

1. Assign each data point to the class whose mean is nearest.
2. Recompute each class mean as the average of the points assigned to it.
This algorithm, which pre-existed EM for the GMM, is known as $K$-means.
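A minimal sketch of the resulting procedure (again my own, with `X` for the data, `mus` for the class means, and `z` for the hard assignments):

```python
# Sketch only: K-means as hard-assignment EM in the zero-covariance limit.
import numpy as np


def kmeans(X, K, n_iters=100, seed=0):
    """Alternate nearest-mean assignments with mean recomputation until convergence."""
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(len(X), size=K, replace=False)]           # initialize at K random data points
    for _ in range(n_iters):
        # Hard "E step": assign each point to the closest mean.
        dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=-1)   # (N, K) squared distances
        z = dists.argmin(axis=1)
        # "M step": each mean becomes the average of its assigned points.
        new_mus = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mus[k]
                            for k in range(K)])
        if np.allclose(new_mus, mus):
            break
        mus = new_mus
    return mus, z
```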