9.1 InfoMax ICA, revisited
Historically, this equivalence was noted first [3] for a specific model, InfoMax ICA [1], which we first encountered in Section 5.2. Consider the very simple “generative model” in which the observations are related to the “latent” variables by a square, full-rank matrix:
Substituting this relationship (cf. Eq. 9.1) into Eq. 9.3, we see that the marginal distribution of the observed variables is
where again
That is, InfoMax ICA can be implemented as density estimation in a generative model with latent variables distributed independently and cumulatively according to
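The density-estimation view can be made concrete with a minimal sketch. It assumes a logistic nonlinearity, so that each latent variable has the standard logistic density, and it uses hypothetical names (`mixing`, `unmixing`, `mean_log_likelihood`) that are not the text's notation. Under those assumptions, the log marginal density of an observation is the log absolute determinant of the unmixing matrix plus the sum of the log latent densities evaluated at the unmixed observation.

```python
import numpy as np

def logistic_log_density(u):
    """Log of the standard logistic density sigma(u) * (1 - sigma(u)), computed stably."""
    a = np.abs(u)  # the density is symmetric, so work with |u| to avoid overflow
    return -a - 2.0 * np.log1p(np.exp(-a))

def mean_log_likelihood(unmixing, observations):
    """Average log marginal density of the observations (one per row).

    Change of variables: log p(y) = log|det W| + sum_k log p_latent((W y)_k).
    """
    _, log_abs_det = np.linalg.slogdet(unmixing)
    latents = observations @ unmixing.T
    return log_abs_det + logistic_log_density(latents).sum(axis=1).mean()

# Synthetic data (an assumption for illustration): independent logistic
# sources passed through a square, full-rank mixing matrix.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 1.0], [0.5, 1.5]])
sources = rng.logistic(size=(50_000, 2))
observations = sources @ mixing.T

ll_true = mean_log_likelihood(np.linalg.inv(mixing), observations)
ll_identity = mean_log_likelihood(np.eye(2), observations)
print(f"true unmixing: {ll_true:.3f}   identity: {ll_identity:.3f}")
```

On this synthetic data the inverse of the mixing matrix should score a higher average log-likelihood than the identity, which is what fitting the model by maximum likelihood exploits.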
But we haven’t specified
If the observations are indeed normal, then whitening them in this way would indeed render them independent (since for jointly Gaussian random variables, uncorrelatedness implies independence)—but we do not need such an elaborate procedure to arrive at this conclusion!
ICA is of interest precisely when the observations are not normal, in which case the optimal linear transformation cannot generally be stated a priori.
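The contrast between the normal and non-normal cases can be checked numerically. The sketch below is illustrative only: the Laplace sources, the mixing matrix, and the helper names (`whiten`, `fourth_order_cross_stat`) are assumptions, not part of the text. It whitens linear mixtures of Gaussian and of Laplace sources, then measures a fourth-order cross statistic that vanishes for independent variables.

```python
import numpy as np

def whiten(y):
    """Whiten the rows of y with the inverse symmetric square root of the sample covariance."""
    y = y - y.mean(axis=0)
    cov = np.cov(y, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    w = evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric (ZCA) whitening matrix
    return y @ w.T

def fourth_order_cross_stat(z):
    """E[z1^2 z2^2] - E[z1^2] E[z2^2], which is zero when z1 and z2 are independent."""
    sq = z ** 2
    return np.mean(sq[:, 0] * sq[:, 1]) - np.mean(sq[:, 0]) * np.mean(sq[:, 1])

rng = np.random.default_rng(0)
n = 200_000
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
mixing = rotation @ np.diag([2.0, 0.5])  # hypothetical mixing matrix

gauss_sources = rng.standard_normal((n, 2))
laplace_sources = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(n, 2))  # unit variance

z_gauss = whiten(gauss_sources @ mixing.T)
z_laplace = whiten(laplace_sources @ mixing.T)

print("Gaussian sources:", fourth_order_cross_stat(z_gauss))
print("Laplace sources: ", fourth_order_cross_stat(z_laplace))
```

Both whitened data sets have identity sample covariance, but only the Gaussian one should show a cross statistic near zero. For the Laplace mixture, whitening leaves a rotation of the sources, and the remaining dependence is visible only in higher-order statistics.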
Critically, squashing the data with the Gaussian CDF makes the outputs blind to higher-order correlations, so the Gaussian CDF is not a suitable nonlinearity in cases of interest.
In contrast, the distribution whose CDF is the (standard) logistic function is super-Gaussian (leptokurtic), so InfoMax ICA with logistic outputs will generally do more than decorrelate its inputs.
This may seem remarkable, given the visually minor discrepancy between the Gaussian CDF and the logistic function (Fig. LABEL:fig:; B.A. Olshausen, personal communication).
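A quick numerical comparison makes the point. This sketch assumes a Gaussian matched in variance to the standard logistic distribution (one of several reasonable ways to align the two curves) and uses a plain Riemann sum on a grid. It reports the largest gap between the two CDFs and the excess kurtosis of the corresponding densities.

```python
import math
import numpy as np

x = np.linspace(-40.0, 40.0, 80_001)
dx = x[1] - x[0]

def excess_kurtosis(density):
    """Excess kurtosis of a zero-mean density sampled on the grid x (Riemann sum)."""
    density = density / (density.sum() * dx)  # renormalize on the grid
    var = np.sum(x ** 2 * density) * dx
    fourth = np.sum(x ** 4 * density) * dx
    return fourth / var ** 2 - 3.0

# Standard logistic function and its derivative, the logistic density.
logistic_cdf = 1.0 / (1.0 + np.exp(-x))
logistic_pdf = logistic_cdf * (1.0 - logistic_cdf)

# Gaussian with the same variance as the standard logistic (pi^2 / 3).
sigma = math.pi / math.sqrt(3.0)
gauss_cdf = 0.5 * (1.0 + np.vectorize(math.erf)(x / (sigma * math.sqrt(2.0))))
gauss_pdf = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

max_cdf_gap = float(np.max(np.abs(logistic_cdf - gauss_cdf)))
kurt_logistic = float(excess_kurtosis(logistic_pdf))
kurt_gauss = float(excess_kurtosis(gauss_pdf))

print(f"max |logistic CDF - Gaussian CDF| = {max_cdf_gap:.4f}")
print(f"excess kurtosis: logistic = {kurt_logistic:.3f}, Gaussian = {kurt_gauss:.3f}")
```

The CDF gap should come out small relative to the unit range of the curves, while the excess kurtoses should be close to 1.2 for the logistic density and 0 for the Gaussian. The difference that matters for ICA lives in the tails of the densities, not in the visual shape of the squashing functions.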
Now we see the advantage of the generative perspective, from which this difference is more salient—and at long last, shed light on how to choose the feedforward nonlinearities,