Chapter 5 Learning Discriminative Models

The essential feature of discriminative models is that they do not attempt to model the distribution of all of the available data. Instead, they attempt to model only the distribution of one set of data conditioned on another set, $\check{p}(\bm{\check{x}} \mid \bm{y})$.\footnote{The use of $\check{p}$ for discriminative models will become clear in the next chapter.} No attempt is made to model the distribution of $\bm{Y}$. Consequently, the variables $\bm{Y}$ and $\bm{\check{X}}$ are often referred to as the "inputs" and "outputs," respectively.\footnote{With some reservations, I have reversed the standard convention of using $\bm{Y}$ for outputs and $\bm{\check{X}}$ for inputs. The point is to emphasize that discriminative models are the Bayesian inverses of generative models. But why should the generative models use $\bm{\check{X}}$ for their "source" variables and $\bm{Y}$ for the emissions? This in turn is to match the standard conventions from control theory for, e.g., linear dynamical systems; see, for example, Section 2.2. This tension is clearly felt in the machine-learning literature, where generative models typically introduce yet another symbol, $\bm{Z}$, for their latent variables!}
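To make this concrete, here is a minimal sketch of such a model (my illustration, not an example from the text): ordinary linear regression with a scalar output, in which the conditional is a Gaussian whose mean is an affine function of the inputs. The weights $\bm{w}$, offset $b$, and variance $\sigma^2$ are illustrative parameter symbols:
\[
\check{p}(\check{x} \mid \bm{y};\, \bm{w}, b, \sigma^2)
  = \mathcal{N}\!\left(\check{x};\; \bm{w}^{\mathrm{T}}\bm{y} + b,\; \sigma^2\right).
\]
Nothing in this model describes how $\bm{Y}$ itself is distributed; the inputs are only ever conditioned on.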

Since we are focused on parametric models, the critical questions are:

  1. What parametric family of distributions shall we use for the conditional? and

  2. What family of functions shall we use for the map from the "inputs," $\bm{Y}$, to the parameters of that distribution? (Both choices are illustrated in the sketch below.)
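As a sketch of how the two choices fit together (again my illustration, the standard logistic-regression model, rather than an example from the text): for a binary output $\check{X} \in \{0, 1\}$, one answer to the first question is the Bernoulli family, and one answer to the second is an affine map of the inputs squashed by the logistic function:
\[
\check{p}(\check{x} \mid \bm{y};\, \bm{w}, b)
  = \mathrm{Bernoulli}\!\left(\check{x};\; \sigma(\bm{w}^{\mathrm{T}}\bm{y} + b)\right),
\qquad
\sigma(u) = \frac{1}{1 + e^{-u}}.
\]
A richer answer to the second question, say a neural network in place of the affine map, leaves the first choice untouched: the function's output still merely sets the Bernoulli parameter.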