Bibliography

  • [1] Anthony J. Bell and Terrence J. Sejnowski. An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation, 7(6):1129–1159, 1995.
  • [2] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
  • [3] J.F. Cardoso. Infomax and maximum likelihood for blind source separation. IEEE Signal Processing Letters, 4(4):112–114, 1997.
  • [4] Peter Dayan, Geoffrey E. Hinton, Radford M. Neal, and Richard S. Zemel. The Helmholtz machine. Neural Computation, 7(5):889–904, 1995.
  • [5] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977.
  • [6] Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. In International Conference on Learning Representations, Workshop Track, 2015.
  • [7] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. In International Conference on Learning Representations, 2017.
  • [8] R. A. Fisher. On the Mathematical Foundations of Theoretical Statistics. Philosophical Transactions of the Royal Society A, CCXXII:309–368, 1922.
  • [9] Michael U. Gutmann and Aapo Hyvärinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of Machine Learning Research, 13:307–361, 2012.
  • [10] Per Christian Hansen, James G. Nagy, and Dianne P. O’Leary. Deblurring Images: Matrices, Spectra, and Filtering. SIAM, 2006.
  • [11] Michael Hartl. The Tau Manifesto. Accessed: 2022-05-09.
  • [12] John Hertz, Anders Krogh, and Richard G. Palmer. Introduction to the Theory of Neural Computation. Westview Press, 1991.
  • [13] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14:1771–1800, 2002.
  • [14] Geoffrey E. Hinton and Andrew Brown. Spiking Boltzmann Machines. In Advances in Neural Information Processing Systems 12: Proceedings of the 1999 Conference, 2000.
  • [15] Geoffrey E. Hinton and Zoubin Ghahramani. Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society B: Biological Sciences, 352(1358):1177–1190, 1997.
  • [16] Geoffrey E. Hinton and Richard S. Zemel. Autoencoders, Minimum Description Length and Helmholtz Free Energy. In Advances in Neural Information Processing Systems 6: Proceedings of the 1993 Conference, pages 3–10, 1994.
  • [17] Matthew D. Hoffman and Matthew J. Johnson. ELBO surgery: yet another way to carve up the variational evidence lower bound. Advances in Neural Information Processing Systems (NIPS), pages 1–4, 2016.
  • [18] Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6:695–709, 2005.
  • [19] Eric Jang, Shixiang Gu, and Ben Poole. Categorical Reparameterization with Gumbel-Softmax. In International Conference on Learning Representations, pages 1–12, 2017.
  • [20] E.T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, ebook edition, 2003.
  • [21] Michael I. Jordan. Why the logistic function? A tutorial discussion on probabilities and neural networks. Technical report, 1995.
  • [22] Michael I. Jordan. An Introduction to Probabilistic Graphical Models. Unpublished textbook, 2003.
  • [23] Zahra Kadkhodaie and Eero P. Simoncelli. Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser. In Advances in Neural Information Processing Systems, 2021.
  • [24] Diederik P. Kingma and Prafulla Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions. In Neural Information Processing Systems, pages 1–15, 2018.
  • [25] Diederik P. Kingma and Ruiqi Gao. Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation. 2023.
  • [26] Diederik P. Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational Diffusion Models. In Advances in Neural Information Processing Systems, volume 34, pages 21696–21707, 2021.
  • [27] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In International Conference on Learning Representations, pages 1–14, 2014.
  • [28] Friso H. Kingma, Pieter Abbeel, and Jonathan Ho. Bit-Swap: Recursive bits-back coding for lossless compression with hierarchical latent variables. In 36th International Conference on Machine Learning (ICML), pages 5925–5940, 2019.
  • [29] Michael S. Lewicki and Bruno A. Olshausen. Probabilistic framework for the adaptation and comparison of image codes. Journal of the Optical Society of America A, 16(7):1587–1601, 1999.
  • [30] Michael S. Lewicki and Terrence J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337–365, 2000.
  • [32] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. In International Conference on Learning Representations, pages 1–20, 2017.
  • [33] Shakir Mohamed, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. Monte Carlo Gradient Estimation in Machine Learning. Journal of Machine Learning Research, 21:1–62, 2020.
  • [34] Radford M. Neal and Geoffrey E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models, 1998.
  • [35] Bruno A. Olshausen and DJ Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.
  • [36] Manfred Opper and Cedric Archambeau. The Variational Gaussian Approximation Revisited. Neural Computation, 21:786–792, 2009.
  • [37] John Paisley, David M. Blei, and Michael I. Jordan. Variational Bayesian Inference with Stochastic Search. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), 2012.
  • [38] L.R. Pericchi and A.F.M. Smith. Exact and Approximate Posterior Moments for a Normal Location Parameter. Journal of the Royal Statistical Society, Series B (Methodological), 54(3):793–804, 1992.
  • [39] W.V.O. Quine. Two Dogmas of Empiricism. The Philosophical Review, 60:20–43, 1951.
  • [40] Danilo Jimenez Rezende and Shakir Mohamed. Variational Inference with Normalizing Flows. In International Conference on Machine Learning, volume 37, 2015.
  • [41] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, volume 32, pages 1278–1286, 2014.
  • [42] Herbert Robbins. An Empirical Bayes Approach to Statistics. In Berkeley Symp. on Math. Statist. and Prob., pages 157–163, 1956.
  • [43] Steffen Schneider, Alexei Baevski, Ronan Collobert, and Michael Auli. wav2vec: Unsupervised Pre-Training for Speech Recognition. In Interspeech, pages 3465–3469, 2019.
  • [44] Eero P. Simoncelli and Bruno A. Olshausen. Natural Image Statistics and Neural Representations. Annual Review of Neuroscience, 24:1193–1216, 2001.
  • [45] Paul Smolensky. Information processing in dynamical systems: Foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, chapter 6, pages 194–281. MIT Press, 1986.
  • [46] Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In 32nd International Conference on Machine Learning (ICML), pages 2246–2255, 2015.
  • [47] Richard M Soland. Bayesian Analysis of the Weibull Process With Unknown Scale and Shape Parameters. IEEE Transactions on Reliability, R-18(4):181–184, 1969.
  • [48] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, 2019.
  • [49] D.M. Titterington, A.F.M. Smith, and U.E. Makov. Statistical Analysis of Finite Mixture Distributions. Wiley, 1985.
  • [50] James Townsend, Thomas Bird, and David Barber. Practical lossless compression with latent variables using bits back coding. In 7th International Conference on Learning Representations (ICLR), 2019.
  • [51] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation Learning with Contrastive Predictive Coding. arXiv preprint arXiv:1807.03748, 2018.
  • [52] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23:1661–1674, 2011.
  • [53] Max Welling, Michal Rosen-Zvi, and Geoffrey E. Hinton. Exponential Family Harmoniums with an Application to Information Retrieval. In Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, pages 1481–1488, 2004.