Task 2777.002 Norm, Stability and Generalization in Deep Networks

Event Date: December 13, 2018
Time: 2:00 pm EST

Title: Norm, Stability and Generalization in Deep Networks

Presenter: Andrzej Banburski, MIT

Abstract: A main puzzle of deep neural networks (DNNs) is the apparent absence of “over-fitting”. This is surprising given the large capacity of DNNs, demonstrated by their ability to fit randomly labeled data, and the absence of explicit regularization. Recent results by Srebro et al. provide a satisfying solution to the puzzle for linear networks used in binary classification. They prove that minimizing loss functions such as the logistic, cross-entropy, and exponential losses yields asymptotic, “slow” convergence to the maximum-margin solution for linearly separable datasets, independently of the initial conditions. We discuss the prospects of extending these results to deep networks using the theory of dynamical systems and show that the concepts of stability and normalization play a crucial role. This analysis of normalization leads to a criterion for comparing different minima in deep networks and lets us establish a linear relationship between normalized training and test losses, yielding very tight generalization bounds.
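To illustrate the implicit-bias result referenced above, the following is a minimal sketch (not from the talk) of gradient descent on the logistic loss for a toy linearly separable 2D dataset: the weight norm grows without bound while the normalized direction w/||w|| slowly stabilizes toward the maximum-margin separator. The dataset, step size, and iteration schedule are illustrative assumptions.

```python
import numpy as np

# Toy linearly separable data (illustrative assumption, not from the talk):
# two well-separated Gaussian blobs with labels +1 / -1.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[+2.0, +2.0], scale=0.3, size=(50, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

def logistic_loss_grad(w, X, y):
    """Gradient of the mean logistic loss L(w) = mean(log(1 + exp(-y * Xw)))."""
    margins = y * (X @ w)
    # d/dw log(1 + exp(-m)) = -y * x / (1 + exp(m)); clip for numerical safety.
    coeffs = -y / (1.0 + np.exp(np.clip(margins, -30.0, 30.0)))
    return (coeffs[:, None] * X).mean(axis=0)

w = np.zeros(2)
lr = 0.1
for t in range(1, 100001):
    w -= lr * logistic_loss_grad(w, X, y)
    if t in (10, 100, 1000, 10000, 100000):
        direction = w / np.linalg.norm(w)
        print(f"t={t:6d}  ||w||={np.linalg.norm(w):8.3f}  w/||w||={direction}")

# Expected behavior: ||w|| keeps growing (the loss never reaches zero exactly),
# while w/||w|| converges, slowly, toward the max-margin direction.
```

Tracking w/||w|| rather than w itself is the point: for exponential-type losses on separable data, only the direction of the weights carries information about the classifier, which is why normalization becomes central when comparing minima.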

Bio: Andrzej Banburski received his Ph.D. in Theoretical Physics from the Perimeter Institute for Theoretical Physics and the University of Waterloo in 2017. He is currently a postdoc at the Center for Brains, Minds and Machines (CBMM) at MIT, where he works on C-BRIC research with Prof. Tomaso Poggio. His recent interests lie in the theoretical understanding of deep learning and in taking steps towards AI capable of symbolic and mathematical reasoning.