Deep Learning

Student-Teacher Learning from Clean Inputs to Noisy Inputs

Computer Vision and Pattern Recognition (CVPR), 2021. (Acceptance rate = 27%)
Guanzhe Hong, Zhiyuan Mao, Xiaojun Lin, and Stanley H. Chan

Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring knowledge from a pre-trained teacher network to the student network. Furthermore, recent empirical results demonstrate that the teacher's features can boost the student network's generalization even when the student's input sample is corrupted by noise. However, there is a lack of theoretical insight into why and when this method of transferring knowledge can succeed between such heterogeneous tasks. We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks. We identify three factors vital to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control over any of the three factors leads to failure of the student-teacher learning method.
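The training signal described above can be sketched in a toy deep-linear setting: the student's hidden features on noisy inputs are driven toward the frozen teacher's hidden features on the corresponding clean inputs. Everything below (shapes, learning rate, single hidden layer) is a hypothetical illustration, not the paper's experimental setup.

```python
import numpy as np

# Toy sketch of feature-based student-teacher learning with deep linear
# networks. All dimensions and hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)
d, h, n = 8, 4, 100
W_t = rng.standard_normal((h, d))   # teacher's hidden layer (pre-trained, frozen)
W_s = rng.standard_normal((h, d))   # student's hidden layer (to be trained)

x_clean = rng.standard_normal((d, n))                   # teacher sees clean inputs
x_noisy = x_clean + 0.1 * rng.standard_normal((d, n))   # student sees noisy inputs

def feature_loss(W_s, W_t):
    """MSE between the student's hidden features on noisy inputs and the
    teacher's hidden features on the corresponding clean inputs."""
    return np.mean((W_s @ x_noisy - W_t @ x_clean) ** 2)

loss_before = feature_loss(W_s, W_t)
lr = 0.01
for _ in range(500):
    resid = W_s @ x_noisy - W_t @ x_clean                 # (h, n) feature mismatch
    W_s -= lr * (2.0 / resid.size) * resid @ x_noisy.T    # gradient step on the loss
loss_after = feature_loss(W_s, W_t)
```

In the paper's terminology, whether this feature loss is driven to zero (factor 1) and what the frozen teacher weights encode (factors 2 and 3) determine whether the student generalizes on noisy inputs.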

One Size Fits All: Can We Train One Denoiser for All Noise Levels?

International Conference on Machine Learning (ICML), 2020. (Acceptance rate = 21%)
Abhiram Gnansambandam and Stanley H. Chan

When training an estimator such as a neural network for tasks like image denoising, it is generally preferred to train one estimator and apply it to all noise levels. The de facto training protocol to achieve this goal is to train the estimator with noisy samples whose noise levels are uniformly distributed across the range of interest. However, why should we allocate the samples uniformly? Can we have more training samples that are less noisy, and fewer samples that are more noisy? What is the optimal distribution? How do we obtain such a distribution? The goal of this paper is to address this training sample distribution problem from a minimax risk optimization perspective. We derive a dual ascent algorithm to determine the optimal sampling distribution, whose convergence is guaranteed as long as the set of admissible estimators is closed and convex. For estimators with non-convex admissible sets such as deep neural networks, our dual formulation converges to a solution of the convex relaxation. We discuss how the algorithm can be implemented in practice. We evaluate the algorithm on linear estimators and deep networks.
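The minimax formulation can be illustrated on a toy problem where the inner minimization has a closed form: a scalar shrinkage estimator applied across several noise levels, with a multiplicative-weights ascent step standing in for the paper's dual ascent update. All numbers and the estimator class below are hypothetical.

```python
import numpy as np

# Toy minimax training-distribution problem (hypothetical numbers).
# Estimator class: scalar shrinkage x_hat = w * y for y = x + sigma * e,
# so the per-level risk has a closed form.
sigmas = np.array([0.1, 0.5, 1.0])            # candidate noise levels
v = 1.0                                       # signal variance Var(x)
p = np.full(len(sigmas), 1.0 / len(sigmas))   # start from uniform sampling

def risk(w, sigma):
    """MSE of x_hat = w*y at noise level sigma: (w-1)^2 Var(x) + w^2 sigma^2."""
    return (w - 1.0) ** 2 * v + w ** 2 * sigma ** 2

eta = 0.5
for _ in range(200):
    # Inner step: minimize the p-weighted risk in closed form.
    s = float(p @ sigmas ** 2)
    w = v / (v + s)
    # Outer step: ascend on the sampling distribution (multiplicative-weights
    # style, standing in for the paper's dual ascent update).
    p *= np.exp(eta * risk(w, sigmas))
    p /= p.sum()
```

In this toy the risk is increasing in sigma for any fixed w, so the worst case sits at the highest noise level and the distribution concentrates there; for richer estimator classes the optimal sampling distribution is generally non-degenerate.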

ConsensusNet: Optimal Combination of Image Denoisers

IEEE Transactions on Image Processing (TIP), 2019.
Joon Hee Choi, Omar A. Elgendy, and Stanley H. Chan

Given a set of image denoisers, each having a different denoising capability, is there a provably optimal way of combining these denoisers to produce an overall better result? An answer to this question is fundamental to designing ensembles of weak estimators for complex scenes. In this paper, we present an optimal procedure leveraging deep neural networks and convex optimization. The proposed framework, called the Consensus Neural Network (CsNet), introduces three new concepts in image denoising: (1) A deep neural network to estimate the mean squared error (MSE) of denoised images without needing the ground truths; (2) A provably optimal procedure to combine the denoised outputs via convex optimization; (3) An image boosting procedure using a deep neural network to improve contrast and to recover lost details of the combined images. Experimental results show that CsNet can consistently improve denoising performance for both deterministic and neural network denoisers.
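Concept (2), combining denoised outputs by convex optimization, can be sketched as a simplex-constrained least-squares problem: find nonnegative weights summing to one that minimize the MSE of the weighted combination. The projected-gradient solver and toy 1-D "images" below are illustrative assumptions, not CsNet's actual procedure (which, per concept (1), estimates MSE without ground truth).

```python
import numpy as np

# Toy version of combining denoisers by convex optimization.
# 1-D signals stand in for images; error levels are hypothetical.
rng = np.random.default_rng(1)
x = rng.standard_normal(1000)                        # "clean image"
Z = np.stack([x + s * rng.standard_normal(x.size)    # K denoiser outputs,
              for s in (0.3, 0.5, 0.8)])             # residual error level s

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Projected gradient descent on the convex objective ||w @ Z - x||^2 / N.
K, N = Z.shape
G = Z @ Z.T / N                                      # Gram matrix of outputs
b = Z @ x / N
w = np.full(K, 1.0 / K)
for _ in range(500):
    w = project_simplex(w - 0.1 * 2.0 * (G @ w - b))

mse = np.mean((w @ Z - x) ** 2)
```

By convexity the optimal simplex weights do at least as well as the single best denoiser, which is the sense in which the combination is provably no worse than its inputs.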


  1. Joon Hee Choi, Omar A. Elgendy, and Stanley H. Chan, "Optimal Combination of Image Denoisers", IEEE Trans. Image Process., vol. 28, no. 8, pp. 4016-4031, Aug. 2019.
    (Supplementary Material)