Task 008: Learning to Quantize Deep Neural Networks: A Competitive-Collaborative Approach

Event Date: December 12, 2019
Time: 2:00pm ET / 11:00am PT
Priority: No
College Calendar: Show
Md Fahim Faysal Khan, Pennsylvania State University
Learning to Quantize Deep Neural Networks: A Competitive-Collaborative Approach
Quantization, or discretization, of neural networks has received significant attention lately because quantized models port well to dedicated AI accelerators and FPGAs, which benefit from their reduced model size and compute time. Most prior work has focused mainly on minimizing the quantization loss of the parameters, which does not necessarily translate into reasonable generalization accuracy for the network. Moreover, most approaches assume a constant bit-width for the weights and activations of all layers. We argue that this may not be the optimal representation: certain layers may require higher bit precision to preserve accuracy, while others can operate at significantly lower precision. In this talk, we propose an iterative, accuracy-driven learning framework of competitive-collaborative quantization (CCQ) that gradually adapts the bit precision of each individual layer. Unlike prior quantization policies that keep the first and last layers of the network at full precision, CCQ offers layer-wise competition for any target quantization policy, with holistic layer fine-tuning to recover accuracy, so that state-of-the-art networks can be entirely quantized without significant accuracy degradation.
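To make the idea of accuracy-driven, per-layer bit-precision adaptation concrete, here is a minimal sketch of a greedy search that lowers each layer's bit-width while validation accuracy stays within a tolerance of the full-precision baseline. This is an illustration of the general concept only, not the CCQ algorithm from the talk: the names (`quantize`, `search_bitwidths`, `evaluate`) and the greedy per-layer loop are hypothetical stand-ins, and the fine-tuning step that CCQ uses to recover accuracy is omitted.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits
    (bits >= 2, since 1-bit symmetric quantization has no nonzero level here)."""
    qmax = 2 ** (bits - 1) - 1
    # Scale so the largest-magnitude weight maps to qmax; guard all-zero layers.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) * scale for w in weights]


def search_bitwidths(layers, evaluate, tolerance, start_bits=8, min_bits=2):
    """Greedily reduce each layer's bit-width while accuracy stays within
    `tolerance` of the full-precision baseline.

    layers:   dict mapping layer name -> list of float weights (toy stand-in
              for real tensors).
    evaluate: callable taking such a dict and returning an accuracy score.
    """
    bits = {name: start_bits for name in layers}

    def quantized_accuracy():
        return evaluate({n: quantize(w, bits[n]) for n, w in layers.items()})

    baseline = evaluate(layers)  # full-precision reference accuracy
    for name in layers:
        while bits[name] > min_bits:
            bits[name] -= 1  # try one bit less for this layer
            if baseline - quantized_accuracy() > tolerance:
                bits[name] += 1  # accuracy dropped too far: revert, next layer
                break
    return bits
```

The sketch ends up with a mixed-precision assignment: layers whose accuracy is insensitive to quantization settle at low bit-widths, while sensitive layers keep higher precision, mirroring the abstract's argument that a uniform bit-width across layers is generally suboptimal.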
Md Fahim Faysal Khan completed his B.Sc. in Electrical and Electronic Engineering (EEE) at Bangladesh University of Engineering and Technology (BUET) in 2017. He joined the BUET VLSI lab in July 2017 as a research engineer, where he worked on designing a "Constant Current Low Power High Frequency LED Driver" and successfully completed its tape-out. He is currently pursuing a Ph.D. at the Pennsylvania State University under the supervision of Prof. Vijaykrishnan Narayanan. His research focuses on efficient learning algorithms and hardware architectures for state-of-the-art AI implementations.