Task 005/006 - Neuromorphic Fabrics

Event Date: July 23, 2020
School or Program: Electrical and Computer Engineering
Sanchari Sen, Purdue University
Efficacy of Pruning in Ultra-Low Precision DNNs
Abstract: The enormous computational and memory demands of DNNs pose a serious challenge to their efficient deployment on resource-constrained computing platforms. Two of the most popular approaches for addressing this challenge are quantization, or reducing the precision of DNNs, and pruning, or removing neurons and connections. Quantization and pruning have been largely explored as independent approaches for improving the efficiency of DNNs. In this work, we investigate the opportunities for combining these two methods, particularly as each of them is pushed to its limits. Specifically, we explore the efficacy of pruning DNNs in the ultra-low precision regime (sub-8 bits for inference). We find that the efficacy of pruning in reducing the storage requirements of DNNs drops significantly in this regime. This is because, as weight precision decreases, the overhead of storing non-zero locations starts to dominate, reducing the compression ratio achieved by sparse coding schemes and even pushing it below 1 in certain cases. We analyze the overheads and compression ratios of two popular sparse formats, namely Compressed Sparse Column (CSC) and Sparsity Map (Smap). We also propose a new format, compressed Sparsity Map (cSmap), for reducing the location overheads in the Smap format. The cSmap format is realized in our implementation by re-purposing test pattern compression methods widely used in manufacturing test. Our results across 6 state-of-the-art Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) with varying sparsity levels indicate that all three sparse formats suffer from low compression ratios in the ultra-low precision regime. However, the individual formats behave differently across sparsity and precision levels, leading to a variation in the best performing sparse format in each scenario. Based on this observation, we further propose a hybrid compression scheme that dynamically chooses between the sparse formats, at both network and layer-level granularities, for different sparsity and precision levels. For 2-bit precision DNNs, such a hybrid compression scheme improves the average compression ratio by 18.3%-34.7% compared to homogeneous compression schemes.
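
The drop in compression ratio at low precision follows from simple storage accounting: the per-non-zero location metadata stays roughly constant while the value payload shrinks with the bit width. The Python sketch below is not taken from the talk; the layer shape, sparsity level, and index widths are illustrative assumptions, and the cSmap and hybrid schemes are not modeled. It estimates the compression ratio of CSC and sparsity-map (Smap) encodings relative to dense storage as weight precision drops from 8 to 2 bits.

# Back-of-the-envelope storage model (illustrative assumptions, not the
# speaker's exact cost model): compare dense, CSC, and sparsity-map (Smap)
# storage for one hypothetical weight matrix as precision shrinks.

def dense_bits(rows, cols, wbits):
    # Dense storage: every weight stored at wbits bits.
    return rows * cols * wbits

def csc_bits(rows, cols, nnz, wbits):
    # CSC: packed non-zero values, one row index per non-zero,
    # and one pointer per column (plus one) into the value array.
    idx_bits = max(1, (rows - 1).bit_length())     # bits to address a row
    ptr_bits = max(1, (rows * cols).bit_length())  # bits per column pointer
    return nnz * wbits + nnz * idx_bits + (cols + 1) * ptr_bits

def smap_bits(rows, cols, nnz, wbits):
    # Sparsity map: one presence bit per weight plus packed non-zero values.
    return rows * cols + nnz * wbits

if __name__ == "__main__":
    rows, cols, sparsity = 512, 512, 0.75          # hypothetical layer, 75% zeros
    nnz = int(rows * cols * (1 - sparsity))
    for wbits in (8, 4, 2):
        dense = dense_bits(rows, cols, wbits)
        for name, bits in (("CSC", csc_bits(rows, cols, nnz, wbits)),
                           ("Smap", smap_bits(rows, cols, nnz, wbits))):
            print(f"{wbits}-bit {name}: compression ratio = {dense / bits:.2f}")

With these assumed parameters, CSC drops below a ratio of 1 at 2-bit precision (the 9-bit row indices outweigh the 2-bit values), while the fixed one-bit-per-weight map keeps Smap compressive only at sufficient sparsity; this kind of crossover across precision and sparsity levels is what motivates the hybrid scheme described in the abstract.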
 
Bio: Sanchari Sen received her B.Tech. degree in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur, India. She is currently pursuing a PhD under the supervision of Dr. Anand Raghunathan in the School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana. Her current research interests include algorithmic and architectural techniques for improving the efficiency and robustness of deep neural networks on different platforms. She received the Bilsland Dissertation Fellowship from Purdue University in 2019 and the Ross Fellowship in 2015. She was also awarded the Institute Silver Medal for her academic performance at IIT Kharagpur.