Task 005 - Neuromorphic Fabrics
|Event Date:||February 27, 2020|
|School or Program:||Electrical and Computer Engineering|
FPGA-based Training Accelerator for Modern CNNs Featuring High Bandwidth Memory
Abstract: Training of Convolutional Neural Networks (CNNs) is intensive in both memory and computation. The low bandwidth of traditional off-chip memories is one of the major bottlenecks in implementing efficient training accelerators for CNN algorithms on FPGAs. However, the large bandwidth provided by High Bandwidth Memory (HBM) unlocks opportunities to improve performance at the system level. In this talk, we present a fixed-point CNN training accelerator using HBM that supports various training operations, including dropout and residual connections. The accelerator employs a novel architecture to exploit the sparsity of dilated convolutions and to efficiently access weights in on-chip buffers in both non-transpose and transpose directions. We analyze the impact of HBM on training tasks, provide a comprehensive comparison with DDR3, and discuss strategies for using HBM features efficiently for better performance. Finally, we evaluate the performance of training CNNs for CIFAR-10 classification on Intel Stratix-10 MX devices using HBM communication.
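The sparsity mentioned above arises because a dilated convolution is equivalent to convolving with a kernel that has zeros inserted between its taps; an accelerator can skip the multiplications against those zeros. The following sketch (a hypothetical illustration, not the talk's actual implementation; the function name `dilate_kernel` is invented here) shows where the zeros come from for a 1-D kernel:

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert (rate - 1) zeros between adjacent kernel taps.

    The inserted zeros are the structured sparsity a training
    accelerator can exploit by skipping zero-operand multiplies.
    """
    n = k.shape[0]
    out = np.zeros(rate * (n - 1) + 1, dtype=k.dtype)
    out[::rate] = k  # original taps land at every `rate`-th position
    return out

k = np.array([1.0, 2.0, 3.0])
kd = dilate_kernel(k, 2)          # -> [1., 0., 2., 0., 3.]
sparsity = 1.0 - np.count_nonzero(kd) / kd.size  # 40% of taps are zero
```

At dilation rate r, a fraction (r - 1)/r of the dilated kernel's positions are zero, so a sparsity-aware datapath avoids the corresponding compute and weight fetches entirely.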
Bio: Shreyas Kolala Venkataramanaiah is a second-year Ph.D. candidate at Arizona State University under the supervision of Prof. Jae-Sun Seo. His research interests include deep/spiking neural network hardware, energy-efficient hardware design for neural network inference/training, and FPGA accelerators for deep learning. He is currently working on an automatic compiler-based FPGA accelerator design for CNN training using High Bandwidth Memory (HBM).