Task 005: Neural Fabrics

Event Date: June 23, 2022
Time: 11:00 am (ET) / 8:00 am (PT)
Anupreetham, Arizona State University
High Throughput FPGA-based Object Detection through Deeply Pipelined CNN and sort-less Non-Maximum Suppression
Abstract: Computer vision systems use object detectors coupled with deep learning algorithms to perform crucial tasks such as smart surveillance and autonomous driving. Single-shot multibox detectors (SSD) coupled with a compact CNN-based feature extractor such as MobileNet-V1 can efficiently detect, classify, and localize various objects in an input image with high detection accuracy. SSD-based feature extractors generate bounding boxes around the identified objects along with their confidence scores. A subsequent non-maximum suppression (NMS) module removes any redundant boxes from the final detection. Traditionally, these two modules work sequentially: the NMS module must wait for all box predictions to be produced before it can process them. This results in significant latency overhead and throughput degradation.
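
For readers unfamiliar with NMS, the sequential dependency described above can be illustrated with a minimal Python sketch of conventional, sort-based greedy NMS. This is the textbook baseline, not the speaker's sort-less algorithm; the names `iou` and `greedy_nms`, the 0.5 threshold, and the sample boxes are assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the kept boxes, highest score first."""
    # The sort needs every score, so it cannot start until the
    # detector has emitted all of its box predictions.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Illustrative input: the first two boxes overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

In this sketch the global sort is the step that forces the NMS stage to wait for the full set of predictions.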
 
In this work, we present a pipelined version of our novel NMS algorithm that eliminates the sequential dependencies. The algorithm enables a fully streamlined end-to-end FPGA accelerator for low-latency SSD-MobileNet-V1 object detection and adds no latency overhead to the SSD-MobileNet-V1 convolution layers. The end-to-end object detection system is implemented on an Intel Stratix 10 FPGA and extends our previous work. It runs at a maximum operating frequency of 400 MHz, with a throughput of 2167 frames per second and an end-to-end batch-1 latency of 2.13 ms. The new pipelined NMS system achieves 3.56x higher throughput and 1.12x lower latency than our previous implementation, and 5.28x higher throughput and 5x lower latency than the only FPGA-based MLPerf submission on SSD-based object detection systems.
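
As a rough reading of the reported figures, the short Python calculation below relates throughput to latency. It assumes both numbers describe the same steady-state operating point, which the abstract does not state explicitly. The variable names are illustrative.

```python
throughput_fps = 2167   # frames per second, as reported in the abstract
latency_s = 2.13e-3     # end-to-end batch-1 latency in seconds, as reported

# Average time between consecutive output frames.
frame_interval_ms = 1000.0 / throughput_fps

# Little's law: items in flight = throughput x latency.
frames_in_flight = throughput_fps * latency_s

print(f"frame interval:   {frame_interval_ms:.3f} ms")
print(f"frames in flight: {frames_in_flight:.2f}")
```

Under that assumption, a new frame completes about every 0.46 ms while each frame takes 2.13 ms end to end, so roughly 4.6 frames would be in the pipeline at once, which is consistent with a deeply pipelined design.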
 
Bio: Anupreetham is currently a PhD candidate at the School of Electrical, Computer and Energy Engineering at Arizona State University, supervised by Prof. Jae-sun Seo. He holds an MS degree in Computer Engineering from Arizona State University and a BTech degree in Electronics and Communication Engineering from the National Institute of Technology Karnataka, India. He currently works as a graduate research associate at Seo Lab at ASU. His research interests include FPGA design for real-time machine learning applications and spiking neural network hardware implementation.