Theme 2: Neuromorphic Fabrics

Event Date: March 30, 2023
Time: 11:00 am (ET) / 8:00 am (PT)
Priority: No
College Calendar: Show
Shrihari Sridharan, Purdue University
X-Former: In-Memory Acceleration of Transformers
Abstract:
 
Transformers have achieved great success in a wide variety of natural language processing (NLP) tasks due to the attention mechanism, which assigns an importance score to every word relative to the other words in a sequence. However, these models are very large, often reaching hundreds of billions of parameters, and therefore require a large number of DRAM accesses. Hence, traditional deep neural network (DNN) accelerators such as GPUs and TPUs face limitations in processing Transformers efficiently. In-memory accelerators based on non-volatile memory (NVM) promise to be an effective solution to this challenge, since they provide high storage density while performing massively parallel matrix-vector multiplications (MVMs) within memory arrays. However, attention score computations, which are used extensively in Transformers (unlike in CNNs and RNNs), require MVMs in which both operands change dynamically for each input. As a result, conventional NVM-based accelerators incur high write latency and write energy when used for Transformers, and further suffer from the low endurance of most NVM technologies.
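To make the challenge concrete, the sketch below (not from the talk; a minimal NumPy illustration) computes attention scores for a toy sequence. Note that both operands of the matrix-vector products, the queries Q and the keys K, are derived from the input, so neither can be programmed into NVM arrays once and reused, unlike the static weight matrices of a CNN or RNN layer.

```python
import numpy as np

def attention_scores(Q, K):
    """Scaled dot-product attention scores.

    Both Q and K change with every input sequence, so an NVM-based
    in-memory accelerator would have to rewrite its memory arrays
    for each input, incurring write latency/energy and wearing out
    low-endurance cells."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # MVMs where BOTH operands are dynamic
    # Row-wise softmax turns scores into importance weights per word.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                      # toy sizes for illustration
Q = rng.standard_normal((seq_len, d_k))  # input-dependent queries
K = rng.standard_normal((seq_len, d_k))  # input-dependent keys
A = attention_scores(Q, K)
# Each row of A is a distribution of importance scores over the sequence.
```

In a weight-stationary NVM accelerator, only one MVM operand (the stored weights) is assumed static; here neither is, which is the mismatch the talk addresses.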
 
To address these challenges, we present X-Former, a hybrid in-memory hardware accelerator that consists of both NVM and CMOS processing elements to execute Transformer workloads efficiently. To improve the hardware utilization of X-Former, we also propose a sequence-blocking dataflow, which overlaps the computations of the two processing elements and reduces execution time. Across several benchmarks, we show that X-Former achieves up to 85x and 7.5x improvements in latency and energy over an NVIDIA GeForce GTX 1060 GPU, and up to 10.7x and 4.6x improvements in latency and energy over a state-of-the-art in-memory NVM accelerator.
 
Bio:
 
Shrihari Sridharan received his B.S. degree in Electrical and Computer Engineering from Purdue University, West Lafayette, IN, USA, in 2018. He was a recipient of the Summer Undergraduate Research Fellowship (SURF), which supported his work as a research intern at the Nanoelectronics Research Laboratory, Purdue University, in 2016. Since Fall 2018, he has been pursuing his Ph.D. at the Center for Brain-inspired Computing (C-BRIC), Purdue University. His research interests include designing efficient algorithms and architectures for deep learning and other cognitive systems. He also received the DAC Young Fellowship award in 2020.