Task 2777.004: Neural Primitives: In-memory computing primitives enabling energy-efficient implementation of spiking neural networks
|Event Date:||March 28, 2019|
|Time:||2:00 pm EST/11:00 am PST|
Abstract: Most ANNs and SNNs require the evaluation of complex math tables, transcendental functions, high-order polynomials, etc. This is often done by storing them in read-only memories (ROMs); however, large dedicated on-chip ROMs incur a large area overhead. To mitigate this, we investigated the possibility of embedding ROMs within standard 6T and 8T SRAMs by simply adding an extra word-line (WL) or source-line (SL). This enables a single SRAM bit-cell to store both ROM data and RAM data, effectively doubling the memory density while maintaining the performance and area-efficiency of standard SRAMs.

We propose specialized neural hardware for accelerating SNNs, using a distributed computing architecture built from these ROM-Embedded RAM primitives. The architecture consists of a mesh of neural cores, each comprising a ROM-Embedded RAM memory unit and a finite state machine (FSM). All computations are done locally within each neural core, while only the spike information (1 bit) is communicated among cores. The ROM-Embedded RAM stores the synaptic weights and neuron state variables in its RAM portion, while the ROM portion stores the required math tables and LUTs. The architecture exploits the event-driven nature of SNNs, since the FSM computes only when there is an input spike. Further, being a distributed architecture, all cores compute in parallel, increasing the throughput of the system. Our results show up to 1.75×, 1.95×, and 1.95× improvements in energy, iso-storage area, and iso-area performance, respectively.

I will also talk about an in-memory accelerator for implementing deep binary neural networks (BNNs). Aggressively scaled deep BNNs are gaining interest in the community, since they promise state-of-the-art accuracies with a highly simplified computational framework. We show how a 10T-SRAM cell can be utilized with an analog charge-sharing approach, yielding an approximate convolution operation for BNNs.
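The event-driven computation in each neural core can be illustrated with a minimal Python sketch (a hypothetical software model, not the actual hardware): a leaky integrate-and-fire neuron whose state is updated only when an input spike arrives, with synaptic weights and the membrane potential playing the role of the RAM contents and a precomputed exponential-decay table playing the role of the embedded ROM LUT. All names and parameter values here are illustrative assumptions.

```python
import math

# Hypothetical model of one event-driven neural core.
# "RAM": synaptic weights and membrane potential.
# "ROM": precomputed decay LUT, mirroring the ROM-Embedded RAM split.

TAU = 20.0          # membrane time constant (in time steps, assumed)
V_THRESH = 1.0      # firing threshold (assumed)
LUT_SIZE = 64       # entries in the decay table (assumed)

# ROM part: exp(-dt/TAU) for elapsed times 0..LUT_SIZE-1
DECAY_LUT = [math.exp(-dt / TAU) for dt in range(LUT_SIZE)]

class NeuralCore:
    def __init__(self, weights):
        self.weights = weights      # RAM: synaptic weights, one per input
        self.v = 0.0                # RAM: membrane potential
        self.last_update = 0        # time of last state update

    def on_spike(self, pre_idx, t):
        """Process a single 1-bit input spike; returns True if the neuron fires."""
        # Lazily apply the leak for the whole elapsed interval using the
        # ROM LUT, instead of updating every time step (event-driven).
        dt = min(t - self.last_update, LUT_SIZE - 1)
        self.v *= DECAY_LUT[dt]
        self.last_update = t
        # Integrate the synaptic weight of the spiking input.
        self.v += self.weights[pre_idx]
        if self.v >= V_THRESH:
            self.v = 0.0            # reset after firing
            return True
        return False

core = NeuralCore(weights=[0.6, 0.6, 0.3])
print(core.on_spike(0, t=1))   # False: v = 0.6, below threshold
print(core.on_spike(1, t=2))   # True: decayed v + 0.6 crosses threshold
```

Because the membrane potential is only touched on spike events, an idle core performs no computation at all, which is the source of the energy savings claimed for event-driven operation.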
A key feature of this proposal is a modular approach that makes it scalable to deeper networks. We highlight various trade-offs in terms of circuit complexity, speed-up, and classification accuracy, and propose various circuit techniques to ensure an accurate output despite using low-precision, low-overhead analog-to-digital converters. We obtained energy improvements of up to 6.1× and speed-up improvements of up to 15.8× on a benchmark BNN architecture.
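Functionally, the convolution that the analog charge-sharing scheme approximates reduces, for binarized operands, to an XNOR followed by a popcount. A minimal Python sketch of that binary dot product (my own illustrative version, not the circuit itself):

```python
# Software model of the binary dot product that the in-SRAM analog
# charge-sharing approximates (illustrative, not the actual circuit).
# Weights and activations in {-1, +1} are encoded as bits {0, 1}.

def binary_dot(w_bits, x_bits):
    """Dot product of two {-1,+1} vectors given as 0/1 bit lists."""
    n = len(w_bits)
    # XNOR: count positions where the bits match, i.e. where the
    # {-1,+1} product is +1.
    matches = sum(1 for w, x in zip(w_bits, x_bits) if w == x)
    # Map the popcount back to the {-1,+1} domain.
    return 2 * matches - n

# w = [+1,-1,+1,+1], x = [+1,+1,+1,-1]: sum = 1 - 1 + 1 - 1 = 0
print(binary_dot([1, 0, 1, 1], [1, 1, 1, 0]))  # 0
```

In the hardware described in the abstract, this popcount is not computed digitally; charge sharing across the 10T-SRAM bit-cells produces an analog voltage proportional to the match count, which a low-precision ADC then digitizes, which is why the result is an approximation.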
Bio: is currently a PhD student at Purdue University, working with Prof. Roy. He received his bachelor's degree in Electrical Engineering from the Indian Institute of Technology (IIT) Ropar, India. His primary research interests include enabling in-memory computation for neuromorphic systems using CMOS and beyond-CMOS memories. He also works on the modeling and simulation of spintronic devices for applications in neuromorphic computing.