Institute of Chips and AI
(Invitation Only)
Tuesday, November 19, 2024
8:30 AM - 5:30 PM
Computer History Museum
1401 N Shoreline Blvd
Mountain View, California 94043
Agenda
Click this link to view the agenda.
Poster List
Processing-in-DRAM to Alleviate the Memory Bottleneck
E. Berscheid, S. Roy, A. Raghunathan
We explore processing-in-DRAM to alleviate the growing memory-bandwidth bottleneck in AI accelerators. We have proposed in-array and near-array processing architectures and applied them to a variety of neural network workloads, including computer vision, natural language processing, and recommendation systems. These include PIM-DRAM, a design that realizes massively parallel multiplication within DRAM arrays and uses near-array logic for accumulation and other operations, and STaR-NMP, a method that optimizes the efficiency of processing-in-DRAM for recommendation systems.
ECO: Designing Energy-Efficient Multimodal Cognitive Systems Using Efficient Context Handling
A. Das, S. Ghosh, A. Raha, V. Raghunathan
Multimodal Artificial Intelligence (MMAI) offers remarkable potential for building cognitive systems that simultaneously analyze and interpret data from multiple sensory modalities. However, deploying MMAI on resource-limited edge platforms is difficult due to high computational and memory demands, limited communication bandwidth, real-time processing requirements, and the complexities of multimodal data fusion. To address these issues, we introduce ECO, a solution that harnesses the advantages of combining MMAI and edge computing. ECO provides efficient context handling through modality-aware Accuracy-Efficiency (AE) knobs that extend beyond the multimodal sensors to individual subsystems within the edge device. ECO exploits intermodal and inter-subsystem interactions to efficiently drive system-level AE trade-offs through synergistic approximations. To that end, ECO is the first energy-accuracy-scalable cognitive system for efficient multimodal inference at the edge. We present an in-depth case study of a multimodal system that uses RGB and depth sensors for image segmentation. ECO demonstrates significant energy savings -- 1.8X on the edge device and 1.7X on the edge server -- with an imperceptible application-level accuracy loss of less than 0.01%. Furthermore, for similar levels of accuracy loss, ECO achieves 1.2X and 1.8X higher energy efficiency on the edge than RGB-only and depth-only optimizations, respectively.
CHEETA: CMOS+MRAM Hardware for Energy-EfficienT AI (ME Commons)
K. Roy, A. Raghunathan, S. Gupta
This poster presents the CHEETA project, which leverages spin-based memory elements and material-device-array co-design to build a robust, energy-efficient in-memory computing (IMC) accelerator. The work includes a ROM-embedded RAM approach to enable efficient computation of transcendental functions, such as exponential and tanh, for advanced AI applications.
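The ROM-embedded RAM idea can be illustrated with a lookup table for a transcendental function: a small table of precomputed values stored next to the memory array replaces an expensive digital function unit. A minimal Python sketch, assuming a 65-entry table over [-4, 4] with linear interpolation (all sizes, ranges, and names are illustrative, not the CHEETA design):

```python
import math

# Illustrative ROM-style lookup table (LUT) for tanh, with linear
# interpolation between stored entries. An IMC accelerator can embed
# such a table in ROM alongside RAM arrays to evaluate transcendental
# functions without a dedicated digital unit.
LUT_MIN, LUT_MAX, LUT_SIZE = -4.0, 4.0, 65
STEP = (LUT_MAX - LUT_MIN) / (LUT_SIZE - 1)
TANH_LUT = [math.tanh(LUT_MIN + i * STEP) for i in range(LUT_SIZE)]

def tanh_lut(x: float) -> float:
    """Approximate tanh(x) via table lookup plus linear interpolation."""
    if x <= LUT_MIN:
        return TANH_LUT[0]       # tanh saturates outside the table range
    if x >= LUT_MAX:
        return TANH_LUT[-1]
    pos = (x - LUT_MIN) / STEP   # fractional index into the table
    i = int(pos)
    frac = pos - i
    return TANH_LUT[i] * (1 - frac) + TANH_LUT[i + 1] * frac

# Even a 65-entry table keeps the worst-case error small over a dense sweep.
err = max(abs(tanh_lut(v / 100) - math.tanh(v / 100)) for v in range(-500, 501))
print(f"max abs error: {err:.4f}")
```

The same table-plus-interpolation pattern applies to exponentials; only the stored values and the input range change.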
In-Memory Computing-Based AI Accelerators
D. Sharma, M. Ali, Gaurav K, K. Roy
This poster showcases custom integrated circuits designed for accelerating artificial neural networks (ANNs) and spiking neural networks (SNNs) using in-memory computing principles. Fabricated in TSMC 65nm technology, these circuits combine digital and analog compute-in-memory methods to perform efficient matrix-vector multiplications, a core operation in neural networks.
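The core operation these circuits accelerate can be written in a few lines. A minimal sketch of matrix-vector multiplication with illustrative weights and inputs (in the actual macros, the weights stay resident in the memory array and the products accumulate in parallel along the columns, rather than in software):

```python
# Weights stored in the (simulated) memory array.
W = [[1, -1, 2],
     [0,  3, 1]]
# Input activations applied on the wordlines.
x = [2, 1, -1]

# Each output element is the dot product of one weight row with the input.
y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
print(y)  # -> [-1, 2]
```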
Accel-Sim: Modeling GPUs with Accuracy and Speed
M. Khairy, Z. Shen, T. Aamodt, C. Avalos, A. Alawneh, W. An, A. Barnes, C. Bose, N. Kang, Y. Liu, J. Pan, F. Shen, T. Rogers
TBD
Neuro-inspired Autonomous Navigation: From Sensors to Algorithms to Hardware
A. Kosta, M. Nagaraj, K. Roy
The research underscores the need for co-design across three critical levels -- sensors, algorithms, and hardware -- for achieving energy-efficient AI. By integrating sensor fusion, hybrid SNN-ANN algorithms, and advanced Compute-in-Memory hardware, it addresses the energy gap between biological and artificial intelligence. This holistic approach demonstrates that optimizing all three levels together is essential for enabling efficient, real-time autonomous navigation in resource-constrained edge AI systems.
Machine learning-based RTL power estimation and design space exploration for SoCs
S. Pandit, S. Dey, A. Raghunathan
Design space exploration (DSE) for RTL-level System-on-Chip (SoC) designs is often slowed by traditional RTL simulation and power-estimation tools, especially as designs grow more complex. To address this, we have developed a machine learning-based framework that predicts per-cycle, per-block power consumption 40X faster than commercial tools, with less than 5% error, using fewer than 0.01% of RTL signals selected as power proxies. To eliminate the slowdowns caused by RTL simulation, the framework trains a sequence-to-sequence model to predict the activity of these power proxies directly from instruction traces. This approach enables rapid and accurate power estimation across different programs and SoC configurations, significantly accelerating the DSE process.
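The power-proxy idea can be sketched concretely: once a model maps the toggle activity of a handful of proxy signals to power, per-cycle estimates reduce to an inexpensive weighted sum. A minimal Python sketch, with signal names, weights, and baseline invented for illustration (the actual framework learns both the proxy selection and the mapping from training data):

```python
# Illustrative linear power model over two hypothetical proxy signals.
# Weights are in mW per toggle; the baseline captures static power.
PROXY_WEIGHTS = {"alu_en": 2.1, "mem_rd": 2.0}
BASE_POWER_MW = 1.0

def predict_power(cycle_activity: dict) -> float:
    """Predict one cycle's power (mW) from proxy-signal toggle counts."""
    return BASE_POWER_MW + sum(
        PROXY_WEIGHTS[sig] * cycle_activity.get(sig, 0) for sig in PROXY_WEIGHTS
    )

# A short (invented) activity trace; in the full framework, a
# sequence-to-sequence model predicts these activities directly from
# instruction traces, skipping RTL simulation entirely.
trace = [
    {"alu_en": 1, "mem_rd": 0},
    {"alu_en": 1, "mem_rd": 1},
    {"alu_en": 0, "mem_rd": 0},
]
powers = [predict_power(c) for c in trace]
print(powers)  # per-cycle power estimates in mW
```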
HW/SW Co-design for ADC-Less Compute in Memory Accelerator
U. Saxena, T. Sharma, S. Negi, K. He, K. Roy
In this work, we present an algorithm-hardware co-design framework for developing ultra-efficient compute-in-memory accelerators that are not bottlenecked by ADCs. We propose a binary/ternary partial-sum quantization algorithm, a hybrid ADC-less analog/digital compute-in-memory macro, and an architecture exploration framework, which together yield compute-in-memory accelerators that provide orders-of-magnitude improvement over baselines.
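The ternary partial-sum quantization step can be illustrated in isolation: each analog column partial sum is collapsed to {-1, 0, +1} by two comparators, removing the need for a high-precision ADC. A minimal Python sketch, with the threshold and sample values invented for illustration (the actual algorithm co-trains the network to tolerate this quantization):

```python
def ternarize(partial_sum: float, threshold: float = 0.5) -> int:
    """Map an analog partial sum to {-1, 0, +1}, as two comparators would."""
    if partial_sum > threshold:
        return 1
    if partial_sum < -threshold:
        return -1
    return 0

# Illustrative analog partial sums from several compute-in-memory columns.
column_sums = [2.3, -0.1, -1.7, 0.4]
quantized = [ternarize(s) for s in column_sums]
print(quantized)  # -> [1, 0, -1, 0]
```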
THERMAI: TCAD Model Informed Thermal Analysis of Circuits Using GenAI
S. Chandra, S. Chowdhury, K. Roy
High-performance processors face rising power-density challenges, necessitating efficient thermal management. Traditional temperature-estimation methods are accurate but computationally expensive and struggle with modern circuit complexity. This study introduces ThermAI, a generative-AI-based model that predicts heat distribution in bulk silicon at the transistor level. Using finite element-based tools such as COMSOL, a dataset of detailed heat-distribution maps is created to learn the transient behavior of heat-flow dynamics. The model, trained on diverse circuit layouts, achieves high accuracy, with an RMSE of 0.76°C, and effectively extrapolates thermal behavior to larger circuits. Compared to commercial FEM tools, ThermAI offers ~200X faster temperature prediction and scales efficiently to complex chip designs, enabling precise and rapid thermal mapping to address growing power-density challenges.
How Far Are We From Zero-Shot RTL Code Generation Using LLMs?
T. Sharma, K. Roy
TBD
Enhancing Neural Networks with Ferroelectric Devices
E. Yu, Gaurav K, U. Saxena, K. Roy
This poster demonstrates how ferroelectric devices can enhance neural networks, focusing on energy efficiency and scalability.