ECE 69500 - AI Hardware
Course Details
Lecture Hours: 3
Credits: 3
Areas of Specialization:
- VLSI and Circuit Design
Counts as:
Normally Offered:
Each Spring
Campus/Online:
On-campus only
Requisites by Topic:
Undergraduate-level understanding of traditional computer architecture, circuits, and basic linear algebra. Some exposure to deep learning algorithms is beneficial but not required.
Catalog Description:
This course offers an in-depth exploration of how advances in deep learning algorithms have driven innovations in hardware design, and vice versa. It covers advancements in traditional computing architectures, specialized machine learning accelerators, and novel paradigms such as compute-in-memory and neuromorphic computing. Students will develop the skills to apply algorithm-hardware co-design techniques to design and optimize hardware tailored to deep learning algorithms.
Required Text(s):
None.
Recommended Text(s):
None.
Learning Outcomes
A student who successfully fulfills the course requirements will have demonstrated an ability to:
- Describe the different types of deep learning algorithms.
- Explain the need for domain-specific accelerators compared to traditional computing paradigms.
- Design deep learning accelerators, including GPUs, TPUs, and NPUs.
- Explore compute-in-memory (CiM) AI hardware built on different memory technologies, including emerging non-volatile memories (NVMs).
- Identify research challenges related to the rapid growth of deep learning hardware and potential solutions.
Lecture Outline:
Week | Major Topics |
---|---|
1 | Basics of ML Algorithms: Course overview, applications of ML, cloud vs. TinyML, training vs. inference; types of models: convolutional, fully connected, LSTMs, recommendation, transformers, point-cloud, GNNs, large language models, state-space models, diffusion models; inside neural networks (the math): the underlying operations and architectures, terms such as bias and variance; major compute macros for neural workloads (matrix-vector, matrix-matrix, dot-product) |
2 | Hardware for ML: The need for domain-specific accelerators: the end of Dennard scaling, the slowdown of Moore's law, rising power; overview of CPUs and GPUs for training and inference; introduction to the TPU (systolic arrays; a toy simulation follows the outline); AI hardware startups |
3 | Quantization and Sparsity: Number formats from double precision (FP64) down to FP4, and posits; types of sparsity: structured and unstructured, with pruning at the channel and layer levels; paper discussions: EIE, Deep Compression (a quantization sketch follows the outline) |
4 | Specialization Techniques for ML Hardware: Roofline analysis and how to compare different hardware (which metrics to look for); characterization/performance analysis and identifying bottlenecks (a roofline example follows the outline); algorithm-hardware co-design techniques (BitFusion, SpAtten, etc.); compilers for ML accelerators: Triton, TVM, CUDA, etc. |
5 | Dataflow/Scheduling: Eyeriss, Timeloop, MAESTRO, Sparseloop, etc.; types of parallelism: pipeline, tensor, and data; FlashAttention 1, 2, and 3 |
6 | Computing in SRAM: Introduction to compute-in/near-memory: motivation and classification; basic functionality and overview, SRAM modifications, analog vs. digital; optional: simulate an SRAM array performing in-memory compute (a toy bitline model follows the outline); architecture, system vs. macro, tools for accuracy evaluation, limitations; optional: circulate example chips in class |
7 | Computing in eNVMs: Introduction to post-CMOS devices and their trade-offs; techniques to overcome device limitations for computing; accuracy modeling: NeuroSim, GenieX |
8 | CiM Advancements and Trends: Softmax in CiM; floating-point implementations; additional readings |
9 | Computing in DRAMs: DRAM technology (DDR, HBM) and its evolution; ways of computing in DRAM: in-bank, near-bank, near-channel, at the rank or vault level, etc.; introduction to the KV-cache problem in large models (a footprint estimate follows the outline) |
10 | Other In-Memory Computing Techniques: CXL-based memory; near-memory computing solutions, e.g., recommender systems |
11 | Neuromorphic Computing: Introduction to spiking neural networks (SNNs) |
12 | Neuromorphic Computing: Hardware strategies to enable event-driven computation |
13 | Open Discussion on Trends and the Future of ML Hardware: Training accelerators and their requirements; NoCs/chiplets; papers from Hot Chips |
14 | Open Discussion on Trends and the Future of ML Hardware / Student Presentations |
15 | Student Presentations |
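Illustrative Code Sketches:
The short Python sketches below are minimal, hedged illustrations of selected outline topics. All parameters, model sizes, and machine numbers in them are assumptions chosen for illustration, not course-mandated values.

Week 2 introduces the TPU's systolic array. The sketch below abstracts a weight-stationary array to one reduction wavefront per cycle; input skewing and PE-to-PE pipelining are deliberately omitted.

```python
import numpy as np

def systolic_matmul(W, X):
    """Toy model of a weight-stationary systolic array computing W @ X.

    PE (i, k) holds weight W[i, k]; one reduction wavefront advances per
    "cycle". Each cycle applies one rank-1 update, which is the reduction
    a real array performs as partial sums flow between PEs.
    """
    M, K = W.shape
    K2, N = X.shape
    assert K == K2, "inner dimensions must match"
    acc = np.zeros((M, N), dtype=W.dtype)
    for k in range(K):  # one wavefront (reduction step) per cycle
        acc += np.outer(W[:, k], X[k, :])
    return acc

W = np.random.randn(8, 16)
X = np.random.randn(16, 4)
assert np.allclose(systolic_matmul(W, X), W @ X)
```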
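Week 3 covers quantization. Below is a minimal sketch of symmetric per-tensor INT8 quantization; the epsilon guard and tensor shape are illustrative choices, not a reference implementation.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x is approximated as scale * q."""
    scale = max(np.abs(x).max(), 1e-12) / 127.0  # guard against all-zero tensors
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("max abs quantization error:", np.abs(w - dequantize(q, s)).max())
```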
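Week 4 covers roofline analysis. The example below computes the arithmetic intensity of an FP32 matrix-vector product and caps attainable performance at min(peak compute, bandwidth x intensity); the machine parameters are hypothetical.

```python
# Hypothetical machine parameters (illustrative only, not a specific chip):
PEAK_FLOPS = 10e12   # 10 TFLOP/s peak compute
MEM_BW = 500e9       # 500 GB/s memory bandwidth

def attainable(ai_flops_per_byte):
    """Roofline model: performance is capped by compute or by bandwidth."""
    return min(PEAK_FLOPS, MEM_BW * ai_flops_per_byte)

# Arithmetic intensity of an FP32 matrix-vector product y = W @ x, W: M x K.
# 2*M*K FLOPs over roughly 4*(M*K + K + M) bytes, assuming no weight reuse.
M, K = 4096, 4096
ai = (2 * M * K) / (4 * (M * K + K + M))
print(f"AI = {ai:.2f} FLOP/byte -> attainable {attainable(ai) / 1e9:.0f} GFLOP/s")
# ~0.5 FLOP/byte, far below this machine's balance point (20 FLOP/byte),
# so matrix-vector inference is firmly memory-bound here.
```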
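Week 6 suggests optionally simulating an SRAM array for in-memory compute. Here is a toy single-bitline model with binary inputs and weights and a coarse ADC; the 4-bit ADC and 256-cell column are illustrative assumptions.

```python
import numpy as np

def cim_sram_dot(inputs, weights, adc_bits=4):
    """Toy analog compute-in-memory dot product on one SRAM bitline.

    Binary weights sit in the cells of one column; binary inputs drive
    the wordlines. The bitline "current" is the count of (input & weight)
    matches, which a coarse ADC then quantizes.
    """
    analog_sum = np.sum(inputs & weights)   # ideal bitline summation
    levels = 2 ** adc_bits - 1
    full_scale = len(weights)               # maximum possible bitline current
    code = np.round(analog_sum / full_scale * levels)
    return code / levels * full_scale       # ADC-quantized dot product

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 256)
w = rng.integers(0, 2, 256)
print("exact:", np.sum(x & w), " ADC-quantized:", cim_sram_dot(x, w))
```

The gap between the exact and quantized values shows the accuracy limitation that week 6's tooling discussion addresses.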
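Week 9 introduces the KV-cache problem. A back-of-envelope footprint estimate for a hypothetical 7B-class decoder follows; all model dimensions are illustrative assumptions.

```python
# KV-cache footprint: 2 tensors (K and V) per layer, each
# kv_heads * head_dim values per token, stored in FP16 (2 bytes).
layers, kv_heads, head_dim = 32, 32, 128
seq_len, batch, bytes_per = 4096, 8, 2
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 16.0 GiB for this config
```

Even this modest configuration dwarfs on-chip SRAM, which motivates the DRAM- and near-memory approaches covered in weeks 9 and 10.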
Assessment Method:
Quizzes, projects (12/2024)