ECE 69500 - Systems for AI and Large Language Models
Note:
This course previously ran as ECE 69500, Machine Learning for Cloud Computing.
Course Details
Lecture Hours: 3
Credits: 3
Areas of Specialization:
- Computer Engineering
Normally Offered:
Each Fall
Campus/Online:
On-campus only
Requisites:
ECE 30010, ECE 56500
Requisites by Topic:
A broad and strong foundation in undergraduate computer systems, computer architecture, and AI, and good programming skills.
Catalog Description:
The objective of this course is to facilitate systems research in AI and LLMs. Today, AI and LLMs are driving transformative applications ranging from conversational agents and scientific discovery to software development and decision support. However, training, serving, and optimizing systems for AI and LLMs in the real world introduce significant challenges in scalability, efficiency, and system design. This course will explore the latest developments in systems design for AI and LLMs, considering emerging model architectures, training techniques, and large-scale deployment challenges. We will also examine system-level approaches for improving performance, efficiency, and resource utilization. The course will engage students in the discussion of papers from recent conferences that focus on (a) building scalable systems for training and serving foundation models, and (b) developing optimization techniques across the software-hardware stack to enable real-world AI applications.
Required Text(s):
None.
Recommended Text(s):
None.
Learning Outcomes
A student who successfully fulfills the course requirements will have demonstrated an ability to:
- Explain the fundamental principles and latest developments in systems design specifically tailored for AI and LLMs.
- List the technical challenges associated with the scalability, efficiency, and real-world deployment of foundation models.
- List various optimization techniques across the software-hardware stack to enhance AI application performance.
- Describe system-level approaches for improving resource utilization and computational efficiency during model training and serving.
- Explain emerging model architectures and their requirements for large-scale system infrastructure.
- Analyze and discuss cutting-edge research from recent systems and AI publications.
Lecture Outline:
| Week | Topic |
|---|---|
| 1 | Introduction and course logistics |
| 2 | Benchmarking |
| 3 | Benchmarking and scheduling |
| 4 | Scheduling and workload modeling |
| 5 | Workload modeling |
| 6 | ML |
| 7 | Edge computing and LLM |
| 8 | LLM |
| 9 | Energy and sustainability |
| 10 | Sustainability |
| 11 | Resource management; project presentations |
| 12 | Resource management and autonomous systems |
| 13 | Autonomous systems and architecture |
| 14 | Thanksgiving Break; no class |
| 15 | Final project presentation |
Assessment Method:
Projects, presentations (3/2026)