ECE 69500 - Systems for AI and Large Language Models

Note:

This course previously ran as ECE 69500, Machine Learning for Cloud Computing.

Course Details

Lecture Hours: 3
Credits: 3

Areas of Specialization:

  • Computer Engineering

Normally Offered:

Each Fall

Campus/Online:

On-campus only

Requisites:

ECE 30010, ECE 56500

Requisites by Topic:

A broad and strong foundation in undergraduate computer systems, computer architecture, and AI, and good programming skills.

Catalog Description:

The objective of this course is to facilitate systems research in AI and LLMs. Today, AI and LLMs are driving transformative applications ranging from conversational agents and scientific discovery to software development and decision support. However, training, serving, and optimizing systems for AI and LLMs in the real world introduce significant challenges in scalability, efficiency, and system design. This course will explore the latest developments in systems design for AI and LLMs, considering emerging model architectures, training techniques, and large-scale deployment challenges. We will also examine system-level approaches for improving performance, efficiency, and resource utilization. The course will engage students in the discussion of papers from recent conferences that focus on (a) building scalable systems for training and serving foundation models, and (b) developing optimization techniques across the software-hardware stack to enable real-world AI applications.

Required Text(s):

None.

Recommended Text(s):

None.

Learning Outcomes

A student who successfully fulfills the course requirements will have demonstrated an ability to:

  • Explain the fundamental principles and latest developments in systems design specifically tailored for AI and LLMs.
  • List the technical challenges associated with the scalability, efficiency, and real-world deployment of foundation models.
  • List various optimization techniques across the software-hardware stack to enhance AI application performance.
  • Describe system-level approaches for improving resource utilization and computational efficiency during model training and serving.
  • Explain emerging model architectures and their requirements for large-scale system infrastructure.
  • Analyze and discuss cutting-edge research from recent systems and AI publications.

Lecture Outline:

Week Topic
1 Introduction and course logistics
2 Benchmarking
3 Benchmarking and scheduling
4 Scheduling and workload modeling
5 Workload modeling
6 Machine learning
7 Edge computing and LLM
8 LLMs
9 Energy and sustainability
10 Sustainability
11 Resource management; project presentations
12 Resource management and autonomous systems
13 Autonomous systems and architecture
14 Thanksgiving Break; no class
15 Final project presentation

Assessment Method:

Projects, presentations (3/2026)