Analyzing Machine Learning Workloads Using a Detailed GPU Simulator

Abstract

Machine learning (ML) has recently emerged as an important application driving future architecture design. Traditionally, architecture research has used detailed simulators to model and measure the impact of proposed changes. However, current open-source, publicly available simulators lack support for running a full ML stack like PyTorch. High-confidence, cycle-accurate simulations are crucial for architecture research, and without them it is difficult to rapidly prototype new ideas. In this paper, we describe changes we made to GPGPU-Sim, a popular, widely used GPU simulator, to run ML applications that use cuDNN and PyTorch, two widely used frameworks for running Deep Neural Networks (DNNs). This work has the potential to enable significant microarchitectural research into GPUs for DNNs. Our results show that the modified simulator, which has been made publicly available with this paper (source code available at https://github.com/gpgpu-sim/gpgpu-sim_distribution, dev branch), provides execution time results within 18% of real hardware. We further use it to study other ML workloads and demonstrate how the simulator identifies opportunities for architectural optimization that prior tools are unable to provide.

Publication
In 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Mengchi Zhang
PhD Graduate, 2022.
Tim Rogers
Associate Professor of ECE