POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism
Tsung Tai Yeh,
Amit Sabne,
Putt Sakdhnagool,
Rudolf Eigenmann,
Tim Rogers
September, 2016
Abstract
Massively multithreaded GPUs achieve high throughput by running thousands of threads in parallel. To fully utilize the hardware, contemporary workloads spawn work to the GPU in bulk by launching large tasks, where each task is a kernel that contains thousands of threads that occupy the entire GPU.
Publication
In 2016 International Conference on Parallel Architectures and Compilation Techniques (PACT)
Tsung Tai Yeh
PhD Graduate, 2020.
Associate Professor of ECE