POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism

Abstract

Massively multithreaded GPUs achieve high throughput by running thousands of threads in parallel. To fully utilize the hardware, contemporary workloads spawn work to the GPU in bulk by launching large tasks, where each task is a kernel whose thousands of threads together occupy the entire GPU.

Publication
In 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)
Tsung Tai Yeh
PhD Graduate, 2020.
Tim Rogers
Associate Professor of ECE