SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices

Abstract

Contemporary data center servers process thousands of similar, independent requests per minute. In the interest of programmer productivity and ease of scaling, workloads in data centers have shifted from single monolithic processes toward a micro- and nanoservice software architecture. As a result, single servers are now packed with many threads executing the same, relatively small task on different data.

State-of-the-art data centers run these microservices on multi-core CPUs. However, the flexibility offered by traditional CPUs comes at an energy-efficiency cost. The Multiple Instruction Multiple Data (MIMD) execution model misses opportunities to exploit the similarity in contemporary microservices. We observe that the Single Instruction Multiple Thread (SIMT) execution model, employed by GPUs, provides better thread scaling and has the potential to reduce frontend and memory system energy consumption. However, contemporary GPUs are ill-suited for the latency-sensitive microservice space.

To exploit the similarity in contemporary microservices while maintaining acceptable latency, we propose the Request Processing Unit (RPU). The RPU combines elements of out-of-order CPUs with lockstep thread aggregation mechanisms found in GPUs to execute microservices in a Single Instruction Multiple Request (SIMR) fashion. To complement the RPU, we also propose a SIMR-aware software stack that uses novel mechanisms to batch requests based on their predicted control flow, split batches based on predicted latency divergence, and map per-request memory allocations to maximize coalescing opportunities. Our resulting RPU system processes 5.7× more requests/joule than multi-core CPUs while increasing single-thread latency by only 1.44×.

Biography

Mahmoud Khairy is a researcher at AMD Research. He earned his Ph.D. from the Department of Computer Engineering at Purdue University, where he was advised by Professor Tim Rogers. Mahmoud’s research interests include computer architecture, compilers, and systems, with a focus on programmable accelerators, especially SIMT-based accelerators such as GPUs and RPUs. His Ph.D. thesis was about building scalable and energy-efficient SIMT systems for deep learning and data center microservices. His research has appeared at top-tier conferences such as MICRO and ISCA. He received his bachelor’s and master’s degrees in Computer Engineering from Cairo University, Egypt.

Video of the Talk