Yiying Zhang

Distributed and Disaggregated Resources

Datacenters have used a monolithic server model for decades, where each server has a motherboard that hosts all types of hardware resources, usually including a processor, memory chips, storage devices, and network cards. This monolithic architecture is easy to deploy but is inflexible in terms of resource utilization, integration of new hardware devices, and failure handling. We are rethinking datacenter hardware and software systems, including disaggregated hardware architectures, disaggregated operating systems, remote memory (and non-volatile memory) systems, and distributed (non-volatile) memory systems.

Disaggregated Operating System

Resource disaggregation is a hardware architecture that breaks monolithic servers into hardware resources that are connected with a fast, scalable network. We envision a fully-disaggregated datacenter (or rack) to be one that consists of independent, failure-isolated, network-attached components.

OSes built for monolithic computers cannot handle the distributed nature of disaggregated hardware components. Datacenter distributed systems are built to manage clusters of monolithic computers, not individual hardware components. When traditional OS operations are spread across hardware components over the network, these distributed systems fall short. We therefore need a new operating system for the disaggregated datacenter architecture.

We propose the concept of a decomposed operating system for the disaggregated datacenter architecture. The basic idea is simple: when hardware is disaggregated, the operating system should be as well.

Kernel-Level Indirection Layer for RDMA

Recently, there has been increasing interest in building datacenter applications with RDMA because of its low latency, high throughput, and low CPU utilization. However, RDMA is not readily suitable for datacenter applications. It lacks a flexible, high-level abstraction; its performance does not scale; and it does not provide resource sharing or flexible protection. Because of these issues, it is difficult to build RDMA-based applications and to exploit RDMA’s performance benefits.

To solve these issues, we built LITE, a Local Indirection TiEr for RDMA in the Linux kernel that virtualizes native RDMA into a flexible, high-level, easy-to-use abstraction and allows applications to safely share resources. Despite the widely held belief that kernel bypassing is essential to RDMA’s low-latency performance, we show that a kernel-level indirection can achieve flexibility together with low-latency, scalable performance.
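The idea of an indirection tier can be illustrated with a small user-space model. This is only a sketch under stated assumptions and not LITE's API: the class `IndirectionTier`, its methods, and the string client names are all hypothetical, and plain byte arrays stand in for RDMA-registered memory. It shows how applications might hold opaque handles instead of raw memory registrations, so that a single trusted layer can enforce sharing and permission checks.

```python
class IndirectionTier:
    """Toy model of an indirection layer (hypothetical, not LITE's API)."""

    def __init__(self):
        self._next_handle = 1
        self._regions = {}  # handle -> bytearray standing in for registered memory
        self._grants = {}   # (client, handle) -> set of permissions, "r" and/or "w"

    def alloc(self, client, size):
        """Allocate a region and return an opaque handle owned by `client`."""
        handle = self._next_handle
        self._next_handle += 1
        self._regions[handle] = bytearray(size)
        self._grants[(client, handle)] = {"r", "w"}
        return handle

    def share(self, owner, handle, other, perms):
        """Grant `other` the permissions in `perms` (e.g. "r" or "rw")."""
        if (owner, handle) not in self._grants:
            raise PermissionError("caller has no access to this handle")
        self._grants[(other, handle)] = set(perms)

    def _check(self, client, handle, perm, offset, length):
        # Every access goes through the tier, so protection is enforced here.
        if perm not in self._grants.get((client, handle), ()):
            raise PermissionError(f"{client} lacks '{perm}' on handle {handle}")
        if offset < 0 or length < 0 or offset + length > len(self._regions[handle]):
            raise ValueError("access outside the region")

    def write(self, client, handle, offset, data):
        self._check(client, handle, "w", offset, len(data))
        self._regions[handle][offset:offset + len(data)] = data

    def read(self, client, handle, offset, length):
        self._check(client, handle, "r", offset, length)
        return bytes(self._regions[handle][offset:offset + length])


def demo():
    tier = IndirectionTier()
    h = tier.alloc("app_a", 16)
    tier.write("app_a", h, 0, b"hello")
    tier.share("app_a", h, "app_b", "r")   # read-only grant
    data = tier.read("app_b", h, 0, 5)
    try:
        tier.write("app_b", h, 0, b"x")    # rejected: app_b has no write grant
        denied = False
    except PermissionError:
        denied = True
    return data, denied
```

In this model the handle carries no address or key material, so the tier is free to manage the underlying resources on behalf of all clients. A real kernel-level layer would issue RDMA operations where this sketch copies bytes.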

Get LITE here.

Distributed Shared Persistent Memory

Non-volatile memories (NVMs) have the potential to greatly improve the performance and reliability of large-scale applications in datacenters. However, it is still unclear how best to use them in distributed datacenter environments.

We introduce Distributed Shared Persistent Memory (DSPM), a new framework for using persistent memories in distributed datacenter environments. DSPM provides a new abstraction that allows applications both to perform traditional memory load and store instructions and to name, share, and persist their data. We built Hotpot, a kernel-level DSPM system that provides low-latency, transparent memory accesses, data persistence, data reliability, and high availability.
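A single-machine analogy may help make the abstraction concrete. The sketch below is not Hotpot's interface: it uses Python's standard `mmap` module over an ordinary file, and the helper names (`open_named_region`, `store_u64`, `load_u64`, `persist`) are hypothetical. It only illustrates the combination described above, where data is accessed with memory-style loads and stores yet can also be named and persisted; distribution, replication, and sharing across nodes are out of scope.

```python
import mmap
import os
import struct
import tempfile


def open_named_region(directory, name, size):
    """Map a named, file-backed region, creating and zero-filling it if absent."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping remains valid after the descriptor is closed


def store_u64(region, offset, value):
    """Memory-style store of a little-endian 64-bit unsigned integer."""
    struct.pack_into("<Q", region, offset, value)


def load_u64(region, offset):
    """Memory-style load of a little-endian 64-bit unsigned integer."""
    return struct.unpack_from("<Q", region, offset)[0]


def persist(region):
    """Explicit persistence point: flush modified pages to the backing file."""
    region.flush()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        region = open_named_region(directory, "counters", 4096)
        store_u64(region, 0, 42)
        persist(region)
        region.close()

        # Reopen by name: the data is found again through the name alone.
        reopened = open_named_region(directory, "counters", 4096)
        value = load_u64(reopened, 0)
        reopened.close()
        return value
```

Here the file name plays the role of the data's name and `persist` marks the point after which the stored value is expected to survive a restart of the process.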

Get Hotpot here.

Related Publications

Distributed Shared Persistent Memory
Yizhou Shan, Shin-Yeh Tsai, Yiying Zhang
to appear at the 9th Annual Non-Volatile Memories Workshop (NVMW '18)

LITE Kernel RDMA Support for Datacenter Applications
Shin-Yeh Tsai, Yiying Zhang
Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP '17)

Distributed Shared Persistent Memory
Yizhou Shan, Shin-Yeh Tsai, Yiying Zhang
Proceedings of the ACM Symposium on Cloud Computing 2017 (SoCC '17)

Lego: A Distributed, Decomposed OS for Resource Disaggregation
Yizhou Shan, Yilun Chen, Yutong Huang, Sumukh Hallymysore, Yiying Zhang
Poster at the 26th ACM Symposium on Operating Systems Principles (SOSP '17)

Disaggregated Operating System
Yizhou Shan, Sumukh Hallymysore, Yutong Huang, Yilun Chen, Yiying Zhang
Poster at the ACM Symposium on Cloud Computing 2017 (SoCC '17)

Disaggregated Operating System
Yiying Zhang, Yizhou Shan, Sumukh Hallymysore
the 17th International Workshop on High Performance Transaction Systems (HPTS '17)

Rockies: A Network System for Future Data Center Racks
Shin-Yeh Tsai, Linzhe Li, Yiying Zhang
WIP and Poster at the 14th USENIX Conference on File and Storage Technologies (FAST '16)