Research

My research focuses on building smarter, more efficient networks by designing algorithms that run directly in programmable network devices. I bridge the theoretical insights of streaming algorithms with the practical constraints of high-speed packet processing and the nuanced needs of upper-layer applications, enabling new possibilities in application-network co-design and real-time, closed-loop control.

Co-designing network and application layer algorithms for optimizing AI

AI has become one of the most demanding workloads for today's data center networks. Gradient compression during training indiscriminately shrinks payload sizes, yet it does not prevent queue buildup in switches; tail latency still produces “stragglers” that hold back synchronized training batches. This problem is especially prominent when training runs on a mixed-use cluster, sharing network infrastructure with potentially bursty background traffic.

My research enables network switches to perform just-in-time gradient compression by structuring gradient packets so that they can be trimmed upon transient congestion. This approach avoids the sophisticated, computation-heavy floating-point processing required for on-switch gradient compression, and can already be supported by much of today's switch hardware. A clear division of labor, where GPUs compute trimmable encodings and switches react swiftly to congestion, can lead to an overall shorter time-to-accuracy.
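
The Python sketch below illustrates the general idea under a simplifying assumption: gradient coordinates are packed in decreasing order of magnitude, so truncating the tail of a payload keeps the most important values. The function names (`encode_trimmable`, `decode_prefix`), the record layout, and the toy trimming ratio are illustrative inventions, not the actual wire format or switch logic.

```python
import random
import struct

def encode_trimmable(grad):
    """Pack (index, value) pairs in decreasing order of magnitude, so that if a
    switch trims the tail of the payload under congestion, the surviving prefix
    still carries the most significant gradient coordinates."""
    order = sorted(range(len(grad)), key=lambda i: -abs(grad[i]))
    return b"".join(struct.pack("!If", i, grad[i]) for i in order)

def decode_prefix(payload, dim):
    """Reconstruct a sparse gradient from whatever prefix of the payload
    survived; missing coordinates default to zero."""
    grad = [0.0] * dim
    record = struct.calcsize("!If")
    usable = len(payload) - len(payload) % record   # drop any partial record
    for off in range(0, usable, record):
        i, v = struct.unpack_from("!If", payload, off)
        grad[i] = v
    return grad

# Toy usage: a congested switch keeps only the first 40% of the payload.
g = [random.gauss(0, 1) for _ in range(16)]
pkt = encode_trimmable(g)
trimmed = pkt[: int(len(pkt) * 0.4)]
approx = decode_prefix(trimmed, len(g))   # largest-magnitude entries survive
```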

Algorithm Design for Measurement and Closed-loop Control

I design data-plane algorithms and data structures tailored to the stringent computational and memory constraints of high-speed packet processing pipelines, answering a variety of network measurement and telemetry questions, such as reporting heavy-hitter flows, super-spreaders, and round-trip time distributions.
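
As a rough illustration of this style of data structure, the sketch below is a textbook count-min sketch for heavy-hitter estimation, not the specific algorithms from my papers; the class and parameter names are illustrative. Each row maps naturally onto one pipeline stage with a fixed, small number of memory accesses per packet.

```python
import hashlib

class CountMinSketch:
    """Textbook count-min sketch: a few small counter arrays indexed by
    independent hashes, yielding per-flow count estimates that never
    undercount. Data-plane variants place one row per pipeline stage."""

    def __init__(self, rows=3, cols=1024):
        self.rows, self.cols = rows, cols
        self.counters = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        digest = hashlib.blake2b(key, salt=bytes([row])).digest()
        return int.from_bytes(digest[:4], "big") % self.cols

    def update(self, key, count=1):
        for r in range(self.rows):
            self.counters[r][self._index(r, key)] += count

    def estimate(self, key):
        return min(self.counters[r][self._index(r, key)] for r in range(self.rows))

# Toy usage: count packets per flow and flag flows above a threshold.
cms = CountMinSketch()
for flow in [b"10.0.0.1->10.0.0.2"] * 900 + [b"10.0.0.3->10.0.0.4"] * 20:
    cms.update(flow)
is_heavy = cms.estimate(b"10.0.0.1->10.0.0.2") > 500   # True: a heavy hitter
```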

I also identified and addressed a key tension between the switch hardware's restriction on memory accesses per packet and the operator's desire to run multiple measurement queries simultaneously. BeauCoup supports running many queries at once while using, on average, less than one memory access per packet per query, a breakthrough for scaling data-plane network monitoring to diverse measurement tasks.
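
The Python sketch below captures the coupon-collector flavor of the approach under heavy simplification: a single query, a Python dict standing in for switch memory, and illustrative parameter values. It glosses over how BeauCoup arbitrates memory accesses across many concurrent queries.

```python
import hashlib

def _hash32(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

class CouponQuery:
    """Simplified coupon-collector query in the spirit of BeauCoup: each
    packet maps to at most one of `m` coupons with small probability, so
    most packets cost zero memory accesses. A flow key that collects
    `threshold` distinct coupons triggers a report (e.g. a super-spreader
    contacting many distinct destinations)."""

    def __init__(self, m=32, threshold=24, sample_bits=8):
        self.m = m                      # coupons per flow key
        self.threshold = threshold      # distinct coupons needed to trigger
        self.sample_bits = sample_bits  # coupon drawn ~ m / 2**sample_bits of the time
        self.table = {}                 # flow key -> set of collected coupons

    def process(self, flow_key: bytes, attribute: bytes) -> bool:
        draw = _hash32(attribute) & ((1 << self.sample_bits) - 1)
        if draw >= self.m:
            return False                # no coupon drawn: no memory access
        coupons = self.table.setdefault(flow_key, set())
        coupons.add(draw)               # the single memory access for this packet
        return len(coupons) >= self.threshold

# Toy usage: report a source that contacts many distinct destinations.
q = CouponQuery()
alert = any(q.process(b"10.0.0.9", f"dst-{d}".encode()) for d in range(5000))
```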

Measurement is the foundation for real-time, closed-loop control directly in the data plane. In ConQuest, in-network analysis of microbursts enables real-time reactive control, mitigating the impact of bursty traffic on other flows. This work has been deployed in campus and carrier backbone networks to analyze the burstiness of real-world traffic. Meanwhile, AHAB uses a novel sketch and approximate linear interpolation for scalable bandwidth fairness enforcement, converging to a fair allocation within milliseconds through several rounds of iterative refinement. This allows a single switch to support network slicing that scales to millions of devices across thousands of slices.
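
As a back-of-the-envelope illustration of the closed-loop refinement idea, the sketch below iteratively adjusts a per-flow rate cap toward the max-min fair share. It is not AHAB itself: `refine_fair_share` and its proportional update are illustrative stand-ins for the sketch-based demand estimation and approximate interpolation that run in the data plane.

```python
def refine_fair_share(demands, capacity, rounds=6):
    """Guess a per-flow rate cap, compute how much traffic that cap would
    admit, and rescale the cap toward the point where the admitted total
    matches link capacity (max-min fairness), clamping to bisection bounds."""
    lo, hi = 0.0, max(demands)
    cap = hi
    for _ in range(rounds):
        admitted = sum(min(d, cap) for d in demands)
        if admitted > capacity:
            hi = cap
        else:
            lo = cap
        # Proportionally rescale the cap toward the capacity target,
        # then clamp it to the current bounds.
        cap = lo + (hi - lo) * 0.5 if admitted == 0 else cap * capacity / admitted
        cap = min(max(cap, lo), hi)
    return cap

# Toy usage: three small flows and one elephant sharing a 10 Gb/s link.
fair_cap = refine_fair_share([1.0, 1.5, 2.0, 9.0], capacity=10.0)
# Converges near 5.5 Gb/s: small flows keep their demand, the elephant is capped.
```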

Privacy and Security

In-network computing offers new opportunities to enhance network security and privacy.

In SmartCookie, I proposed a split-proxy design to improve defenses against large-scale SYN flooding attacks. It offloads the computationally expensive calculation of cryptographic cookies to programmable data planes, while a lightweight eBPF agent on the server side transparently handles connections to support unmodified TCP stacks. This design eliminates the need for the switch to maintain per-flow state and overcomes the memory limitations of previous switch-based defenses.
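
A rough Python sketch of the stateless cookie idea follows. HMAC-SHA256 stands in for the SipHash used on the switch, and the key, field layout, epoch handling, and the eBPF handoff are all simplified illustrations rather than SmartCookie's actual format.

```python
import hashlib
import hmac
import time

SECRET = b"per-deployment secret key"   # hypothetical key, rotated in practice

def syn_cookie(src_ip, dst_ip, src_port, dst_port, epoch):
    """Stateless SYN cookie: a keyed hash over the connection 4-tuple and a
    coarse timestamp, truncated to 32 bits so it fits in the sequence number
    of the SYN-ACK. HMAC-SHA256 stands in for SipHash on the switch."""
    msg = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{epoch}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")

def verify_cookie(ack_number, src_ip, dst_ip, src_port, dst_port, window=2):
    """Accept the cookie if it matches the current or a recent epoch. Only
    clients that complete the handshake ever reach the server, so neither the
    switch nor the server stores state for half-open connections."""
    now = int(time.time()) // 60
    cookie = (ack_number - 1) & 0xFFFFFFFF   # SYN-ACK seq = cookie, ACK acks seq+1
    return any(
        hmac.compare_digest(
            cookie.to_bytes(4, "big"),
            syn_cookie(src_ip, dst_ip, src_port, dst_port, now - d).to_bytes(4, "big"),
        )
        for d in range(window)
    )

# Toy usage: issue a cookie on SYN, verify it when the client's ACK arrives.
epoch = int(time.time()) // 60
c = syn_cookie("198.51.100.7", "203.0.113.1", 51812, 443, epoch)
ok = verify_cookie(c + 1, "198.51.100.7", "203.0.113.1", 51812, 443)
```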

To help the community adopt stronger security practices, I have implemented standardized and well-vetted cryptographic algorithms, such as AES and SipHash, for programmable switches. These building blocks facilitate the move away from weak hash functions and insecure, hand-rolled cryptography in data-plane applications. In addition, my research on network measurement is broadly applicable to detecting security threats such as worm propagation and BGP hijacking.