Xiaoqi Chen (Danny): Purdue ECE

I am an Assistant Professor at the School of Electrical and Computer Engineering, Purdue University. My research interest is at the juxtaposition of approximate algorithm design and optimizing real-world high-speed networks. I design and build efficient data-plane algorithms for network measurement and closed-loop control to improve network performance and security.

Prior to joining Purdue in Fall 2024, I was a Postdoctorate Researcher at VMware Research Group. I received my PhD degree from Department of CS, Princeton University in 2023, and my Bachelor's degree from IIIS, Tsinghua University in 2017.

I'm looking for motivated and adventurous students -- please drop me an email if you're interested in working with me.

News:

[Aug 2024] Starting at Purdue! Boiler Up!
[Oct 2023] I just defended my PhD degree :) I'm joining VMware research as a postdoctoral researcher.
[Oct 2023] SmartCookie was accepted to USENIX Security 2024!
[Jul 2023] FLM paper was accepted to NSDI 2024!
[May 2023] 🏆 AHAB paper won the best paper award at INFOCOM 2023!
[Dec 2022] eBPF Sketching paper was accepted to the January 2023 issue of SIGCOMM CCR!
[Oct 2022] 🏆 DFA Synthesis won the best paper award at SOSR 2022!

Research Projects

Network Measurement in Data Plane

I design algorithm and data structures that run directly in the data plane of high-speed programmable switches, to perform measurement telemetry and help operators analyze performance and security anomalies. Since high-speed switches impose tight memory and arithmetic constraints, I employ various approximation techniques and hardware-specific adaptations to produce algorithms that can process traffic at Terabits per second line rate.

BeauCoup: run multiple queries simultaneously in data plane

We identify limited memory access, instead of limited memory space, as the most critical resource constraint for running multiple network measurement algorithm on programmable switches.
By using Coupon Collectors, BeauCoup supports running multiple distinct counting queries simultaneously, while only making a small constant number of memory accesses per packet.
Furthermore, unlike earlier work, BeauCoup supports updating traffic queries on-the-fly without re-compiling P4 code, by separating query parameters from data-plane logic. Queries can be updated via installing TCAM rules only, minimizing network downtime.

Featured in APNIC Blog!

Paper (SIGCOMM'20) Slides P4 Code

PRECISION: Heavy-hitter detection in programmable data plane

To run at Terabits-per-second line rate in programmable data plane, measurement algorithms must follow a restrictive programming model, with constraints on memory size and access.
We modified the Space-Saving algorithm to adapt to these hardware constraints, to track Heavy Hitters (network flows with the largest volume) in the data plane. By recirculating a small fraction of packets probabilistically, we greatly simplify our algorithm's memory access pattern. Our algorithm still achieves comparable accuracy to algorithms in an unrestricted programming model.
In our paper, we also summarized the programming model constraints into three easy-to-follow rules for designing future measurement algorithms.

Paper (ToN) Slides P4 Code

AROMA: Routing-oblivious packet and flow sampling

Traditional traffic sampling and analysis methods require precise routing information to de-duplicate measurements collected from multiple vantage points. However, network administrators may not have complete or accurate knowledge of how every packet is routed.
By using random hash functions, we build a routing-agnostic network measurement framework that sample flows and packets from multiple vantage points across the network. The samples can subsequently be used to perform various network-wide analysis.

Paper (IFIP Networking'20)

Measuring RTT in data plane using multi-stage hash tables

We measure packet-level round trip time by matching a TCP packet with its corresponding acknowledgement. We buuild multi-stage hash tables to reduce the effect of hash collisions when using hash tables in the data plane, and allows table entries to "lazily expire" by deleting them upon hash collisions.
Compared with prior work that only looks at handshake packets, our work produces multiple RTT samples per flow, which helps capturing changing network condition for long-running TCP flows such as video streaming.

Paper (SPIN'20 workshop) Slides P4 Code

Fridge: Unbiased measurement in the data plane

Hash collisions in data structures often affect the accuracy of network measurement in the data plane. For example, when we measure round-trip delays by saving request packets in a hash-indexed array and matching with subsequent response packets, those requests with higher delays are less likely to survive all the hash collisions until their responses arrive. This creates a bias in the measured delay distribution.
We propose Fridge, a novel data structure that tracks the time each request sample spent within it before being matched with the corresponding response. This allows us to precisely analize and correct for the survivorship bias caused by hash collisions. Using an entry probability and an insertion counter, the Fridge can significantly reduce the bias in delay distribution measurement, achieving the same accuracy with 2x-4x memory saving.

Paper (APoCS'22)

High-speed In-Kernel network measurement using eBPF

eBPF is an infrastructure that allows to dynamically load and run micro-programs directly in the Linux kernel's network data plane, without the need for recompiling the kernel. This provides a great opportunity to run sketch-based high-speed network measurement algorithms on servers.
However, verbatimly running existing sketch-based measurement algorithms under eBPF suffers from several performance bottlenecks. In this paper, we present a case study and discuss techniques to optimize sketch-based network measurement algorithms for best performance under eBPF.

To appear in Computer Communication Review.

Draft Code

Real-time, Closed-loop Control

With data-plane measurement algorithms, programmable switches can now react to changes in traffic in real time, without the higher latency in conventional SDN control plane. The switches can now react to events in a sub-millisecond time scale to improve network performance.

ConQuest: real-time microburst analysis and mitigation

Microburst is a phenomenon where bursty ingress traffic fills up a switch's queuing buffer rapidly, causing high latency, jitter, and in the worse case, packet drops. Due to its short timescale, conventional monitoring methods based on packet sampling or aggregated statistics provide very little insights about the dynamics or the cause of microbursts.
ConQuest analyzes each individual network flow's contribution to queuing in real time, using a round-robin snapshots data strucutre running in the programmable data plane. After identifying the very bursty flows that caused microbursts, ConQuest can immediately suppress these bursty flows to surgically mitigate microbursts, protecting other flows.
We also propose a novel tapping-based setup to analyze queuing in legacy, non-programmable switches. We have used ConQuest to perform fine-grained queuing analysis in two real-world deployments: at Princeton University's campus network, as well as in AT&T's carrier network.

Featured in APNIC Blog and P4.org Blog!

Paper (CoNEXT'19) Slides P4 Code

AHAB: Scalable bandwidth fairness under network slicing

Maintaining bandwidth fairness across millions of users is challenging: the switch hardware only supports dozens of queues, while software switching running on CPUs incurs extra latency. Each user’s fair share changes constantly as users arrive and leave; slicing poses further challenges as surplus bandwidth from underutilized slice need to be reallocated to users in overutilized slices.
We tackle these challenges and implement approximate slice-level bandwidth fairness, by guessing the fair rates and continually refine the guess using a real-time control loop. To make more accurate guesses, we implement approximated linear interpolation computation entire within the data plane. This allows each update to finish under a millisecond and our system converges to fair rate within 3 milliseconds, 20 times faster than prior state-of-the-art.

🏆 Best Paper Award at INFOCOM'23!

Paper (ToN) Slides P4 Code

Synthesis and Verification

Synthesize state machines for data plane execution

Many stateful network processing logic can be formulated as finite state machines; executing them in data plane allows switches to take different actions according to different states. However, programmable switches support only a limited set of arithmetic operations between reading and writing values stored in stateful memory, making it challenging to implement complex state transitions as atomic memory updates.
We propose a novel technique to map states to numerical values and implement state transitions as simple arithmetic operations. We then formulate the requirement as SMT constraints and use an off-the-shelf SMT solver to produce such mappings. Evaluation shows our approach can successfully synthesize complex example state machines within a few minutes.

🏆 Best Paper Award at SOSR'22!

Paper (SOSR'22) Code

Match packet sequence patterns at line rate

Many network monitoring tasks can be described as finding sequences of packets that matches a certain pattern. FLM allows network operators to express these patterns as extended regular expressions, and compiles these patterns into state machines for line-rate execution.

Paper draft

Security and Privacy

Programmable switches can play a more vital role in network security and privacy, for example via actively participating in encryption and authentication protocols. As a starting point, I implemented important cryptography building blocks for high-speed programmable switches.

Implementing AES on programmable switches

We implement the standardized and widely-used AES encryption on the Barefoot Tofino high-throughput programmable switch, using the Scrambled Lookup Tables optimization.
We observe that traditional AES implementations are often optimized for embedded devices that lacks memory space, yet switches have moderate memory size. By building many lookup tables, we trade memory space overhead for shorter execution pipeline length.

Paper (SPIN'20 workshop) Slides P4 Code

Implementing HalfSipHash on programmable switches

Hash function is a vital component of many data plane algorithms and is often used for indexing, fingerprinting, and sampling. The usage of readily available CRC32 as a hash function opens up a major attack surface when processing adversarial traffic. We implement a secure keyed hash function, HalfSiphash, on the Barefoot Tofino high-throughput programmable switch.

Paper (SPIN'21 workshop) P4 Code

SmartCookie: split-design SYN-flood defense

Traditional server-based SYN-flooding defense uses significant CPU for calculating SYN cookies. Earlier efforts to offload SYN cookie require two connection setups and per-flow memory on the offload device to translate sequence numbers.
By using a split design, SmartCookie utilized high-speed programmable switch to generate SYN cookies securely and efficiently, while an eBPF-based server agent transparently manages all connection setup, eliminating the need for per-flow memory on the switch.

Paper draft Code

Real-world testbed

P4Campus: Campus Network as a P4 testbed

Many researchers find it challenging to deploy their research in real-world networks. Meanwhile, our university campus network faces the same operational challenges as other larger networks, making it an ideal local testbed for networking research projects. We collaborate with Princeton University's campus network operator to deploy various P4-based network applications in the campus network to improve the network's performance, privacy, and security, and at the same time validating novel research ideas using production-grade prototypes. By sharing our experiences, we hope to enable others to replicate a similar setup in their own campus networks.

Website Slides Paper (SIGCOMM CCR)

Selected Publication

[INFOCOM'23] Scalable Real-Time Bandwidth Fairness in Switches. Robert MacDavid, Xiaoqi Chen, Jennifer Rexford
[SOSR'22] Synthesizing State Machines for Data Planes. Xiaoqi Chen, Andrew Johnson, Mengying Pan, David Walker
[SIGCOMM'20] BeauCoup: Answering Many Network Traffic Queries, One Memory Update at a Time. Xiaoqi Chen, Shir Landau-Feibish, Mark Braverman, Jennifer Rexford
[CoNEXT'19] Fine-Grained Queue Measurement in the Data Plane. Xiaoqi Chen, Shir Landau Feibish, Yaron Koral, Jennifer Rexford, Ori Rottenstreich, Steven A Monetti, Tzuu-Yi Wang

See my full publication list at Google Scholar.

Community Service

Journal Reviewer:

IEEE/ACM Transactions on Networking (ToN)
IEEE Transactions on Network Science and Engineering (TNSE)
IEEE Transactions on Information Forensics and Security (T-IFS)
Journal of Network and Computer Applications (JNCA)
Computer Networks

Program Committee:

NSDI, 2024
ICNP, 2024

Shadow Program Committee:

EuroSys, 2022
IMC, 2022

Artifact Evaluation Committee:

ACM SIGCOMM, 2021, 2022, 2023
ACM CoNEXT, 2021, 2023 (chair)
USENIX Security, 2024

Honors & Awards

Best Paper Award, INFOCOM 2023: Scalable Real-Time Bandwidth Fairness in Switches
Best Paper Award, SOSR 2022: Synthesizing State Machines for Data Planes
Siebel Scholars, class of 2022
Finalist, Facebook PhD Fellowship, 2020
School of Engineering and Applied Science Award for Excellence, Princeton University, 2020

Miscellaneous

I love playing tennis and soccer, and I played in the IIIS varsity soccer team at Tsinghua's interdepartmental league.
I enjoy lots of other sports as well -- badminton, table tennis, basketball, swimming and climbing, you name it.
Did you know []*[]=0 and []+[]=""? I write javascript for fun! I also used it for web development in many personal projects, including this webpage.
I mostly use Go and Python in my research project. You can actually write Python and Just-in-Time compile to C or CUDA!

Xiaoqi Chen