Synthesizing network designs with certifiable performance properties
With the widespread adoption of online and cloud-based services, it is critical that the underlying network infrastructure
meet increasingly stringent performance requirements (e.g., sustaining throughput for business-critical applications). These
requirements must be met despite failures, which are the norm given the global scale of ISP and cloud provider networks and
their rapid pace of evolution. Many existing approaches to designing networks for failures
(i) focus only on availability, resulting in poor performance under failures; (ii) consider only a small
number of failure states and do not scale as the number of possible failure states increases;
or (iii) rely on ad-hoc, simulation-based testing. While software and hardware synthesis and testing tools form
a thriving 10-billion-dollar industry, the state of practice in certifying and synthesizing network designs is still in
its nascent stages.
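To make the scaling problem concrete: if a network has n links and every scenario with at most f simultaneous link failures must be considered, the number of failure states is the sum of C(n, k) for k = 0..f. The short Python sketch below computes this count for a hypothetical 100-link network. The link count and the choice to count only link (not node) failures are illustrative assumptions.

```python
from math import comb


def num_failure_states(num_links: int, max_failures: int) -> int:
    """Number of scenarios with at most `max_failures` simultaneous link
    failures among `num_links` links, counting the no-failure state."""
    return sum(comb(num_links, k) for k in range(max_failures + 1))


# Hypothetical 100-link network; print the state count for f = 1..4.
for f in range(1, 5):
    print(f, num_failure_states(100, f))
```

For f = 1, 2, 3, 4 this prints 101, 5051, 166751 and 4087976 states. Enumerating and simulating each state individually therefore becomes impractical quickly as more concurrent failures are considered.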
The project is tackling these challenges by developing novel techniques that enable architects to plan
their network designs (e.g., designing the topology, provisioning capacity, and choosing algorithms for rerouting traffic on failures)
so that performance remains acceptable over a large set of scenarios the network may encounter. The project is distinguished
by a focus on performance (not just availability), by designing for multiple concurrent failures, and by producing designs with
provable performance guarantees for a given set of failure scenarios. This project may be viewed as an early effort
aimed at formally verifying quantitative network properties and synthesizing networks that meet such performance requirements.
This is in contrast to much recent progress in the field of network verification, which has primarily focused on correctness
(e.g., ensuring security policies are correctly instantiated).
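The sketch below illustrates what certifying performance over a set of failure scenarios means. It takes the naive route: it enumerates every scenario with at most f link failures and checks that maximum link utilization (MLU) stays at or below a threshold.

It is only a baseline for intuition, not the project's technique. As the scenario count grows combinatorially, this kind of brute force is exactly what stops scaling. The topology, capacities, tunnels, demand, and the equal-split rerouting rule are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: link name -> capacity (e.g., in Gbps).
CAPACITY = {"a": 10.0, "b": 10.0, "c": 10.0, "d": 10.0}
# Hypothetical tunnels from source s to destination t, each listing its links.
TUNNELS = [["a", "b"], ["c", "d"], ["a", "d"]]
DEMAND = 10.0  # hypothetical s-to-t traffic, same units as capacity


def max_link_utilization(failed: set[str]) -> float:
    """Split DEMAND equally over tunnels that avoid every failed link
    (a hypothetical rerouting rule) and return the maximum link utilization.
    Returns infinity when no tunnel survives, i.e., the demand is disconnected."""
    alive = [t for t in TUNNELS if not failed.intersection(t)]
    if not alive:
        return float("inf")
    share = DEMAND / len(alive)
    load = {link: 0.0 for link in CAPACITY}
    for tunnel in alive:
        for link in tunnel:
            load[link] += share
    return max(load[link] / CAPACITY[link] for link in CAPACITY)


def certify(max_failures: int, threshold: float = 1.0) -> tuple[bool, tuple[str, ...], float]:
    """Brute-force check of every scenario with at most `max_failures` failed links.
    Returns (passes, worst_scenario, worst_mlu)."""
    worst: tuple[str, ...] = ()
    worst_mlu = 0.0
    for k in range(max_failures + 1):
        for failed in combinations(sorted(CAPACITY), k):
            mlu = max_link_utilization(set(failed))
            if mlu > worst_mlu:
                worst, worst_mlu = failed, mlu
    return worst_mlu <= threshold, worst, worst_mlu


print(certify(1))  # (True, ('a',), 1.0): every single failure keeps MLU <= 1
print(certify(2))  # (False, ('a', 'c'), inf): this double failure disconnects s from t
```

Under these assumptions, the toy design can be certified for any single link failure but not for two concurrent failures.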
The project is bringing together expertise in network systems and optimization theory,
and is advancing the state of the art in two key ways. First, the project is developing
(i) new mechanisms that may be deployed in the network to respond to failures; and
(ii) frameworks that can certify that the resulting performance is acceptable over the desired failure
states. A novelty of these frameworks is their ability to model rich and flexible network mechanisms.
Second, unlike existing methods that consider only worst-case performance and may therefore be overly
conservative, the project is developing novel ways to design for requirements that must be met in a desired
percentage of scenarios.
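A toy calculation shows the difference between worst-case and percentage-based requirements. The sketch below assumes that each failure scenario has a known probability and a resulting MLU, with all values hypothetical. It reports the smallest MLU bound that holds over at least a target fraction of the probability mass. Weighting all scenarios equally would correspond to a literal percentage of scenarios.

```python
from fractions import Fraction

# Hypothetical failure scenarios: (probability, resulting MLU).
# Fractions keep cumulative probabilities exact (no float rounding).
SCENARIOS = [
    (Fraction("0.90"), 0.6),  # no failure
    (Fraction("0.05"), 0.8),  # single failure of one link
    (Fraction("0.04"), 0.9),  # single failure of another link
    (Fraction("0.01"), 1.5),  # rare double failure
]


def worst_case_mlu(scenarios: list[tuple[Fraction, float]]) -> float:
    """MLU the design must tolerate if every scenario must be met."""
    return max(mlu for _, mlu in scenarios)


def percentile_mlu(scenarios: list[tuple[Fraction, float]], target: Fraction) -> float:
    """Smallest MLU bound that holds over scenarios whose total probability
    is at least `target`."""
    covered = Fraction(0)
    for prob, mlu in sorted(scenarios, key=lambda s: s[1]):
        covered += prob
        if covered >= target:
            return mlu
    return float("inf")  # target exceeds the total probability listed


print(worst_case_mlu(SCENARIOS))                    # 1.5
print(percentile_mlu(SCENARIOS, Fraction("0.99")))  # 0.9
print(percentile_mlu(SCENARIOS, Fraction("0.95")))  # 0.8
```

In this example the worst case (MLU 1.5) is driven by a rare double failure. A design that only needs to meet its target over 99% of the probability mass must handle an MLU of just 0.9, which illustrates how worst-case design can be overly conservative.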
The project has the potential for significant real-world impact by ensuring networks comply with Service Level Objectives in the face of failures, and is leading to networks with lower cost, better performance, and higher reliability.
The project is extensively engaging with industry and network operator forums, with results being validated
using real data from these networks.
News:
We have released code for PCF! Learn more about PCF in the publications below.
Publications:
- PCF: Provably Resilient Flexible Routing. Chuan Jiang, Sanjay Rao, Mohit Tawarmalani. ACM SIGCOMM 2020. [PDF] [Slides] [Video] [Code]
- Lancet: Better Network Resilience by Designing for Pruned Failure Sets. Yiyang Chang, Chuan Jiang, Ashish Chandra, Sanjay Rao, Mohit Tawarmalani. ACM SIGMETRICS 2020. [PDF] [Slides] [Video]
- Robust Validation of Network Designs under Uncertain Demands and Failures. Yiyang Chang, Sanjay Rao, Mohit Tawarmalani. USENIX NSDI 2017. [PDF] [Slides]
Team: