Real Time Network Policy Checking Using Header Space Analysis
Peyman Kazemian, Michael Chang, and Hongyi Zeng, Stanford University; George Varghese, University of California, San Diego and Microsoft Research; Nick McKeown, Stanford University; Scott Whyte, Google Inc.
Network state may change rapidly in response to customer demands, load conditions, or configuration changes. But the network must also ensure correctness conditions such as isolating tenants from each other and from critical services. Existing policy checkers cannot verify compliance in real time because of the need to collect “state” from the entire network and the time it takes to analyze this state. SDNs provide an opportunity in this respect as they provide a logically centralized view from which every proposed change can be checked for compliance with policy. But there remains the need for a fast compliance checker.
Our paper introduces a real time policy checking tool called NetPlumber based on Header Space Analysis (HSA). Unlike HSA, however, NetPlumber incrementally checks for compliance of state changes, using a novel set of conceptual tools that maintain a dependency graph between rules. While NetPlumber is a natural fit for SDNs, its abstract intermediate form is conceptually applicable to conventional networks as well. We have tested NetPlumber on Google’s SDN, the Stanford backbone, and Internet2. With NetPlumber, checking the compliance of a typical rule update against a single policy on these networks takes 50-500 μs on average.
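For readers wanting intuition for the header-space machinery the abstract refers to, the sketch below (ours, not the authors' code) models HSA's core primitive: packet headers as ternary strings over {0, 1, x}, with overlap between two header spaces computed by bitwise intersection.

```python
# Illustrative sketch (not the paper's implementation): HSA models headers
# as ternary strings over {0, 1, x}; two header spaces overlap iff their
# bitwise intersection is non-empty.

def bit_intersect(a: str, b: str):
    """Intersect one ternary bit: 'x' matches anything; 0 and 1 must agree."""
    if a == 'x':
        return b
    if b == 'x' or a == b:
        return a
    return None  # 0 vs 1: empty intersection

def intersect(h1: str, h2: str):
    """Intersect two equal-length ternary headers; None means disjoint."""
    out = []
    for a, b in zip(h1, h2):
        bit = bit_intersect(a, b)
        if bit is None:
            return None
        out.append(bit)
    return ''.join(out)

print(intersect('10xx', '1x0x'))  # '100x': the spaces overlap
print(intersect('10xx', '01xx'))  # None: disjoint header spaces
```

Testing whether one rule's output headers can reach another rule's match reduces to exactly this kind of non-empty-intersection check, which is the building block of the dependency graph the abstract mentions.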
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
@inproceedings{kazemian,
author = {Peyman Kazemian and Michael Chang and Hongyi Zeng and George Varghese and Nick McKeown and Scott Whyte},
title = {Real Time Network Policy Checking Using Header Space Analysis},
booktitle = {10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13)},
year = {2013},
isbn = {978-1-931971-00-3},
address = {Lombard, IL},
pages = {99--111},
url = {https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/kazemian},
publisher = {USENIX Association},
month = apr
}
Presentation Video
Presentation Audio
Public Review by Brad Karp
One of the most compelling arguments for software-defined networking is the woefully complex state of the Internet's control plane. Consider the sheer number of control-plane protocols for routing and the complexity of their interactions and configuration. To do routing alone, a network operator faces multiple IGPs, MPLS, iBGP (with route reflectors or confederations), and eBGP, to say nothing of multicast or interactions with traffic engineering. As evidence of the difficulties of sensibly configuring this menagerie of protocols, consider the variety of i- and eBGP pathologies that have come to light, and the role of (mis-)configuration and badly interacting policy settings in causing them.
Since the dawn of the ARPAnet, the desire for robust resilience to failures led the research community to value aggressively distributed designs for the Internet's control plane. The vision was that to heal around failures, the network could not centralize control functionality at a node that could itself potentially become a single point of failure. This principled stand has allowed the Internet to scale remarkably well, in that end-to-end reachability often *does* (eventually!) survive despite localized failures within the network's vast collection of links and devices. But this penchant for aggressively distributed control has begotten the aforementioned complexity. We decentralize in the hope of achieving resilience, but at what point does the complexity of decentralized control *threaten* resilience? Answering even simple questions about the behavior of a network of any scale (e.g., "can this host reach this other one?" or "does forwarding for any destination loop anywhere in the network?") at best requires reasoning manually about the configurations and interactions of many control-plane protocols.
Against this rather bleak status quo for verifying whether a large-scale IP network complies with important correctness and security policies, a subset of this paper's authors (Kazemian et al., NSDI 2012) previously took an important first step toward formally verifying the compliance of a whole network with a policy. In that prior work on Header Space Analysis (HSA), they developed techniques for what one might reasonably call static analysis of a network's configuration: they developed a formal representation for a device's packet processing behavior, in which a device transforms a space of possible input packet headers into a space of possible output packet headers, and an approach for composing such per-device transformations in header space to represent an entire network's behavior. HSA allowed them to verify whether the header-space model of an entire network's configuration (derived from the packet-processing rules in effect at each device in the network) complies with a user-specified policy. Their implementation of HSA, Hassel, could verify a variety of practically useful policies on a campus-scale network in seconds to minutes.
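As a hedged illustration of this static-analysis view, the toy below models each device as a transfer function over sets of headers and composes those functions along a path. For simplicity it tracks concrete 4-bit headers rather than full wildcard sets, and the devices and rules are hypothetical, not drawn from the paper.

```python
# Hedged sketch of HSA-style static analysis: each device is a transfer
# function from a set of headers to a set of headers, and behavior along a
# path is the composition of those functions. Devices/rules are made up.

from typing import Callable, List, Set, Tuple

Header = str                                   # e.g. '1010'
Transfer = Callable[[Set[Header]], Set[Header]]

def make_transfer(rules: List[Tuple[str, str]]) -> Transfer:
    """Build a transfer function from (ternary match, rewrite) pairs.
    An 'x' in the match is a wildcard; an 'x' in the rewrite keeps the bit."""
    def matches(h: Header, pattern: str) -> bool:
        return all(p == 'x' or p == b for b, p in zip(h, pattern))

    def transfer(headers: Set[Header]) -> Set[Header]:
        out: Set[Header] = set()
        for h in headers:
            for pattern, rewrite in rules:
                if matches(h, pattern):
                    out.add(''.join(b if r == 'x' else r
                                    for b, r in zip(h, rewrite)))
                    break                      # first matching rule wins
        return out
    return transfer

# Compose two hypothetical devices: which injected headers survive the path?
switch_a = make_transfer([('1xxx', 'xxxx')])   # forward 1xxx unchanged
switch_b = make_transfer([('1x0x', 'xx1x')])   # match 1x0x, set bit 2 to 1
headers = {'1000', '1100', '0000'}
for device in (switch_a, switch_b):            # function composition
    headers = device(headers)
print(headers)                                 # {'1010', '1110'}
```

A whole-network check in this style re-runs such compositions from every injection point, which is why Hassel's verification is measured in seconds to minutes rather than microseconds.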
Now, in this paper, Kazemian et al. take the next important step: they show how to *dynamically* verify and/or enforce policy invariants for the behavior of a large-scale network of switches, routers, and firewalls, even as the forwarding state within these devices changes over time, whether because of configuration changes by operators or in-network failures. Whereas Hassel verified an entire network at once, the authors' new system, NetPlumber, verifies a network *incrementally*. Each time an individual device's packet processing behavior changes (e.g., when a central controller updates a switch's rule in an OpenFlow network, or when a routing protocol changes a FIB entry in an IP router), NetPlumber need only verify that change (and its knock-on effects on other rules).
The central idea that makes incremental verification by NetPlumber possible is the authors' representation of devices' forwarding rules and their relationships as a graph, where rules are nodes and directed edges represent next-hop dependencies between rules. Informally, if the output traffic transformed by a first rule on one device arrives over a link at a second device, and a second rule on that second device matches the first rule's output traffic, then the first rule exhibits a next-hop dependency on the second. Tracing such dependencies is the essence of how NetPlumber efficiently reverifies a network-wide invariant upon a change to a single rule. And how efficient NetPlumber is, by and large: on fairly large-scale networks, the system can verify whether most rule changes comply with an invariant in less than a millisecond. The authors demonstrate this impressive level of incremental verification performance on several large-scale networks, including Google's SDN-based WAN, the Internet2 IP backbone, and the Stanford University IP network. Moreover, toward easing the expression of policies by network administrators, NetPlumber supports specifying policies in Prolog, using constructs familiar to SDN managers from the Flow-based Management Language (FML).
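A minimal sketch of this graph idea follows, under loud simplifying assumptions: ternary match patterns stand in for full rules, match overlap approximates "output traffic matches" (rewrites are ignored), and the two-switch topology is hypothetical. NetPlumber's actual "plumbing graph" is considerably richer.

```python
# Hedged sketch of a rule dependency graph: rules are nodes; a directed edge
# runs from rule r1 on device A to rule r2 on device B when A links to B and
# r1's headers can overlap r2's match. Rewrites are ignored for simplicity.

from collections import defaultdict

def overlap(p1: str, p2: str) -> bool:
    """Two ternary patterns overlap unless some bit is 0 in one, 1 in the other."""
    return all(a == 'x' or b == 'x' or a == b for a, b in zip(p1, p2))

class PlumbingGraph:
    def __init__(self, links):
        self.links = links                 # device -> set of next-hop devices
        self.rules = {}                    # rule_id -> (device, match)
        self.edges = defaultdict(set)      # rule_id -> dependent rule_ids

    def add_rule(self, rule_id, device, match):
        """Insert a rule and incrementally wire up next-hop dependencies."""
        self.rules[rule_id] = (device, match)
        for other_id, (other_dev, other_match) in self.rules.items():
            if other_dev in self.links.get(device, set()) and \
                    overlap(match, other_match):
                self.edges[rule_id].add(other_id)    # new rule feeds other
            if device in self.links.get(other_dev, set()) and \
                    overlap(other_match, match):
                self.edges[other_id].add(rule_id)    # other rule feeds new one

    def downstream(self, rule_id):
        """Rules transitively affected by a change to rule_id (knock-on set)."""
        seen, stack = set(), [rule_id]
        while stack:
            for nxt in self.edges[stack.pop()] - seen:
                seen.add(nxt)
                stack.append(nxt)
        return seen

# Hypothetical two-switch topology: a port on A leads to B.
g = PlumbingGraph(links={'A': {'B'}})
g.add_rule('r1', 'A', '10xx')     # A forwards 10xx toward B
g.add_rule('r2', 'B', '1xxx')     # B matches 1xxx
print(g.downstream('r1'))         # {'r2'}: changing r1 requires rechecking r2
```

The payoff is exactly the incrementality described above: when a single rule changes, only its downstream set needs rechecking against the invariants, rather than the whole network.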
The program committee was impressed with the speed at which NetPlumber verifies most rule changes; sub-millisecond verification does begin to make real-time verification of all changes to devices' forwarding state seem within the realm of practicality. There are two events that are not as fast to verify, however: verifying after link-up or link-down events takes on the order of seconds, because of the sheer number of rules affected via next-hop dependencies in such cases. While link failure and recovery events are generally relatively rare, aberrant behavior, such as links' "bouncing" up and down, as when a switch port has a poor electrical connection, might stress NetPlumber computationally. The program committee found NetPlumber a very natural fit with SDNs, where the system can easily interpose on control messages between the central controller and network devices, and verify that a control message yields a network-wide configuration that complies with a policy.
The authors suggest that NetPlumber could find use in IP networks as well, and I'm inclined to agree that the system holds promise in that context, too. The question of how IP network devices would expose incremental changes to their internal state to NetPlumber remains; whether this communication occurs by "pushed" SNMP traps or polling by NetPlumber, the cost on devices of propagating this information to NetPlumber could be significant. (Consider, for example, a busy core router in a tier-1 ISP, processing many iBGP updates per second after the failure of an external link.) This state-gathering overhead question aside, the *verification* approach taken in NetPlumber is clearly equally applicable to IP networks. In this sense, NetPlumber deserves the community's consideration as a fundamental technique for network policy verification, not only relevant to SDNs. Finally, those interested in network verification might also enjoy reading VeriFlow (Khurshid et al.) in this year's NSDI, which offers a different design for high-speed incremental policy verification for networks.