LISA '03 Paper   
[LISA '03 Technical Program]
Using Service Grammar to Diagnose BGP Configuration Errors
Xiaohu Qie - Princeton University
Sanjai Narain - Telcordia Technologies
Abstract
Often network components work correctly, yet end-to-end services
don't. This can happen when configuration parameters of components are
set to incorrect values. Configuration is a fundamental operation for
logically integrating components to set up end-to-end services.
Configuration errors frequently arise because transforming end-to-
end service requirements into component configurations is inherently
difficult. Such transformations are largely performed in a manual and
localized fashion, resulting in high cost of network operations.
The Service Grammar technique has been developed to solve
the configuration error diagnosis problem and, more generally, to
formalize the process of building complex systems via configuration.
At its core is a Requirements Language that contains
global, high-level constraints upon configuration parameters. These
are derived from identifying the notion of "correct configuration"
associated with different protocols. These notions are composed to
create system-wide requirements on architecture and policies. A
Diagnosis Engine checks if constraints in the Requirements
Language are true given definite component configurations and
recursively checks composite requirements.
This paper describes an application of Service Grammar to
diagnosing BGP configuration errors. As BGP architecture and policies
differ widely from one network to another, it is not possible using
previous techniques to check if router configurations implement the
intended requirements. Our tools enable administrators to
specify system-wide, network-specific requirements and check if they
are correctly implemented by component configurations.
Introduction
Traditional network management systems diagnose hard, localized
errors such as fiber cuts or hardware/software component failures. It
is quite possible, however, that network components work correctly yet
end-to-end services don't. This happens if there are configuration
errors, i.e., configuration parameters of components are set to
incorrect values. Configuration is a fundamental operation for
integrating components to implement end-to-end services. Configuration
errors arise frequently because transforming end-to-end service
requirements into configurations is inherently difficult: in realistic
networks there are many components, configuration parameters, values,
protocols and requirements. Yet, such transformation is largely
performed manually. The resulting high cost of network operations as
well as the potential for security breaches is well documented [1, 2].
The Service Grammar [3, 4, 5, 6] technique has been
developed to solve the configuration error diagnosis problem, and more
generally, to formalize the process of building complex systems via
configuration. At its core is a Requirements Language that
contains global, high-level abstractions that are set up in the
process of setting up end-to-end services. A good heuristic for
deriving this language is to ask the question "what does it mean for
a group of agents executing a protocol to be correctly configured."
This language is created for all of the protocols in a domain of
interest. End-to-end service requirements can be naturally defined as
logical conjunctions of requirements in the language at and across
different protocol layers.
This is done by rules of the form
A:-B1, ..., Bk,
k0,
where each A and Bi is a requirement. A
Diagnosis Engine checks if a language requirement is true given
definite system configuration. By recursive use of this operation,
complex algorithms for diagnosis can be developed. Service Grammar
captures the intuition to regard a system not as a set of components
but as a set of services that, in general, span multiple components.
The information flow of the diagnosis system is illustrated in
Figure 1. The diagnosis engine takes input from two sources: (1)
service requirements expressed in the requirements language, and (2)
vendor-neutral component configurations stored at a centralized
database, e.g., an LDAP directory. Raw component configuration is
parsed into vendor-independent data structures by vendor-specific
adaptors. The diagnosis engine queries the component configuration
database and verifies if the configurations are consistent with
service requirements. If not, it notifies the administrator where the
diagnosis process had failed. The administrator can then modify the
configuration settings and rerun the diagnosis process.
Figure 1:
Diagnosis system.
Service Grammars have been built and used for adaptive Virtual
Private Networks and mobile security [3, 4, 5, 6]. This paper
describes an application of Service Grammar to diagnosing
configuration errors in BGP [7]. Previous solutions to diagnosing
configuration errors have been network invariant [8] in that they
contain a fixed set of constraints that must be satisfied by every BGP
network. However, BGP requirements such as logical architecture and
policies differ widely from one network to another. It is not possible
in previous techniques to check if component configurations implement
the intended requirements. Our tools enable administrators to
specify network-specific requirements and check if they are correctly
implemented by component configurations.
BGP Background
BGP is the Internet's inter-domain routing protocol run between
autonomous systems (ASes). Routers in different ASes use BGP to
exchange information on how to reach destinations throughout the
Internet. BGP is path-vector based. A BGP route consists of a network
prefix N and an AS_PATH of the form
{ASk, ..., AS0},
which is the ordered list of ASes to traverse to reach N.
The AS path is constructed by successively propagating reachability
information: each AS prepending its own AS number to the path (one or
more times) before sending it to neighbors. Figure 2 illustrates how
routing information about network 200.12.0.0/16 is propagated
between ASes. For instance, AS160 knows its traffic will traverse
AS172, AS180 and AS200 before reaching the destination.
BGP is capable of enforcing policies based on various preferences
and constraints. BGP policies affect the route selection and export
process, thereby controlling how traffic enters and leaves an AS. Each
AS can define BGP policies according to its own criteria. BGP chooses
the best route based on a number of metrics, such as the AS path
length. In Figure 2 AS172 chooses {180, 200} over {190, 200, 200} as
the best route to N because its policy favors a shorter AS path.
Figure 2:
BGP network example.
Policy can be also applied to the route propagation process. An AS
decides what to tell its neighbors. If an AS is unwilling to carry
certain traffic for a neighbor, its policy will disallow routing
advertisements about particular destinations being sent to the
neighbor. For instance, AS180 and AS190 chose not to export routes to
N to each other. As a result, the horizontal link between the two ASes
will not be used to carry traffic to N. In a less restrictive case,
AS200 tells AS190 about N, but prepends its AS number twice to make
the path longer, indicating the route is considered a less attractive
one. The policy eventually affects AS172's route selection process: it
chose AS180 instead of AS190 as the next hop AS to reach N.
Challenges of Setting Up BGP
To set up BGP, network administrators configure individual routers
in the AS using a configuration language. The following is a sample
configuration in Cisco CLI format [9] for a router in AS160. The
configuration involves originating routes, establishing peer
relationship with neighbors, and applying policy filters. In this
example, the router announces network 172.1.1.0/24 and peers
with a remote BGP router in AS172. The policy filter allows only
routes with an empty AS path (i.e., locally originated routers) to be
advertised to AS172:
router bgp 160
172.1.1.0 mask 2525.255.255.0
network 172.16.24.1 remote-as 172
172.16.24.1 filter-list 1 out
!
ip as-path access-list 1 permit ^$
!
As BGP is a complex protocol, manually configuring individual
routers is a time-consuming and error prone task. This is especially
challenging in a large network with hundreds to thousands of routers.
To maintain
BGP Requirements Language
| Requirement | Meaning
| ibgp_session (RouterA, RouterB, LocalAS) | Both
RouterA and RouterB are BGP speakers of the local AS. A BGP session
can be successfully established between them.
| ebgp_session (LocalRouter, LocalAS, RemoteRouter,
RemoteAS) | A BGP session can be successfully established between
the local BGP speaker and the remote BGP speaker.
| reflector_client_session (ReflectorRouter, ClientRouter,
LocalAS) | In addition to ibgp_session requirements, the
reflector is set up to forward routing updates from other IBGP peers
to the client.
cluster (Reflectors, Clients, LocalAS) | Reflectors
and clients form a cluster, i.e., reflectors are fully meshed and all
clients are peered with all reflectors.
| as_full_mesh (Clusters, Non-clients, LocalAS) |
Clusters and non-clients form an AS, i.e., all reflectors and non-
clients are fully-meshed. No client peers with a non-client.
| route_originate (Subnets, LocalAS) | The LocalAS
originates routes represented by subnets.
| link_to_provider (LocalRouter, RemoteRouter) | The
session represents a link to the local AS's provider. On this session,
the local AS should accept everything, but only announce its own
routes.
| link_to_customer (LocalRouter, RemoteRouter) | The
session represents a link to the local AS's customer. On this session,
the local AS should announce everything, but only accept the
customer's routes.
| link_to_peer (LocalRouter, RemoteRouter) | The
session represents a link to the local AS's peer. On this session, the
local AS should only announce its customers' routes, and only accept
the peer's customers' routes.
| provider_as (LocalAS, RemoteAS) | RemoteAS is a
provider of LocalAS.
| customer_as (LocalAS, RemoteAS) | RemoteAS is a
customer of LocalAS.
| peer_as (LocalAS, RemoteAS) | RemoteAS is a peer of
LocalAS.
| preferred_outgoing_link (LocalRouter, RemoteRouter,
RemoteDestination) | The session is the preferred outgoing link to
reach a remote destination, expressed in either subsets or ASPath.
| preferred_incoming_link (LocalRouter, RemoteRouter,
LocalDestination) | The session is the preferred incoming link to
reach a local destination, expressed in either subsets or ASPath.
| preferred_neighbor_entry (LocalRouter, RemoteRouter,
LocalDestination) | The session is the preferred entry from the
neighbor AS to reach a local destination, expressed in either subsets
or ASPath.
| | Table 1: Service grammar for
BGP.
a consistent view of routing inside an AS, all BGP routers must be
correctly configured to form a full-mesh or some well-structured
internal hierarchy, such as route reflector clusters. At a lower-
level, two BGP speakers must be able to talk to each other in order to
exchange routing information. This seemingly obvious requirement has
certain intricacies due to the fact that BGP relies on pre-existing
connectivity provided by Interior Gateway Protocols (IGP) or static
routes. For example, the remote peer address specified by a BGP router
must match the outgoing interface of the IGP route used by the remote
peer. Otherwise the connection will not be established, unless the
remote peer explicitly specifies the matching interface. This type of
implicit requirement can be easily overlooked by administrators, or
violated due to change of the network.
Policy routing is an important functionality of BGP, but also
provides numerous opportunities for configuration errors. In face of
this type of errors, BGP may continue to operate, but does not enforce
the intended policy. Policy violation could lead to connectivity,
security and economic problems. A well-know problem is address space
hijacking, in which one AS accidentally announces networks "owned"
by other ASes, forming a "blackhole" within the Internet. Other
policy problems are commonly related to the commercial relationships
an AS participates in. A multi-homed AS, for instance, shouldn't
provide transit service to non-local traffic. The causes of errors are
diverse, ranging from typos to poor understanding of configuration
semantics. An excellent empirical study of BGP policy configuration
errors is presented in Reference [10].
Service Grammar for BGP
Correct BGP configuration means all routers in an AS achieve the
joint goal of exchange routing information, maintaining a consistent
view of routing and enforcing intended policies. However, there is
quite a large conceptual gap between this global requirement and
individual router configurations. Configuration errors arise because
manual compilation of these high-level requirements into low-level
"machine language" is difficult. If for some reason the global
requirements are not satisfied there are no systematic tools to
automatically diagnose configuration errors. Network administrators
today manually perform these tasks.
Diagnosing why routers don't work together requires global
reasoning about the logical structure of the network as well as
dependencies between services. The BGP Service Grammar captures these
global abstractions that are set up in the process of constructing
routing services. By making these definitions explicit, network
administrators can formally state high-level network-specific
requirements and policies using these definitions.
A subset of the requirement language is shown in Table 1 followed
by detailed explanations. The requirements fall into two categories:
connectivity and policy.
Connectivity Requirements
The language provides two basic primitives - ibgp_session
and ebgp_session - for describing BGP neighbor relationships.
We outline the diagnosis procedure for ibgp_session in Figure
3.
Regarding establishing BGP neighbor relationship, the types of
configuration errors that can arise include:
-
Incorrect AS number or neighbor address at two session end points,
peer values are not mirror images of each other (usually typos).
-
The neighbor's address is not reachable via IGP. This can happen when
a loop-back interface is used but that interface does not participate
in any IGP.
-
A router tries to connect to a reachable interface of a remote
neighbor, but the neighbor uses a different outgoing interface in the
reverse IGP route. This happens when the neighbor has multiple
reachable interfaces.
Any of the above errors can lead to connectivity problems
preventing the BGP session from being established. This example
demonstrates that even a very basic BGP requirement implies a number
of assumptions and global relationships that the administrator must
keep in mind and configure correctly on every router. There are many
places for errors. The diagnosis engine systematically validates these
assumptions and global relationships, catching all potential errors
and providing useful information for the debugging process.
ibgp_session(RouterA, RouterB, LocalAS)
-
Meaning: Both RouterA and RouterB are BGP speakers of the local AS. A
BGP session can be successfully established between them so they can
exchange routing information.
-
Diagnosis procedure:
-
RouterA.as_num == LocalAS
RouterB.as_num == LocalAS
-
Ia, Ib, Na, Nb, s.t
Ia RouterA.interfaces
Ib RouterB.interfaces
Na RouterA.neighbors
Nb RouterB.neighbors
Ia.ip_addr == Nb.peer_address
Ib.ip_addr == Na.peer_address
Na.remote_as == LocalAS == Nb.remote_as
RouterA has an IGP route Ir_a to reach Ib
RouterB has an IGP route Ir_b to reach Ia
If Ia is a loopback interface then Na.update_source == Ia, else Ia is
the outgoing interface of Ir_a
If Ib is a loopback interface then Nb.update_source == Ib, else Ib is
the outgoing interface of Ir_b
Figure 3: IBGP session diagnosis
procedure.
cluster(Reflectors, Clients, LocalAS)
-
Meaning: Reflectors and clients form a cluster, i.e., reflectors are
fully meshed and all clients are peered with all reflectors.
-
Diagnosis procedure:
-
A Reflectors, B Clients,
reflector_client_session(A, B, LocalAS) is TRUE
-
X, Y (X !=Y) Reflectors ibgp_session(X, Y,
LocalAS) is TRUE
-
All reflectors have the same cluster_id
-
C Clients
N C.neighbors
If (N.remote_as == LocalAS) && (N Reflectors Clients)
return FALSE;
/* Clients shouldn't have IBGP sessions to non-clients */
Figure 4: Cluster diagnosis
procedure.
Notice these basic primitives are already higher-level than raw
router configurations. They can be used to compose other higher-level
requirements that specify an AS' logical structure, such as
reflector_client_session and cluster.
Figure 4 illustrates how to validate if a group of routers form a
cluster. The algorithm verifies three global properties: all
reflectors are fully meshed, all clients can receive updates from all
reflectors, and every client only has BGP sessions with routers in the
same cluster. IBGP session test is embedded as part of the procedure.
Policy Requirements
Routing policies are configured via policy filters. A filter
consists of a match criteria and a set of actions. Nearly all
attributes of a routing update can be used to specify the match
criteria, with AS path and network prefix being the most common ones.
When a routing update satisfies the conditions set in the match
criteria, associated actions (permit, deny, or modify) are invoked to
control the propagation of the update. A filter can be applied to
route origination, import and export process. It serves as the low-
level building block for composing arbitrary routing policies. Our BGP
Service Grammar supports the direct use of low-level filters to
specify policy requirements. What we highlight in this section is the
grammar that describes global AS-level properties, rather than that of
an individual filter. These abstractions give network administrators a
set of templates for defining common AS routing policies at high-
level. Low-level filters can then be used for further refinement. We
believe such a design would largely reduce the need for administrators
to go into the low-level configuration details of each filter.
link_to_provider(LocalRouter, RemoteRouter)
-
Meaning: The session between LocalRouter L and RemoteRouter R is a
link to the provider of LocalAS.
-
Diagnosis procedure:
Let P = Get_provider_AS(L.as_num) Get_peer_AS(L.as_num)
Let N L.neighbors corresponding to R
p P
Construct a route update r, let r.as_path="_p_"
if (Routemap_Eval(N.map_out, r) != DENY)
return FALSE;
/* LocalAS shouldn't leak routes learned from providers back to providers */
Figure 5: Link to provider diagnosis
procedure.:
Typical commercial relationships between two neighboring ASes can
be characterized as customer, provider, or peer,
as defined in [11]. To test if a remote AS is a customer (provider, or
peer) of the local AS, we need to verify that the relationship holds
on all sessions between the two ASes. Figure 5 outlines the diagnosis
procedure for link_to_provider. When exporting routers to a
provider, an AS exports its own and its customer routers, but usually
does not export routes learned from providers or peers. A properly
configured export filter on this session should block those routes.
For each provider and peer AS, the diagnosis procedure constructs an
AS path containing the AS number, and feeds it to the export filter.
Any of these paths passing the filter is a violation of the policy. In
that case, the diagnosis procedure fails. This procedure uses several
utilities functions. Get_Provider_AS returns the set of ASes
that are marked as a provider of the local AS. Routemap_Eval
mimic the processing of a policy filter on a route update.
When multiple routes to a remote destination (network or AS)
exist, one link is usually designated as the primary route and others
serve as backup. Such a policy can be expressed with
preferred_outgoing_link. Its diagnosis procedure (Figure 6)
examines the import filter on all EBGP sessions. For each session, the
procedure calculates the local-preference that a route update
to the remote destination would get if it arrives on this session. To
pass the test, the import filter on the preferred session must be the
one that generates the highest local-preference.
preferred_outgoing_link(LocalRouter, RemoteRouter,
RemoteDestination)
-
Meaning: The session between LocalRouter L and RemoteRouter R is the
preferred link for outgoing traffic to RemoteDestination D.
-
Diagnosis procedure:
Let S = Get_EBGP_Sessions(L.as_num)
Construct a route update r, let r.NLRI = D
Let N L.neighbors corresponding to R
Let H = Routemap_Eval(N.map_in, r).local_pref
s S, s {L, R}
if(Routemap_Eval(s.local.map_in, r).local_pref>H)
return FALSE;
/* Another session is more preferable to this one */
Figure 6: Preferred outgoing link
diagnosis procedure.
Both preferred_incoming_link and
preferred_neighbor_entry are used to control incoming traffic
by designating a primary route to a local destination. The difference
is that the latter only concerns two neighboring ASes. The diagnosis
procedures are similar. Both procedures examine export filters, except
that the former looks for the filter that generates the shortest AS
path, while the latter looks for the one that generates the lowest
multi-exit-discriminator (med).
Sample Network Study
We have designed an experimental BGP network consisting of nine
CISCO routers in five ASes, shown in Figure 7. The goal is to
demonstrate different BGP architecture, peering relationship and
routing policies.
Under this setup:
-
AS172 represents a large service provider with four routers, three of
which are BGP speakers. The three BGP speakers form a cluster with
PR3 being the reflector. Therefore IBGP peering between CR3
and CR4 is not required. Inside the AS OSPF is running as IGP.
-
AS160 represents a small customer ISP. It connects to the Internet
solely via AS172, and thus it is a stub.
-
AS180 and AS190 represent two intermediate level service providers.
They subscribe service from AS172 and provide connectivity for AS 200.
They also enter a bilateral peering agreement.
-
AS200 represents a multi-homed customer ISP with 2 BGP speakers. It
has multiple links to AS180 and AS190.
Figure 7:
Experimental setup.
Suppose AS200's network administrator wants to enforce the
following policies:
-
AS200 announces two networks: 200.12.1.0/24 and
200.12.2.0/24
-
AS200 is a multi-homed AS. AS180 and AS190 are its providers.
-
AS190 is the preferred AS for outgoing traffic to AS172.
-
BGP1 is the preferred Border Router for outgoing traffic to AS180.
-
BGP1 is the preferred ingress Border Router for traffic to network
200.12.1.0/24 from AS180.
-
BGP2 is the preferred ingress Border Router for traffic to network
200.12.2.0/24 from AS180.
-
AS180 is the preferred AS for all incoming traffic.
To realize these policies, network administrators first need to
analyze the underlying requirements that support them. Typically,
Policy 1 requires the two networks be originated by the two BGP
speakers. Policy 2 requires outbound filters on every EBGP session
that only allows locally originated routes to be advertised. Policy 3
and 4 require inbound filters to set up the local-preference
attribute correctly. More precisely, routes to AS172 learned from
AS190 should be given a higher local-preference, as should
routes to AS180 learned via BGP1.
!
hostname BGP1
!
router bgp 200
no synchronization
network 200.12.1.0
neighbor 180.200.1.2 remote-as 180
neighbor 180.200.1.2 route-map SETLOCALIN in
neighbor 180.200.1.2 route-map SETMEDOUT out
neighbor 180.200.1.2 filter-list 1 out
neighbor 200.12.2.1 remote-as 200
neighbor 200.12.2.1 update-source Loopback0
!
ip as-path access-list 1 permit ^$
ip as-path access-list 2 permit 180$
!
access-list 1 permit 200.12.1.0 0.0.0.255
route-map SETLOCALIN permit 10
match as-path 2
set local-preference 400
!
route-map SETLOCALIN permit 20
set local-preference 100
!
route-map SETMEDOUT permit 10
match ip address 1
set metric 10
!
route-map SETMEDOUT permit 20
set metric 20
!
ip route 200.12.2.0 255.255.255.0 200.12.3.2
!
Figure 8: BGP1 configuration.
Policy 5 and 6 requires manipulation of the med path
attribute. Both BGP1 and BGP2 advertise network
200.12.1.0/24 and 200.12.2.0/24 to BGP3. BGP1
should give a more favorable med value to network
200.12.1.0/24 than to 200.12.2.0/24, and BGP2 should
do the opposite. BGP3 then is able to decide which router is the
best to reach these networks based on the metric. The med value
should be set by outbound filters of BGP1 and BGP2.
For Policy 7, a common practice is for AS200 to prepend its own AS
number to all updates sending to AS190. This would discourage incoming
traffic from going through AS190 because everything else being equal
BGP will select the route with the shortest AS Path.
Based on these requirements, administrators then choose the
appropriate configuration commands and parameter values for each
router to satisfy them. Figure 8 and Figure 9 give a snapshot of the
working configuration of BGP1 and BGP2.
The configuration errors that can arise include (in fact, we
inadvertently made most of them in setting up our network):
-
Forget to configure the static route to the loopback interface of
BGP1 and BGP2. Neighbor address becomes unreachable. BGP
session could not be established.
-
Forget to specify the update-source in the neighbor
command. TCP connection is rejected and BGP session could not be
established.
-
Forget the AS path access list. AS200 becomes a transit AS of AS180
and AS190.
-
Incorrect route maps and filters due to misunderstanding of the syntax
of regular expression and meaning of path attributes. For example, a
lower med value is considered better, which is in contrast to a
higher local-preference is favored in the route selection
process.
-
For Policies 5 and 6, BGP1 and BGP2 should give, respectively,
more and less favorable values to med. It is entirely possible
that both give equally favorable values.
-
For Policy 7, AS200 should prepend its own AS number in all updates to
AS190, and this rule should be enforced both at BGP1 and BGP2.
If it is forgotten at one router, Policy 7 will not be implemented.
!
hostname BGP2
!
router bgp 200
no synchronization
network 200.12.2.0
neighbor 180.200.2.2 remote-as 180
neighbor 180.200.2.2 route-map SETMEDOUT out
neighbor 180.200.2.2 filter-list 1 out
neighbor 190.200.2.2 remote-as 190
neighbor 190.200.2.2 route-map SETLOCALIN in
neighbor 190.200.2.2 route-map SETASPATH out
neighbor 190.200.2.2 filter-list 1 out
neighbor 200.12.1.1 remote-as 200
neighbor 200.12.1.1 Update-source Loopback0
!
ip as-path access-list 1 permit ^$
ip as-path access-list 2 permit 172$
!
access-list 1 permit 200.12.2.0 0.0.0.255
route-map SETLOCALIN permit 10
match as-path 2
set local-preference 300
!
route-map SETLOCALIN permit 20
set local-preference 100
!
route-map SETMEDOUT permit 10
match ip address 1
set metric 10
!
route-map SETMEDOUT permit 20
set metric 30
!
route-map SETASPATH permit 10
set as-path prepend 200 200
!
ip route 200.12.1.0 255.255.255.0 200.12.3.1
!
Figure 9: BGP2 configuration.
This example shows that manual compilation of high-level
requirements into low-level configuration is a rather demanding and
error-prone process. Even for a small network, the resulting
configuration is already complex. It is not so obvious how each
individual commands relate to intended policies. A large network has
many more routers to manage and much more complicated policies in
place. The service requirement also changes more frequently. It will
be even harder for the administrator to keep the mental map of how
each policy is effected on each individual routers, and make sure
adding or modifying devices and services does not violate existing
requirements.
bgpAS200 :-
AS200basicConnectivity, AS200policies.
AS200basicConnectivity :-
ibgp_session(BGP1, BGP2, AS200),
ebgp_session(BGP1, AS200, BGP3, AS180),
ebgp_session(BGP2, AS200, BGP3, AS180),
ebgp_session(BGP2, AS200, BGP4, AS190),
as_full_mesh({}, {BGP1, BGP2}, AS200).
AS200policies :-
policy1, policy2, policy3, policy4, policy5, policy6, policy7.
policy1 :-
route_originate({200.12.1.0/24, 200.12.2.0/24}, AS200).
policy2 :-
provider_AS(AS200,AS180),
link_to_provider(BGP1, BGP3),
link_to_provider(BGP2, BGP3),
provider_AS(AS200,AS190),
link_to_provider([BGP2, BGP4).
policy3 :-
preferred_outgoing_link(BGP2, BGP4, AS172).
policy4 :-
preferred_outgoing_link(BGP1, BGP3, AS180).
policy5 :-
preferred_neighbor_entry(BGP1, BGP3, 200.12.1.0/24).
policy6 :-
preferred_neighbor_entry(BGP2, BGP3, 200.12.2.0/24).
policy7 :-
preferred_incoming_link(BGP1, BGP3, ALL), preferred_incoming_link(BGP2, BGP3, ALL).
preferred_incoming_link(BGP2, BGP3, ALL), preferred_incoming_link(BGP2, BGP3, ALL).
Table 2: AS200 service grammar.
Using Service Grammar, the global requirements of AS200 can be
described as shown in Table 2.
The description is concise and hides most low-level details. More
importantly, it highlights the constraints spanning multiple routers
that have to be enforced. Keeping track of such global constraints in
low-level configuration language would be much harder. The diagnosis
engine can effectively identify the configuration errors listed above. For
instance, the first two errors will result in ibgp_session_test
to fail. Missing AS path access list can be detected by
link_to_provider. Similarly, misconfigured route-maps are caught by
policy grammar rules.
Related Work
Our system provides a language to express the logical structure of
an AS and its BGP policies. It and can automatically check expressions
in this language against router configurations and thereby provide a
useful diagnosis service, which has not been available to date. The
main difference between this approach and previous diagnosis systems
is that we can describe the BGP architecture of an AS in a high level
language and check that it has been correctly configured. If someone
changes a router configuration, he can just run the diagnosis again to
ensure that the logical structure and policies have not been violated.
In Netsys [8], there is no way to describe the administrator's
intention, i.e., network-specific policies and
structure. It just runs a collection of network-invariant tests.
Routing Policy Specification Language (RPSL) [12] allows a network
operator to specify routing policies in a high-level language. Given
sufficient details, low-level router configurations can be generated
from the description. RPSL shares some common views with our approach.
We feel Service Grammar is better suited for diagnosing configuration
errors because of its expressive power. In the short run, we believe
diagnosis is even more important than provisioning because it gives
administrators the desirable level of control and predictability, and
can be used immediately.
It is natural to extend our system for provisioning. Given a
comprehensive service specification, it can be compiled into vendor-
neutral component configurations. Vendor-specific adaptors can then be
applied to generate low-level router configuration commands. Reference
[4] demonstrates a system that implements Service Grammar rules in
Prolog. Because of the relational nature of Prolog, service
specification simultaneously serves provisioning purposes. Another
system that performs a set of specialized provisioning tasks is
described in [13].
Reference [14] studies IBGP routing anomalies and proposes
sufficient conditions that guarantee correctness. These conditions can
be incorporated into our Service Grammar as policy templates.
Reference [10] presents an empirical study of BGP misconfigurations.
Several classes of errors, such as reliance on upstream filtering,
forgotten filter, incorrect summary, bad route map, etc., are caused
by simple high-level policies that are not obvious for operators to
express at the CLI level. Our system would reduce these types of
errors given the high-level, system-wide requirements.
Summary
Our Service Grammar system enables a new way of configuration
error diagnosis in a distributed environment. Network administrators
can express the global requirements in a high-level language. The
diagnosis engine then systematically verifies if the expressions in
this language are satisfied by device configurations in a top-down
fashion. Our system highlights global reasoning - i.e., why a group of
components fail to jointly compose the intended service - rather than
why a single component fails.
We demonstrate how to use such a system to diagnose BGP
configuration errors in an AS. Previous solutions to diagnosing
configuration errors have been network invariant in that they contain
a fixed set of constraints that must be satisfied by every BGP
network. However, BGP requirements such as logical architecture and
policies differ widely from one network to another. It is not possible
in previous techniques to check if component configurations implement
the network-specific requirements. Our language consists of a small
set of abstractions that can be composed to describe most BGP
features.
One limitation of this approach is that Service Grammars and
diagnosis procedures must be developed for each protocol of interest.
We have prototyped Service Grammars for RIP, OSPF, BGP, BGP/MPLS, PIM,
GRE, IPSEC, DiffServ and the Spread group communication protocol [15].
The difference between grammars tends to be significant because they
are very protocol-specific. Based on our experience, the amount of
effort required to develop the Service Grammar for a protocol is not
terribly large. The notion of correct configuration is already
implicit in the definition of protocols since their intended use is a
part of the definition. The job of the Service Grammar designer is
essentially to make this "configuration logic" explicit by analyzing
these definitions.
The challenge for end users is that they need to go through
another learning curve, and may still write incorrect specifications
in this language. However, we believe the chances of errors should be
lowered in this high-level language as oppose to low-level
configuration commands.
Author Information
Xiaohu Qie is a Ph.D. candidate in the Department of Computer
Science at Princeton University. His research interests cover the
general areas of operating systems, networking and security. He
received his B.S. degree from Tsinghua University, Beijing, China, in
1998, and M.A. degree from Princeton University, Princeton, NJ in
2000. He is a student member of USENIX, IEEE, and ACM SIGOPS. He can
be reached at qiexh@CS.Princeton.EDU.
Sanjai Narain is a Senior Research Scientist in the Information
Assurance and Security Department in Telcordia Technologies' Applied
Research Area. He developed the Service Grammar technique that makes
systems integration via configuration much more efficient than
possible today. This technique has been used in DARPA projects on
synthesis of dynamic coalitions and secure, survivable information
systems. It has also been integrated into a Telcordia operations
support system for IP Virtual Private Networks. In 2003, the dynamic
coalitions project won a DARPA award for technology transfer to the
Army's Future Combat Systems program. Earlier, he developed the DR.
DIALUP software for reducing help-desk costs of Internet Service
Providers, a DSL loop qualification tool that is the basis for a
successful Telcordia service, and a SONET conformance-testing tool
that was used by Telcordia's professional services. Prior to joining
Telcordia, he designed logic-based simulation techniques at The RAND
Corporation. His background is networking, symbolic logic, automated
reasoning and programming languages. He obtained a Ph.D. in Computer
Science from University of California, Los Angeles, in 1988, an M.S.
in Computer Science from Syracuse University, in 1981, and a B.Tech in
Electrical Engineering from Indian Institute of Technology, New Delhi,
in 1979. He can be reached at narain@research.telcordia.com.
References
[1] Lampson, Butler, "Computer Security in the Real World,"
Proceedings of Annual Computer Security Applications
Conference, 2000.
[2] Horn, Paul, Autonomic Computing: IBM's Perspective on the
State of Information Technology,
https://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf.
[3] Barton, M., D. Atkins, S. Narain, D. Ritcherson, K. Tepe,
"Integration of IP Mobility and Security For Secure Wireless
Communications," Proceedings of IEEE International Communications
Conference, New York, NY, 2002.
[4] Narain, S., B. Coan, V. Kaul, K. Parmeswaran, W. Stephens,
"Building Autonomic Systems Via Configuration," To appear in
Proceedings of IEEE Autonomic Computing Workshop, June, 2003.
[5] Narain, S., A. Shareef, M. Rangadurai, "Diagnosing
Configuration Errors in Virtual Private Networks," Proceedings of
IEEE International Communications Conference, Helsinki, Finland,
2001.
[6] Narain, S., R. Vaidyanathan, S. Moyer, W. Stephens, K.
Parmeswaran, A. Shareef, "Middleware for Building Adaptive Systems
Via Configuration," Proceedings of ACM SIGPLAN Workshop on
Optimizing Middleware, Salt Lake City, UT, June, 2001.
[7] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
(BGP-4)," Request for Comments 1771, March 1995.
[8] Netsys, https://www.cisco.com/warp/public/cc/pd/nemnsw/nesvmn/index.shtml.
[9] BGP Commands, https://www.cisco.com/univercd/cc/td/doc/product/software/ios120/12cgcr/np1_r/ 1rprt1/1rbgp.htm.
[10] Mahajan, Ratul, David Wetherall, and Tom Anderson,
"Understanding BGP Misconfigurations," Proceedings of ACM
SIGCOMM, August, 2002.
[11] Gao, L. and J. Rexford, "Stable Internet Routing Without
Global Coordination," Proceedings of ACM SIGMETRICS, June,
2000.
[12] Alaettinoglu, C., C. Villamizar, E. Gerich, D. Kessens, D.
Meyer, T. Bates, D. Karrenberg, and M. Terpstra, "Routing Policy
Specification language (RPSL)," Request for Comments 2622,
June, 1999.
[13] Gottlieb, Joel, Albert Greenberg, Jennifer Rexford, and Jia
Wang, "Automated provisioning of BGP customers," In submission,
December, 2002.
[14] Griffin, T. G. and G. Wilfong, "On the Correctness of IBGP
Configuration," Proceedings of ACM SIGCOMM, August, 2002.
[15] The Spread Toolkit, https://www.spread.org/.
|