2.2 Application-directed route selection

We use Figure 1 to highlight a specific problem introduced by BGP's hot-potato routing, thereby motivating the need to enhance route selection with fine-grained, application-directed route control. We assume that the customer shown in Figure 1 is multihomed to the provider network to improve redundancy, so the same destinations are reachable via both links. All PEs in the provider network therefore have two possible routes to reach the customer network. Assume further that most of the traffic destined to the customer network enters the provider network from the peering ISP network. Assuming unit IGP costs for each internal provider link in Figure 1(a), the two ingress PEs at the top of the figure prefer the route via the top egress PE connected to the customer network (Figure 1(c)). This leads to an imbalance in the load on the two egress links, with the top egress link carrying all (or most) of the traffic, which in turn may result in congestion.

In practice the customer or ISP may prefer that the ISP load-balance traffic on the two egress links while still considering the IGP path cost between ingress and egress. In IRSCP this goal is achieved by basing the decision process on input from a load-balancing route control application run by the ISP, which takes into account IGP path cost as well as the "offered" ingress load and the capacity of the egress links. The route control application directly controls route selection for each PE and, in this example, directs the two ingress PEs to each send traffic to a different egress PE. We emphasize that this routing solution cannot be achieved in BGP without manipulating BGP attributes on multiple PEs through vendor-specific configuration languages. In contrast, IRSCP exports a simple, intuitive interface that allows a route control application to supply a ranking of egress links, which is evaluated during execution of a modified decision process, the explicitly ranked decision process. The interface effectively isolates complexity and intelligence in the route control application, while IRSCP itself remains simple. As shown in Table 1, the explicitly ranked decision process replaces the hot-potato steps (B-5 and B-6).

3 Scalable route control

3.1 Explicitly ranked decision process

IRSCP implements two types of decision process: the normal BGP decision process and the explicitly ranked decision process. Both perform route selection on behalf of individual PEs and are therefore defined on a per-PE basis. The BGP decision process is used for the subset of destinations, the unranked prefixes, for which the customer, ISP, or route control application has determined that conventional hot-potato routing can be used. For the remaining ranked prefixes, the route control application creates a preference ranking of egress routes for each ingress router, and the selection is realized in IRSCP by the explicitly ranked decision process.

In our architecture the route control application tells IRSCP which routers should use which egress routes based on routing information, traffic load measurements, etc. As a general principle IRSCP follows the directives of the application, except when doing so would severely impair connectivity. An instance of this principle is that we let the application specify a ranking of egress links, i.e., egress links ordered by preference, rather than a fixed assignment of egress links to routers.
(Each egress route corresponds to exactly one egress link, and we therefore use the terms egress link and egress route interchangeably.) Using a ranking accommodates the unavailability of egress routes: if the top-ranked egress route is unavailable, the next-ranked egress route can be selected. The application specifies the ranking on a per-destination, per-router basis.

We construct our decision process for ranked prefixes (Table 1) by adopting Steps 0-4 of the BGP decision process and then applying the explicit ranking (Step R-5) instead of performing hot-potato routing, followed by a tie-breaker in Step R-6. This ensures that the explicitly ranked decision process respects BGP attributes such as AS path length (Step 2) and that it takes reachability of egress routers into account. In principle, the explicit ranking could be applied at any point in the decision process, e.g., it could override earlier steps of the decision process. However, we leave exploring the extent to which we may safely override or replace BGP attributes to future work. As we explain in Section 3.2.4, we specifically do not include a step based on IGP distance (Step B-6).

We explore an example of the explicitly ranked decision process by considering the scenarios shown in Figures 2(a) and (b). In this example a single IRSCP server runs the decision process for every PE in the ISP's network. We examine the execution of the decision process for PE A in Figure 2(a). First the IRSCP server receives all routes for the given prefix: E-C, F-C and G-D. (We refer to each route using its egress ID, the pair of (CE, PE) routers incident on the egress link for the route.) Next, the explicitly ranked decision process for PE A executes Steps 0-4 and in Step 2 eliminates egress route F-C based on its longer AS path. (We assume that the routes otherwise have identical BGP attributes.) The result is the egress set {E-C, G-D}. In Step R-5 the decision process applies the explicit ranking for PE A to the egress set. Since the top-ranked egress link E-C is present in the egress set, the decision process selects this route for PE A. Similarly, the decision process selects route E-C for PE C, and route G-D for PEs B and D, resulting in the forwarding behavior shown in Figure 2(b). An important observation is that Steps 0-4 are identical for all PEs; therefore the decision process for any PE computes the same egress set.
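To make the example concrete, the sketch below walks through the selection for the prefix of Figure 2 in Python. It is a minimal illustration of the explicitly ranked decision process, not the paper's openbgpd-based implementation (Section 4); the Route tuple and the reduction of Steps 0-4 to a single AS-path-length comparison are our simplifying assumptions.

```python
from collections import namedtuple

Route = namedtuple("Route", ["egress_id", "as_path_len"])

def egress_set(routes):
    # Steps 0-4 are PE-independent; for brevity, AS-path length (Step 2)
    # stands in for the full BGP attribute comparison.
    shortest = min(r.as_path_len for r in routes)
    return {r.egress_id for r in routes if r.as_path_len == shortest}

def select_ranked(ranking, egresses):
    # Step R-5: pick the highest-ranked egress present in the egress set.
    # Because a ranking is a total order, no further tie-breaking (R-6)
    # is needed in this simplified sketch.
    for egress_id in ranking:
        if egress_id in egresses:
            return egress_id
    return None

# Figure 2(a): F-C has a longer AS path, so Step 2 eliminates it.
routes = [Route("E-C", 1), Route("F-C", 2), Route("G-D", 1)]
candidates = egress_set(routes)                   # {"E-C", "G-D"}
print(select_ranked(["E-C", "G-D"], candidates))  # PEs A and C pick E-C
print(select_ranked(["G-D", "E-C"], candidates))  # PEs B and D pick G-D
```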
3.1.1 Outdated rankings

Ideally, the input from the application to IRSCP continuously reflects the current state of the network. In practice, however, IRSCP, as an active participant in BGP, is in a better position to respond instantly to changes in routing state such as BGP route attributes (using Steps 0-4 of the decision process), IGP distances (discussed in Section 3.2.4), and the availability of BGP routes (discussed below). IRSCP must therefore adapt the rankings from the application (based on possibly outdated routing information) to current routing information, as follows.

Between the time that the application sends the rankings to IRSCP and the time that the explicitly ranked decision process runs, new egress routes may be announced and old routes may be withdrawn. Until the application updates its rankings, IRSCP must accommodate discrepancies between the routes assumed available when the application created the rankings and the routes actually available. An instance of a withdrawn egress route is illustrated in Figures 2(c) and (d), in which CE E withdraws egress route E-C and the egress set changes to {G-D}. As a result, the decision process changes its selection for PEs A and C to G-D, and all traffic egresses through PE D (Figure 2(d)). In other words, a ranking specifies not only the desired routing for the PE in the absence of failure, but also the desired fail-over behavior that the PE should adopt.

Conversely, if new egress routes are advertised, IRSCP appends them to the end of the explicit ranking (in order of egress ID) until the application is able to provide a revised ranking (Steps R-5 and R-6). Alternatively, the application may prevent IRSCP from appending routes in this manner. For example, the application may wish to restrict the set of egress routes of a particular customer to a fixed set, thereby preventing some forms of prefix hijacking. We define a "virtual" black-hole egress route, which is part of every egress set and (conceptually) sinks traffic directed to it. We also define a corresponding black-hole egress ID, which an application can include as part of a PE's ranking. If the explicitly ranked decision process for a PE selects the black-hole egress route, the IRSCP server does not send a route to the PE (or its attached CEs), thus making the destination unavailable through that PE.
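A minimal sketch of this adaptation logic follows, under the same simplified route model as before. Treating a trailing blackhole entry as a way to cut off appended routes is our reading of the mechanism, not code from the prototype.

```python
BLACKHOLE = "blackhole"  # virtual egress ID; part of every egress set

def adapt_ranking(ranking, egresses):
    # Newly advertised egresses that the (possibly outdated) ranking does
    # not mention are appended in egress-ID order; withdrawn egresses
    # simply drop out of the egress set and are skipped during selection.
    appended = sorted(e for e in egresses if e not in ranking)
    return list(ranking) + appended

def select_ranked(ranking, egresses):
    for egress_id in adapt_ranking(ranking, egresses):
        if egress_id == BLACKHOLE:
            return None           # withhold the route from this PE entirely
        if egress_id in egresses:
            return egress_id
    return None

# Figure 2(c)/(d): E-C is withdrawn, so PEs A and C fail over to G-D.
print(select_ranked(["E-C", "G-D"], {"G-D"}))      # -> "G-D"
# Ranking blackhole above any appended egresses restricts the PE to a
# fixed egress set, e.g., to resist prefix hijacking:
print(select_ranked(["E-C", BLACKHOLE], {"G-D"}))  # -> None
```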
3.1.2 Other IRSCP applications

By opening up the decision process to external input, IRSCP enables a class of applications in which routing is controlled based on information external to BGP. An example of an application that uses external information (in this case, analysis) is presented in [18]. The authors propose a pragmatic approach to BGP security in which suspicious routes are quarantined for a certain period of time before being considered for selection. Here, knowledge about the historical availability of routes (even routes that were not selected) is important for determining whether a route is potentially hijacked. Further, IRSCP provides a mechanism (the black-hole egress route) by which routers can be prevented from selecting suspicious routes until the application deems them safe.

Another example in this category is the load-balancing application described above, which uses network conditions inside the IRSCP-enabled network to inform route selection. It is also possible to inform route selection with network conditions external to the IRSCP-enabled network. For example, Duffield et al. [11] explore the possibility of using measured path conditions to select among alternate paths leading to the same destination. This allows the network to route traffic that is sensitive to adverse network conditions along paths more favorable than those that default BGP would be capable of finding (to the extent permitted by the policies encoded in BGP attributes).

The complete route visibility afforded by IRSCP also simplifies a number of route monitoring applications. For example, auditing applications that ensure that peers abide by peering agreements require complete visibility of the routes being advertised to a network [23]. Similarly, "what-if" network analysis [13] becomes much simpler if the application is aware of all available routes. IRSCP's ability to use external input to inform route selection, however, is its key advantage.

3.2 IRSCP architecture

3.2.1 Distribution

Though logically centralized from the viewpoint of a route control application, IRSCP is implemented as a distributed system, consisting of multiple IRSCP servers, to address fault-tolerance and scalability requirements. If we designed IRSCP as a single centralized server, failure or partitioning away of that server would leave every PE in the network steerless and unable to forward traffic correctly, since in BGP a session failure implicitly causes the BGP routes announced through the session to be withdrawn. In IRSCP we can tolerate the failure of an IRSCP server by letting routers peer with multiple IRSCP servers.

Furthermore, a centralized IRSCP faces a number of (overlapping) scalability challenges. First, a large Tier-1 ISP's routers collectively maintain many thousands of BGP sessions with routers in neighboring networks, something that no current BGP implementation can support by itself. Second, IRSCP makes routing decisions for the hundreds of PEs within the ISP. Third, IRSCP must store the combined BGP routes received from CEs and originated by the network, and it must process updates to these routes. As shown in Figure 3, we choose to partition the IRSCP workload by letting each IRSCP server peer with a subset of PEs and CEs, thereby addressing the first two scalability concerns. Our architecture still requires each IRSCP server to store all BGP routes and process the corresponding updates; however, we show in Section 5 that this aspect of scalability does not pose a severe problem.

In performing the per-PE decision process, an IRSCP server needs to take into account the position of that PE in the IGP topology. To maintain consistency of IGP routing state, each IRSCP server runs an IGP viewer and has the same global view of the IGP topology [6]. Due to the partitioning of work in our distributed architecture, each IRSCP server needs to perform shortest-path calculations only for the set of PEs for which it makes BGP routing decisions, rather than for the network as a whole.

The IRSCP servers further ensure consistency by exchanging all BGP updates among each other. Comparing this solution with BGP route reflection [4], the most important difference is that a route reflector selects a single best route for each destination (using the "normal" BGP decision process) and makes only that route available to other routers and route reflectors. As a result, different routers and route reflectors may observe different sets of available routes, which in turn has led to non-deterministic or divergent behavior in BGP, e.g., "MED oscillation" [15,21]. Basu et al. [3] show that exchanging all routes selected by Steps 0-4 of the decision process is sufficient to prevent non-determinism and divergence in iBGP; IRSCP exchanges a superset. Thus we propose a simple and elegant solution to distribution: a full mesh in which all IRSCP servers distribute all routes. Architecturally, this solution is similar to an iBGP infrastructure in which BGP routers are fully meshed, an architecture that was eventually replaced by route reflectors due to scalability problems. However, there are two differences between the two architectures. First, the number of IRSCP servers is small compared with the number of routers. Second, being based on commodity technology rather than router technology, IRSCP servers can be expected to keep better pace with the technology curve than routers have [17].
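The sketch below models this partitioning and full-mesh exchange. The class and method names are ours, and the decision process is stubbed out (see the earlier sketches); the point is the invariant that every server stores every route while running the per-PE decision process only for its own PEs.

```python
class IrscpServer:
    def __init__(self, name, local_pes):
        self.name = name
        self.local_pes = local_pes  # the subset of PEs this server serves
        self.peers = []             # all other IRSCP servers (full mesh)
        self.rib = {}               # prefix -> {egress_id: route}; the
                                    # complete RIB is kept on every server

    def update_from_ce(self, prefix, egress_id, route):
        # eBGP update from a locally attached CE: store it locally, then
        # forward it to every other IRSCP server in the mesh.
        self.store(prefix, egress_id, route)
        for peer in self.peers:
            peer.store(prefix, egress_id, route)

    def store(self, prefix, egress_id, route):
        self.rib.setdefault(prefix, {})[egress_id] = route
        for pe in self.local_pes:
            self.decide(pe, prefix)  # per-PE decision process (Section 3.1)

    def decide(self, pe, prefix):
        pass  # Steps 0-4, then B-5..B-7 or R-5/R-6, as sketched earlier

# Two POPs: each server peers with its own PEs but sees all routes.
s1, s2 = IrscpServer("pop1", ["A", "B"]), IrscpServer("pop2", ["C", "D"])
s1.peers, s2.peers = [s2], [s1]
s1.update_from_ce("10.0.0.0/8", "E-C", object())
assert s2.rib["10.0.0.0/8"].keys() == {"E-C"}  # replicated across the mesh
```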
3.2.2 IRSCP protocol and RIB

3.2.3 Ensuring consistency

3.2.4 IGP reachability

We assume that the application has taken IGP distances into account when it creates the ranking. Although the IRSCP decision process could conceivably re-rank egress links in response to IGP distance changes, it generally does not do so, for several reasons. First, for applications such as load balancing customer traffic, strict adherence to a shortest-path policy appears to be of secondary importance. Indeed, tracking IGP distance changes can have adverse effects, such as inadvertently shifting large volumes of traffic [28]. The explicit ranking provided by an application introduces a degree of stability, effectively "pinning" routes. If it is necessary to respond to IGP changes, we require the application to do so by providing an updated ranking. Teixeira et al. [27] suggest that in a large ISP with sufficient path diversity in its IGP topology, the latency of MPLS tunnels is not greatly affected by IGP changes. For these cases, route pinning does not sacrifice much performance in terms of latency.

4 Implementation

A significant part of the IRSCP server functionality is identical to that of a BGP router. We therefore based our prototype implementation on the version 3.9 code base of openbgpd.

4.1 Explicit application ranking

As shown in Figure 7, route control applications provide rankings for IRSCP's explicitly ranked decision process. In our prototype they do so through the IRSCP configuration file, which contains a number of rank statements, one for each set of prefixes that are ranked identically. The following is an example of a rank statement for two prefixes.

```
rank {
    prefix { 3.0.0.0/8, 4.0.0.0/8 }
    pe { 1.2.3.4, 5.6.7.8 }
    egresses { 11.12.13.14:15.16.17.18, 19.20.21.22:23.24.25.26 }
    pe { 101.102.103.104 }
    egresses { 19.20.21.22:23.24.25.26, blackhole }
}
```

The pe statements of a rank statement must together contain every PE attached to the IRSCP server. Recall that in IRSCP each route carries an egress ID, specifying which egress link traffic for the destination of the route is to use when it exits the network (assuming the route is selected as best route). The egresses statement is the ranking of egress IDs to be used for the PEs in the preceding pe statement. Each egress ID consists of the IP addresses of the CE and PE incident on the egress link. In addition, the blackhole egress may be specified.

Application rankings are stored in a per-PE Red-Black tree of destination prefixes, where each prefix points to the ranked list of egress IDs for that PE and prefix. When a new ranking set arrives, the IRSCP server updates its ranking tree and reruns the decision process for the affected destinations.
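A sketch of how such a store might look is shown below; the prototype uses openbgpd's Red-Black trees, for which a Python dict keyed by prefix stands in here, and the function name is our invention.

```python
from ipaddress import ip_network

rankings = {}  # per-PE map: pe -> {prefix -> ranked list of egress IDs}

def load_rank_statement(prefixes, pes_to_egresses):
    # One rank statement: the same ranking applies to every listed prefix.
    for pe, egresses in pes_to_egresses.items():
        per_pe = rankings.setdefault(pe, {})
        for prefix in prefixes:
            per_pe[ip_network(prefix)] = list(egresses)
    # At this point the server would rerun the decision process for the
    # destinations whose rankings changed.

# The rank statement above, expressed as a call:
load_rank_statement(
    ["3.0.0.0/8", "4.0.0.0/8"],
    {"1.2.3.4":         ["11.12.13.14:15.16.17.18", "19.20.21.22:23.24.25.26"],
     "5.6.7.8":         ["11.12.13.14:15.16.17.18", "19.20.21.22:23.24.25.26"],
     "101.102.103.104": ["19.20.21.22:23.24.25.26", "blackhole"]})

print(rankings["1.2.3.4"][ip_network("3.0.0.0/8")][0])  # top-ranked egress
```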
4.2 Decision process and IRSCP RIB

The data structures used to implement the RIB (IRSCP RIB in Figure 7) are adapted from openbgpd's implementation of the BGP RIB. openbgpd provides various indexes into the BGP RIB, the most important of which is a Red-Black tree of destination prefixes, where each entry points to a list of routes (one route per neighbor). To simplify the implementation we maintain this structure, despite the fact that the number of routes per prefix in IRSCP increases to one route per egress link. Our performance evaluation in Section 5 shows that the resulting increase in search time for a route in the RIB is not a great concern.

openbgpd maintains the per-destination list of routes in order of preference, based on pairwise comparison of the BGP attributes of routes according to the BGP decision process steps shown in Table 1. Thus each route update is linear in the number of routes for the destination, and the decision process itself amounts to no more than checking whether the egress router for the route at the head of the list is reachable (Step 0). However, there are two problems with this algorithm for IRSCP. First, pairwise comparison of the MED value (Step 4) produces results that may be incorrect and, furthermore, dependent on the order in which the routes were inserted into the RIB, which in turn leads to potential inconsistency among the RIBs in different IRSCP servers. The underlying problem is that MED, being comparable only between routes from the same neighboring network, does not follow the "rule of independent ranking" [15]. The second problem is that IRSCP defines a per-PE decision process, resulting in a different order of routes for different PEs. However, we note that Steps 0-4 of the (BGP or explicitly ranked) decision process, which compute the egress set, are independent of the PE; hence we compute these steps once for all PEs (decision process steps 0-4 in Figure 7), storing the result in the RIB.

Our implementation orders routes in the RIB using pairwise comparison based on Steps 0-3. Next, among the routes ranked highest so far, it examines each set of routes received from the same neighboring network and sets a flag on those that have the lowest (i.e., most preferred) MED value in such a set, thereby computing the egress set (Step 4). The algorithm used for Step 4 is similar to that used by the Quagga open source BGP stack (www.quagga.net) and runs in O(n²) time for n routes available for a prefix.

Next, the PE-specific steps of the decision process are executed using a pairwise comparison of routes in the egress set. In the case of the BGP decision process (BGP DP steps B5-B7 in Figure 7), we break the tie in Step B-7 based on the egress ID of the route rather than the router ID (of the BGP router or IRSCP server that sent the route), since a neighboring IRSCP server may have sent multiple routes for the destination. In the case of the explicitly ranked decision process (expl. ranked DP steps R5-R6 in Figure 7), the IRSCP server adds a black-hole route (with the lowest possible egress ID) to the egress set and then retrieves the list of ranked egress IDs for the PE and destination (Section 4.1). When comparing the egress IDs of a pair of routes in the egress set, our implementation simply walks down the list of egress IDs until it finds the higher ranked of the two. This runs in time linear in the number of routes times the size of the ranking for a prefix and PE or, if all of a prefix's n routes appear in its ranking, in O(n²) time.

Following (re-)execution of the decision process for a given PE, the IRSCP server distributes its decision to the PE and to all CEs attached to the PE. Note that the IRSCP server does not need to run a separate decision process for a CE: as in BGP, the route sent to the CE is the same as that selected for the associated PE. To prevent unnecessary updates from being sent to neighbors, BGP implementations typically remember the last route selected for each destination as the active route. Our IRSCP server implementation uses the same technique; however, instead of storing a single active route, it stores an active route for each attached PE.

To evaluate scaling of the decision process in terms of the number of routing policies, we should consider not only the time needed to evaluate the ranking of a prefix (linear time, see above), but also the time needed to look up the ranking. Retrieving a ranking runs in time logarithmic in the number of ranked prefixes and (in our implementation) linear in the number of PEs per IRSCP server. This is similar to the time needed by a BGP implementation to look up a prefix in the routing table and retrieve peer state before sending updates to a peer.
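The sketch below illustrates the Step-4 computation and the ranked comparison. Standard lowest-MED-wins semantics are assumed, grouping by neighbor replaces the pairwise O(n²) scan for clarity, and the route fields are hypothetical.

```python
from collections import defaultdict, namedtuple

Route = namedtuple("Route", ["egress_id", "neighbor_as", "med"])

def egress_set_step4(candidates):
    # candidates: routes that survived Steps 0-3. MED is comparable only
    # among routes from the same neighboring network, so flag the
    # best-MED route(s) within each neighbor group; the flagged routes
    # form the egress set.
    groups = defaultdict(list)
    for r in candidates:
        groups[r.neighbor_as].append(r)
    egress = []
    for group in groups.values():
        best = min(r.med for r in group)  # lowest-MED-wins assumed
        egress.extend(r for r in group if r.med == best)
    return egress

def prefer(ranking, a, b):
    # Steps R-5/R-6: walk the ranked egress IDs until one of the two
    # routes is found; linear in the ranking size per comparison, hence
    # O(n^2) overall if all n of a prefix's routes appear in the ranking.
    for egress_id in ranking:
        if egress_id in (a.egress_id, b.egress_id):
            return a if egress_id == a.egress_id else b
    return min(a, b, key=lambda r: r.egress_id)  # R-6: egress-ID tie-break

routes = [Route("E-C", 65001, 10), Route("F-C", 65001, 20),
          Route("G-D", 65002, 50)]
es = egress_set_step4(routes)  # F-C loses to E-C within AS 65001
print([r.egress_id for r in es])                       # ['E-C', 'G-D']
print(prefer(["G-D", "E-C"], es[0], es[1]).egress_id)  # 'G-D'
```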
4.3 Egress management

Based on the egress ID specified in each IRSCP route, IRSCP has to ensure that (a) BGP routers are "instructed" to forward traffic towards and through the indicated egress link, and (b) the route is only used if the egress link is available. To implement (a) we use the nexthop BGP attribute of a route. This attribute tells a router to which nexthop router it must forward traffic when it uses the route. The IRSCP server sets the nexthop attribute (set nexthop in Figure 7) when it sends a route to a PE or CE in such a way that traffic passes through the egress link, and it uses the egress ID to determine what that nexthop should be. For example, in Figures 2(a) and (b), IRSCP determines from egress ID E-C that PE A must use PE C as nexthop, and that PE C must use CE E as nexthop. In addition, a CE attached to PE A (not shown) is sent PE A as nexthop. The latter is not determined based on the egress ID. Rather, as we discuss next, an IRSCP server associates each CE with a PE, and it is this PE that IRSCP sets as nexthop when sending to the CE.

When an IRSCP server is configured to form an eBGP session with a CE, part of the configuration identifies a PE with which to associate the CE: the PE to which the CE is physically attached through an egress link (e.g., see Figure 3(b)). Apart from setting the nexthop router for routes sent to the CE (as discussed above), the IRSCP server uses the identity of the PE and CE to set the egress ID (add egress ID in Figure 7) on routes received from the CE.
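A sketch of the nexthop computation follows, with the CE-to-PE association passed in explicitly. The encoding of an egress ID as a "CE:PE" string, the function name, and the hypothetical CE H attached to PE A are our assumptions for illustration.

```python
def nexthop_for(egress_id, target, ce_to_pe):
    # egress_id names the egress link as "CE:PE" (Section 4.1).
    ce, pe = egress_id.split(":")
    if target in ce_to_pe:  # target is a CE: always its associated PE
        return ce_to_pe[target]
    if target == pe:        # target is the egress PE itself:
        return ce           #   forward across the egress link to the CE
    return pe               # any other (ingress) PE: forward to egress PE

# Figure 2(a)/(b), egress ID E-C; H is a hypothetical CE attached to PE A.
ce_to_pe = {"E": "C", "H": "A"}
print(nexthop_for("E:C", "A", ce_to_pe))  # PE A forwards to PE C
print(nexthop_for("E:C", "C", ce_to_pe))  # PE C forwards to CE E
print(nexthop_for("E:C", "H", ce_to_pe))  # the CE is sent PE A as nexthop
```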
5 Evaluation

In this section we first present a functional evaluation, demonstrating that an example load-balancing application can effect fine-grained route control using the ranking abstraction and our prototype IRSCP implementation. We then evaluate the scalability and performance of our prototype in processing BGP updates.

5.1 Functional evaluation

We verified the functionality of IRSCP with a testbed consisting of an IRSCP server, a load-balancing route control application, three ingress PEs (Ingress1, Ingress2, and Ingress3), two egress PEs (Egress1 and Egress2), a core router, and a number of hosts arranged into a "source" network and a "destination" network. The core router uses point-to-point links to connect to the PEs, and we use GRE tunnels as the tunneling technology between the PEs. We assume a simple scenario in which the egress PEs attach to a customer who prefers its incoming traffic to be balanced across the egress links.

The load-balancing application performs SNMP GET queries on each PE to collect offered and egress loads. Based on the offered load, the application computes a ranking set that balances the load on the egress links. If the ranking set changes, the application generates an updated configuration and issues a configuration reload command. The hosts in the source network send constant-rate UDP packets to the hosts in the destination network.

Figure 8(a) shows the offered load at the ingress PEs as measured by the application server. The load at the egress PEs is shown in Figure 8(b). Before t=26, the load at the two egresses is "balanced" as a result of the initial ranking set at the IRSCP (i.e., Ingress1 prefers Egress2, and Ingress2 prefers Egress1). At t=26, the offered load at Ingress3 increases, and the application takes up to 30 seconds (the SNMP polling interval) to detect this change. It then determines that the best way to balance the load is to send the load from Ingress1 and Ingress2 to one egress point, and the load from Ingress3 to another. Therefore, it generates a new ranking set and sends it to the IRSCP server, which updates the ingress PEs. As a result, the load from Ingress3 goes to Egress2, and the load from Ingress1 and Ingress2 goes to Egress1. Similarly, at t=47 the application generates a new ranking set and shifts the loads of Ingress1 and Ingress2 to different egresses.
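A sketch of such a control loop is shown below. The greedy assignment heuristic is ours (the paper does not spell out the application's algorithm, and capacity weighting is omitted), and poll_snmp() and reload_config() are hypothetical hooks standing in for the SNMP GETs against the PEs and the IRSCP configuration reload.

```python
import time

POLL_INTERVAL = 30  # seconds, matching the SNMP polling interval above

def compute_ranking_set(ingress_load, egresses):
    # Greedy heuristic: assign heavier ingresses first to the currently
    # least-loaded egress; the remaining egresses follow as fail-over
    # choices in the ranking.
    assigned = {e: 0.0 for e in egresses}
    ranking_set = {}
    for pe, load in sorted(ingress_load.items(), key=lambda kv: -kv[1]):
        best = min(assigned, key=assigned.get)
        assigned[best] += load
        ranking_set[pe] = [best] + [e for e in egresses if e != best]
    return ranking_set

def control_loop(poll_snmp, reload_config):
    current = None
    while True:
        loads = poll_snmp()  # e.g., {"Ingress1": 10.0, ...} in Mb/s
        ranking_set = compute_ranking_set(loads, ["Egress1", "Egress2"])
        if ranking_set != current:
            reload_config(ranking_set)  # rewrite rank statements, reload
            current = ranking_set
        time.sleep(POLL_INTERVAL)
```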
5.2 Performance measurement testbed

Rather than emulating an entire ISP infrastructure, we run our tests on a single IRSCP server, loading it as though it were part of a hypothetical Tier-1 ISP network. Note that there is limited value in emulating an IRSCP-based infrastructure for the purpose of performance evaluation. Because IRSCP servers exchange all routes, routing in an IRSCP infrastructure does not oscillate as it does in a BGP-based infrastructure [3]. Therefore the convergence time of an update is governed solely by communication latency and the processing latency of an update in an IRSCP server or a BGP router.

Figure 10(a) shows the Tier-1 ISP network modeled. Each of the n_pop POPs (Points-of-Presence, e.g., cities) contains one IRSCP server, and all IRSCP servers are fully meshed. Assuming IRSCP as a whole processes a combined BGP update rate rate_all from all CEs, then on average an IRSCP server sends roughly rate_all/n_pop updates to each other IRSCP server, namely the share of updates that it receives from CEs in its own POP. In other words, each IRSCP server sends and receives at a rate of about (n_pop − 1)/n_pop · rate_all ≈ rate_all on its combined IRSCP sessions (Figure 10(b)). The update rate that an IRSCP server sends to a PE or CE is governed by the output of best route selection based on the (BGP or explicitly ranked) decision process. We make the simplifying assumption that this rate, rate_best, is the same for all PEs and CEs.

From measurements taken in a Tier-1 ISP network we derive highly conservative ballpark estimates for rate_all and rate_best [30]. These estimates are based on the busiest day of the past year, in May 2006 (in terms of the total number of updates received by a route collector in the ISP's network in a day). We divide the day into 15-minute intervals and use the average of each interval to derive the estimates for that interval. From the busiest 15-minute interval we deduce estimates for rate_all and rate_best of about 4900 and 190 updates/s, respectively. The 95th-percentile interval gives significantly lower estimates of about 600 and 24 updates/s, respectively.

5.3 Throughput

We determine the maximum input rate (rate_all) that the IRSCP server can sustain for an extended period of time, as follows. To discover the maximum input rate, we gradually increase the input rate and compare the observed output rate with an expected output rate of rate_all/26 · (n_irscp + n_ibgp + n_ebgp). When the input rate is below its maximum, the output rate corresponds to the expected output rate. Once the input rate exceeds its maximum, the output rate steadily declines with increasing input rate. Using this procedure we compare the performance of the standard BGP decision process with the explicitly ranked decision process, and evaluate the impact of the per-PE decision process.

For the standard BGP decision process we find a maximum rate_all ≈ 3600 updates/s, corresponding to an expected output rate of 40,846 updates/s. Figures 9(a) and (b) (BGP DP, 15 PEs) show the observed input and output rates during a run lasting twenty minutes, averaged over 30-second intervals. While the output rate is not as stable as the input rate, on average it corresponds to the expected output rate, and the figure shows that it is sustainable. Next, we load explicit rankings for 60,000 prefixes, each listing 26 egress IDs in some arbitrary order. In this case we find a maximum rate_all ≈ 2400 updates/s, corresponding to an expected output rate of 27,231 updates/s. As expected, the explicitly ranked decision process is slower than the BGP decision process, since it runs in time quadratic rather than linear in the number of routes per prefix. Again, Figure 9 (ranked DP, 15 PEs) shows that the IRSCP server can sustain this workload. Finally, to evaluate the impact of the per-PE decision process (vs. a single, router-wide decision process), we run an experiment again based on the BGP decision process, but in which all but one of the iBGP sessions between the IRSCP server and the update receiver are replaced by an eBGP session. In this case we find that the IRSCP server sustains a maximum rate_all ≈ 4200 updates/s and produces the expected output rate of 47,654 updates/s (BGP DP, 1 PE in Figure 9).

Figure 9(c) plots rate_all for a varying number of PEs (using the BGP decision process), but evaluated by sustaining the rate for only 40 seconds rather than 20 minutes. While the sustained throughputs are less than our maximum measurement-based estimate for rate_all of 4900 updates/s, the IRSCP server easily keeps up with our 95th-percentile estimate of 600 updates/s in all experiments, even when the number of PEs is increased to 50. Hence, we expect that an improved implementation of flow control in the IRSCP server could sufficiently slow down senders during times when the offered load exceeds the maximum sustainable throughput.

5.4 Memory consumption

We evaluate the memory usage of an IRSCP server as a function of the number of routes it stores. We use a subset of the setup in Figure 10(c): the IRSCP server under test and the update generator. In this experiment we also add BGP attributes from a routing table in the ISP network from October 2006. We vary the number of sessions between the update generator and the IRSCP server from 0 to 20 and measure the memory usage of the IRSCP server's RIB. We find a linear increase from 18.2 MB (0 sessions) to 226 MB (20 sessions).

Next, we examine the memory usage of the explicit rankings. We configure an IRSCP server with rankings for 15 PEs and vary the number of ranked prefixes from 0 to 130,000 (but without loading their routes). Each ranking for a prefix and PE consists of 26 egress IDs. We measure the process size of openbgpd's route decision engine, which stores the rankings. Again we find a linear increase, from 840 KB (0 prefixes) to 994 MB (130,000 prefixes). Our current implementation is not optimized for space and, as a result, cannot store rankings of this size for more than 130,000 prefixes. After we load 26 copies of the routing table, our prototype supports rankings for up to 75,000 prefixes on our test machines. However, we expect 26 egress IDs per prefix to be vastly more than needed in practice; five egress IDs per prefix can be stored for all 234,000 prefixes with full routing tables loaded.
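As a back-of-envelope check of these numbers (our arithmetic, not a measurement from the paper):

```python
# 994 MB of rankings for 130,000 prefixes x 15 PEs x 26 egress IDs
entries = 130_000 * 15 * 26  # (prefix, PE, egress-ID) ranking entries
print(994e6 / entries)       # ~19.6 bytes per ranking entry
```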
6 Related work

IRSCP extends our previous work on route control architectures [6,12,29] in three important ways. First, we introduce a modified BGP decision process that we expose to route control applications. Second, IRSCP has full visibility of all routes available to the network through its eBGP interaction with neighboring networks. This is in contrast to a phase-one, iBGP-only RCP [6], in which routers in the IRSCP-enabled network pass only their selected routes to IRSCP. Having full route visibility prevents route oscillations within the network [3,15,21] and simplifies route control applications. Third, IRSCP distributes the route selection functionality, whereas the simple server replication presented in [6] required each replica to scale to accommodate the entire network.

Bonaventure et al. [5] propose sophisticated route reflectors but limit their changes to the iBGP infrastructure. The 4D project proposes a refactoring of the network architecture, creating a logically centralized control plane separate from forwarding elements [14,32]. The IETF ForCES working group examines a separation of control plane and forwarding plane functions in IP network elements [19]. This work has a much narrower focus on defining the interface between control and forwarding elements, without considering interaction between different control plane elements. Once ForCES-enabled routers become available, IRSCP's iBGP communication with local routers might conceivably be replaced by a ForCES protocol. An earlier IETF proposal [10,25] provides limited route control by defining a new BGP attribute subtype, the cost community, which can be assigned to routes and used to break ties at a certain "point of insertion" in the BGP decision process. This proposal does not indicate under what conditions the cost community is safe to use; by contrast, we show how our rankings should be constrained to ensure consistency.

7 Conclusion

The ultimate success of a logically centralized control plane architecture will depend not only on its ability to enable new functionality, but also on its ability to provide a scalable and robust routing function to large and growing provider networks. We address each of these points by presenting a distributed realization of the Intelligent Route Service Control Point (IRSCP) that partitions work between instances and allows redundancy requirements to drive the extent to which the system is replicated. We also move beyond BGP capabilities by allowing route control applications to directly influence the route selection process through a ranking of egress links provided on a per-destination, per-router basis. We demonstrate the utility of this change with a simple load-balancing application. As future work, we plan to investigate to what extent an IRSCP with complete control and visibility allows a simplification of the expression and management of conventional routing policies.

Acknowledgments. We would like to thank Yuvraj Agarwal, Michelle Panik, Barath Raghavan, Rangarajan Vasudevan, Michael Vrable, and the anonymous reviewers for their helpful comments on previous versions of this paper. This work was supported in part by the National Science Foundation through grant CNS-0347949.

References