
Rate compensation

A key challenge for PCP is to gracefully eliminate queues that might build up at a bottleneck router. Queues can build up for several reasons. One obvious cause is failed probes. If all of the bottleneck bandwidth has been allocated, any additional probe will induce queueing that will not disappear until some flow reduces its bandwidth. As failed probes accumulate, the queues could slowly build to the point where packet loss is inevitable. A more severe cause is the time lag between when an end host makes a probe and when it can allocate the bandwidth. In PCP, the request and set operations are not atomic. If two or more hosts send a probe at approximately the same time, both probes may succeed, resulting in duplicate allocation of the same bandwidth. In this case, the link may be overcommitted, and unless one or both hosts reduce their rates, queues will quickly build up and cause packet loss.

Fortunately, it is straightforward for PCP to detect queueing at the bottleneck. Recall that once an end host allocates bandwidth, it sends its data as paced packets at the base rate. If there is no queueing at the bottleneck, the delays for these data packets will be regular. Any increase in delay indicates a queue, which requires a rate reduction to eliminate. Note that complex link-layer arbitration, as in 802.11, is perfectly compatible with PCP; any added delay in those systems is an indication of queueing, that is, that the endpoints are sending faster than the underlying network can handle.

Eliminating queues caused by PCP endpoints is also easy. Whenever a queue is detected, all existing senders proportionately reduce their rates sufficiently to eliminate the queue over the next round trip. We call this process rate compensation. Eliminating the queue over one round trip is more aggressive than is strictly required by control theory [32], but in our system, any queueing is an indication of resource contention. Under contention, proportionate decrease among existing senders, combined with uniform competition for newly available bandwidth among all senders, helps achieve eventual fairness.
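A rough back-of-the-envelope argument for why a proportionate decrease can drain the queue within one round trip, under the simplifying assumptions that the aggregate sending rate equals the link capacity $C$ and every sender observes the same queueing delay $q$: the standing queue holds $Q = qC$ bytes, and the maximum one-way delay is $d_{max} = d_{min} + q$. If every sender scales its rate by $(1 - q/d_{max})$, the aggregate rate falls to $C(1 - q/d_{max})$, so the queue drains at rate $Cq/d_{max}$. Draining $Q = qC$ bytes therefore takes $qC/(Cq/d_{max}) = d_{max}$, at most one maximum one-way delay, which is within a round trip.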

We use two mechanisms to detect over-committed network links. First, we monitor the gap between baseline PCP packets as they enter and exit a network path. If the time gap observed at the receiver ($\Delta_{out}$) is greater than the spacing $\Delta_{in}$ used by the sender, the bottleneck link is likely to be overloaded. To avoid making matters worse, we reduce the base sending rate by a factor of $(\Delta_{out} -
\Delta_{in})/\Delta_{out}$. Second, in order to drain the queue, we monitor the one-way delays experienced by PCP packets. If the maximum one-way delay ( max-delay) observed in the previous round trip time is greater than the minimum observed one-way delay to the destination ( min-delay), then there is persistent queueing at the bottleneck link. To eliminate the queue build-up, we reduce the sending rate by a factor of (max-delay - min-delay)/max-delay. In both cases, we bound the proportionate decrease to the TCP backoff rate - no more than half of the prior rate during any round trip. (We concern ourselves only with the measurements during the previous round trip as prior rate compensation is presumed to have eliminated the overloads during prior round trips.) If all senders detect queueing and reduce their rate by this proportion, and no further probes are launched, it is easy to show that the queue will disappear within one round trip time. Senders with much shorter round trip times will reduce their rate more quickly, shouldering more of the burden of keeping queues small, but they will also be able to acquire bandwidth more quickly by probing for new bandwidth at a faster rate.
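The two reduction factors can be sketched as follows. This is a minimal illustration, not PCP's implementation: the paper does not specify here how the two signals combine when both fire in the same round trip, so this sketch conservatively applies the larger of the two, capped at the TCP-style bound of one half.

```python
def rate_compensation(delta_in, delta_out, min_delay, max_delay):
    """Return the fraction by which to reduce the base sending rate.

    delta_in / delta_out: inter-packet spacing at sender / receiver.
    min_delay / max_delay: min and max one-way delay seen in the
    previous round trip. All values in seconds.
    """
    # Signal 1: spacing stretched in transit => link overloaded.
    spacing = 0.0
    if delta_out > delta_in and delta_out > 0:
        spacing = (delta_out - delta_in) / delta_out

    # Signal 2: persistent one-way-delay increase => standing queue.
    queue = 0.0
    if max_delay > min_delay and max_delay > 0:
        queue = (max_delay - min_delay) / max_delay

    # Assumption: take the larger signal; bound the decrease at the
    # TCP backoff rate of one half per round trip.
    return min(max(spacing, queue), 0.5)
```

For example, a receiver gap twice the sender spacing yields the maximum reduction of one half, while equal spacing and flat one-way delays yield no reduction at all.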

Once the base rate is reduced, probes may successfully re-acquire the bandwidth. These probes may be launched either by other nodes or even by the reducing node itself. This fosters additive-increase, multiplicative-decrease behavior when there is contention. If the queues do not dissipate, the existing senders will continue to proportionately decrease their rates, and some combination of flows will acquire the released bandwidth.

A detail is that we apply the rate compensation incrementally, after every acknowledged packet, by comparing the required rate compensation to the rate reductions already applied over the previous round trip time. If the new rate compensation is larger, we reduce the sending rate by the difference. This is similar to a single rate adjustment made once per round trip time, but operates at a finer granularity. Further, to reduce the impact of noise on the system, we discard outliers: the lowest 10% and the highest 10% of the measured values.
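The incremental application and the outlier filter can be sketched as below. The class and function names are hypothetical, and the sketch assumes successive reductions compose multiplicatively, which only approximates a single per-round-trip adjustment:

```python
def trim_outliers(samples):
    """Discard the lowest 10% and highest 10% of measurements as noise."""
    s = sorted(samples)
    k = len(s) // 10
    return s[k:len(s) - k] if k else s

class RateCompensator:
    """Apply rate compensation per-ACK rather than once per round trip.

    Tracks how much proportional reduction has already been applied
    during the current round trip and applies only the difference.
    """
    def __init__(self, rate):
        self.rate = rate        # current base sending rate
        self.applied = 0.0      # fraction already removed this RTT

    def on_ack(self, required):
        # 'required' is the total reduction fraction the latest
        # measurements call for (e.g. (max-delay - min-delay)/max-delay).
        if required > self.applied:
            self.rate *= 1.0 - (required - self.applied)
            self.applied = required
        return self.rate

    def start_round_trip(self):
        # Prior compensation is presumed to have drained the overload.
        self.applied = 0.0
```

If an ACK asks for a 20% reduction and a later ACK in the same round trip asks for 30%, only the extra 10% is applied the second time; repeated ACKs demanding the same 20% change nothing.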

Another detail is that Internet routing changes can transparently alter the baseline one-way packet delay. Although some have called for giving endpoints the ability to detect when their packets have been re-routed [36], that facility is not available on the Internet today. There are two cases. If the routing change decreases the baseline delay, the node will update its min-delay, observe a difference between the new min-delay and the previous max-delay, and reduce its rate by at most one half. Behavior will then revert to normal after one round trip, and the end host will be free to probe to acquire bandwidth on the new path. If the routing change increases the baseline delay, the node will see an increase in its max-delay and likewise reduce its rate in an attempt to compensate. This reduction will dissipate after min-delay has timed out. Note that the probe process is independent of rate compensation; probe success is based on the measured increase in delay during the probe, and not on a long-term estimate of the queue. Thus, as long as the new path has spare capacity, the end host will be able to quickly re-acquire its released bandwidth. Both types of routing changes result in temporarily reduced efficiency, but with the positive side effect that the affected endpoints are less aggressive exactly when the state of the network is in flux.


Arvind Krishnamurthy 2006-04-06