Design Goals

Table 1 outlines the design space for congestion control mechanisms. We argue in this section that PCP explores a previously unstudied quadrant of this design space: endpoint emulation of optimal router-based control. If we were to start from scratch, the design goals for a congestion control algorithm would be clear:


Table 1: Congestion Control Design Space

                      Endpoint                           Router Support
  Try and Backoff     TCP [27], Vegas [10],              DecBit [44], ECN [18],
                      RAP [45], FastTCP [31],            RED [21], AQM [9]
                      Scalable TCP [34],
                      HighSpeed TCP [19]
  Request and Set     PCP                                ATM [4,46], XCP [32],
                                                         WFQ [14], RCP [15]


• Minimum response time. The average time required for an application transfer to complete should be as small as possible. Since most transfers are relatively short, startup efficiency is particularly important [16].

• Negligible packet loss and low queue variability. Because sources in a distributed system cannot distinguish among the root causes of packet loss (media failure, destination unavailability, or congestion), it is particularly important to avoid adding to that uncertainty. Similarly, large queueing delays unnecessarily slow interactive response and disrupt real-time traffic.

• Work conserving. In steady state, resources should not be left idle when they could be used to send data.

• Stability under extreme load. Aggregate performance should approach physical limits, and per-flow performance should degrade gracefully as load is added to the system.

• Fairness. Competing connections (or flows) that are not otherwise limited should receive equal shares of the bottleneck bandwidth. At the very least, no flow should starve because of competing flows.

Note that network-based congestion control can easily achieve all of these goals [4,46]. With ATM, for example, endpoints send a special rate control message into the network to request bandwidth, enabling the bottleneck switch or router to explicitly allocate its scarce capacity among the competing demands. We call this approach "request and set" because endpoints never send faster than the network has indicated. Response time is minimized because the fair share rate is communicated in the minimum possible time: a single round trip. This approach also keeps queues empty and allocates bandwidth fairly. Our goal is to see if we can achieve all these properties without any special support from the network [47], by emulating "request and set" mechanics from endpoints.
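
To make the contrast concrete, here is a minimal sketch of request-and-set allocation, assuming a hypothetical Bottleneck interface of our own invention (not ATM's actual signaling messages): the network grants each flow an explicit rate before the endpoint transmits.

```python
# A minimal sketch of "request and set": the bottleneck grants each
# flow an explicit rate before the endpoint sends any data.
# (Hypothetical interface for illustration, not ATM's real protocol.)

class Bottleneck:
    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.flows = set()

    def request(self, flow_id):
        # Admit the flow and return its fair share of the capacity.
        # A real switch would also re-signal the new, smaller share
        # to flows admitted earlier.
        self.flows.add(flow_id)
        return self.capacity / len(self.flows)

    def release(self, flow_id):
        self.flows.discard(flow_id)

bottleneck = Bottleneck(capacity_mbps=100)
rate_a = bottleneck.request("flow-a")  # 100 Mbps: the sole user
rate_b = bottleneck.request("flow-b")  # 50 Mbps: capacity split two ways
# Each endpoint then sends at exactly its granted rate: no probing,
# no loss, and fair share learned in a single round trip.
```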

By contrast, TCP congestion control achieves the last three goals [27] but not always the first two. TCP carefully coordinates how the sending rate is adjusted upwards and downwards in response to successful transmissions and congestion signals. We call this approach "try and backoff" because an end host sends traffic into the network without any evidence that the network has the capacity to accept it; only when a problem arises, that is, a packet loss, does the end host reduce its rate.
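
For contrast, a minimal sketch of the try-and-backoff rule at TCP's core (the standard AIMD adjustment in congestion avoidance, simplified here):

```python
# A sketch of "try and backoff": the sender keeps probing for more
# capacity and only backs off once a loss signals a problem.

def aimd_update(cwnd, loss_detected, mss=1):
    """One round trip's worth of TCP-style window adjustment."""
    if loss_detected:
        return max(mss, cwnd / 2)  # multiplicative decrease: halve
    return cwnd + mss              # additive increase: one MSS per RTT
```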

Since a TCP endpoint has no knowledge of the true available bandwidth, it initially starts small and, through a sequence of steps called slow start, drives the network to saturation and packet loss, which signals the capacity limit of the network. Although effective, this process can waste bandwidth on startup: asymptotically, $O(n \log n)$ in terms of the path's bandwidth-delay product [12]. (To be fair, TCP congestion control was designed at a time when links were thin and usually fully utilized; in those situations the efficiency loss of slow start is minimal.) Further, TCP's slow start inefficiency is fundamental: several proposals have been made to jump start the initial window size, but they run the risk of causing increased packet losses where there is persistent congestion [19].
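
To see where the startup cost comes from, the following small simulation (our own illustration of the analysis in [12], under simplified assumptions) counts the round trips slow start needs and the capacity it leaves unused while the window doubles toward a bandwidth-delay product of n packets:

```python
def slow_start_waste(bdp_pkts, init_cwnd=1):
    """Round trips taken and capacity left idle while slow start
    doubles its window up to a bandwidth-delay product of bdp_pkts."""
    cwnd, rtts, sent = init_cwnd, 0, 0
    while cwnd < bdp_pkts:
        sent += cwnd   # packets delivered during this round trip
        cwnd *= 2      # window doubles every RTT
        rtts += 1
    unused = rtts * bdp_pkts - sent  # idle capacity during startup
    return rtts, unused

# For n = 10,000 packets: 14 RTTs and ~124,000 packets of unused
# capacity, growing roughly as n log n, matching the cost cited above.
print(slow_start_waste(10_000))
```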

Once a TCP endpoint determines the available bandwidth, in theory the link will be fully utilized, amortizing the initial inefficiency over a sufficiently long connection. Of course, many flows are short. Even for long flows, TCP's steady state behavior can be disrupted by the bursty traffic pattern emitted by other flows entering and exiting slow start. In practice, TCP may achieve only a fraction of the available bandwidth because of the need to increase its sending rate slowly after a loss event to avoid persistent congestion [30,32]. Similar problems occur with TCP in the presence of noise-induced loss, such as on wireless links [5].

Some researchers have studied how to modify end hosts to improve TCP performance while keeping its basic approach. For example, TCP Vegas [10], FastTCP [31], Scalable TCP [34], and HighSpeed TCP [19] all attempt to improve TCP's steady state dynamics. Vegas and FastTCP are similar to PCP in that they use packet delay to guide the congestion response, but unlike PCP, they do so only after sending at the target rate for an entire round trip time. And because all of these alternative approaches leave slow start unchanged, they help only the small fraction of transfers that reach steady state.
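
The delay-based signal that Vegas uses can be stated compactly. The sketch below follows the published Vegas decision rule (the alpha and beta thresholds are its usual tuning parameters), with the surrounding bookkeeping simplified:

```python
# A sketch of TCP Vegas's delay-based adjustment: compare the
# throughput the window should achieve over the unloaded base RTT
# with what it actually achieves over the measured RTT.

def vegas_update(cwnd, base_rtt, measured_rtt, alpha=1, beta=3):
    expected = cwnd / base_rtt    # throughput if queues were empty
    actual = cwnd / measured_rtt  # throughput actually observed
    queued = (expected - actual) * base_rtt  # est. packets in queues
    if queued < alpha:
        return cwnd + 1  # queues nearly empty: safe to speed up
    if queued > beta:
        return cwnd - 1  # queue building: back off before any loss
    return cwnd
```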

Other researchers have explored adding TCP-specific support to routers. For example, routers can drop packets before it becomes absolutely necessary [21,9], as a way of signaling end hosts to reduce their sending rates. However, this trades increased packet losses in some cases for lower delay in others; most users want both low loss and low delay. Some have advocated setting a bit in the packet header to signal congestion back to the sender [44,18], but this does nothing to address the slow start inefficiency. This has led some to advocate adding an ATM-style rate control message to the Internet to allow for more rapid TCP startup [42,15].
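
The early-drop idea behind RED [21] is easy to sketch. In the simplified version below (the thresholds and maximum drop probability are illustrative constants, and RED's count-based correction is omitted), the router drops with rising probability as the average queue grows, well before the queue overflows:

```python
import random

def red_should_drop(avg_queue, min_th=5, max_th=15, max_p=0.1):
    """Simplified RED: drop probability ramps with the average
    (EWMA-smoothed) queue length, signaling congestion early."""
    if avg_queue < min_th:
        return False  # queue short: never drop
    if avg_queue >= max_th:
        return True   # queue long: always drop
    # Between the thresholds, drop with linearly increasing probability.
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < p
```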

Our approach is to explore the opposite quadrant in Table 1. Is it possible to achieve all five goals using only endpoint control, by emulating the "request and set" semantics of explicit router-based resource allocation? In doing so, we hope to design a system that is better suited to the tradeoffs we face today than to those TCP faced fifteen years ago. Note that in designing our system, we place the first two goals listed above ahead of the last three in priority. While we want our system to be efficient, stable, and fair under high load, we also want it to behave well in the common case.

The common case is that most network paths are idle most of the time, and are becoming more so over time [41,2]. This was not always true! Rather, it is the natural consequence of the cumulative exponential improvement in the cost-performance of network links: at least in industrialized countries, it no longer makes sense for humans to wait for networks. By contrast, when TCP congestion control was initially designed, wide area network bandwidth cost over one thousand times more than it does today; at that price, fairness would naturally be more important than improving connection transfer time. The opposite holds today.

Second, even with HTTP persistent connections, most Internet transfers never reach TCP steady state [22,48]. Even when they do, startup effects often dominate performance [12]. For example, a 1 MB cross-country transfer over Fast Ethernet can achieve an effective TCP throughput of only a few Mbps, even with no other flows sharing the path. Home users are more frequently bandwidth-limited, but even here, TCP is not well suited to a highly predictable environment with little multiplexing.

Third, computation and memory are becoming cheaper even faster than wide area network bandwidth. TCP was originally designed to avoid putting a multiplication operation in the packet handler [27], yet at current wide area bandwidth prices, it costs the same amount (in dollars) to send a TCP ack packet as to execute half a million CPU instructions [25]. One consequence is that hardware at aggregation points is increasingly limited by TCP mechanics: to remain TCP "friendly", the aggregation point must not send any faster than k parallel TCP connections would [6], something that is efficient only if there are multiple active flows to the same destination. By contrast, PCP can benefit directly from an endpoint's excess cycles and memory by modeling the likely behavior of a network path, even if the path has not been used recently.
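
The TCP-friendliness constraint on such an aggregation point can be made concrete with the standard simplified TCP throughput model (the square-root formula; this is a textbook approximation, not PCP's mechanism): the aggregate must stay at or below the rate k parallel TCP connections would achieve.

```python
import math

def tcp_friendly_rate_bps(k, mss_bytes, rtt_s, loss_rate):
    """Aggregate cap for k TCP-friendly flows, using the simple
    per-flow model MSS / (RTT * sqrt(2p/3)) for loss rate p."""
    per_flow = (mss_bytes * 8) / (rtt_s * math.sqrt(2 * loss_rate / 3))
    return k * per_flow

# e.g. k=4 flows, 1460-byte MSS, 80 ms RTT, 1% loss: about 7.1 Mbps
# total, regardless of how much capacity the path actually has.
print(tcp_friendly_rate_bps(4, 1460, 0.080, 0.01) / 1e6)
```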

Finally, our approach integrates better with constant rate real-time traffic. Without router support, TCP's continual attempts to overdrive the bottleneck link and back off can disrupt fixed-rate flows sharing the link by introducing packet delay jitter and loss. To some extent, this problem can be reduced by sophisticated active queue management (AQM) [24]. Lacking widespread deployment of AQM systems, most ISPs today have abandoned the vision of integrated services: as the only practical way to achieve quality of service goals, they provision logically separate networks to carry voice over IP as distinct from regular web traffic. By contrast, in PCP, best effort traffic will normally have very little impact on background fixed-rate traffic, again without any special hardware support.

In sum, TCP is optimized for the wrong case. Like TCP, the design we describe in this paper provides robust congestion control across a wide range of operating conditions. But equally importantly, our approach is better than TCP in the common case: moderate-sized transfers over mostly idle links, with near-zero loss and low delay even when there is congestion. Of course, network-based congestion control solutions have already been shown to provide these characteristics; our point is simply to demonstrate that we can achieve these goals without network support.

