The following paper was originally published in the Proceedings of the USENIX Mobile & Location-Independent Computing Symposium, Cambridge, Massachusetts, August 2-3, 1993.

For more information about the USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org

Experiences with X in a Wireless Environment

Christopher Kent Kantarjiev, Alan Demers, Ron Frederick, Robert T. Krivacic, Mark Weiser

Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304

ABSTRACT

Wireless computing is all the rage; the X Window System seems to be inescapable. We have been experimenting with the cross-product, and have had mixed results. The network may not be the computer any more, but it certainly influences the way the computer performs, and adding a fairly atypical network layer cannot help but expose some underlying assumptions. We discuss a few that we found and go on to speculate about how best to push X in the direction of mobile and location-independent computing.

Introduction

For the past few years, a number of researchers from the Computer Science Laboratory at Xerox PARC have been investigating the foundational glue for a possible next generation of computing in which each person is continually interacting with hundreds of nearby wirelessly interconnected computers. This is known as Ubiquitous Computing, or ubicomp [Weiser91]. This vision contends that computational devices will disappear into the background, just as the electric motor has disappeared into the background of everyday life.
We believe that these devices, potentially numbering in the hundreds per person, will be mobile, location aware, and in communication with their environment. Thus, they can change their computation based on their current location, environment, and external events.

In order to experiment with these ideas, a series of hardware devices has been built at Xerox PARC. We segregate devices into three categories by size: inch, foot and yard. Inch-sized devices are 6 inches and smaller, including devices such as the personal display assistants that are coming to market as well as those embedded in furniture and light switches. Foot-sized devices are between 6 and 18 inches. Yard-sized devices range from 18 inches to 6 feet.

It is key to the vision that applications interact and flow across the various classes of devices. One way to facilitate this is to provide a common graphical user interface environment. Even though we don't believe it is ultimately the correct solution, we have chosen to experiment with the X Window System [Scheifler92] as this environment, both because it is a standard and because we are familiar with it. We believe (or believed) that being able to use X in this way allows us to develop applications quickly, rather than paying the starting cost of designing something from scratch.

This paper is a chronicle of our efforts to date, and a summary of the lessons we have learned. We begin with a description of the hardware, to set the stage. A fairly detailed discussion of the network layer follows, because our network environment behaves rather differently from the Ethernet for which the X protocol was designed. We follow this with a discussion of the various server implementation strategies we have tried, and some thoughts about what sorts of applications one wants to build in the ubicomp environment, and how best to use X to build them.

Hardware environment

PARC has developed devices in all three categories.
The Liveboard is a 1120x780, 60"x40" back-projected display with a wireless pen interface based on Sparcstation or IBM PC hardware, very useful for meetings and distributed collaboration [Elrod92]. A product version of the Liveboard is now available from a Xerox subsidiary.

Our initial foot-sized platform was called the Scratchpad. This was a 1120x780 LCD display with a Scriptel tablet over the display, connected to a Sparcstation SBus. It allowed us to explore a device that exhibited the power of a workstation while using the tablet computer metaphor.

Our initial mobile radio platform was called the XPad. It had both a 34010 display processor and a 68302 communications processor, 4MB of RAM, a 1024x768 monochrome display, and a Scriptel tablet over the display. Communication was via a 900MHz FM radio of our own design. The XPad was large, heavy, and difficult to program, and there were significant delivery problems with 1024x768 displays, even within Japan. We built three XPads, but none has been operational for more than small demonstrations.

We have now designed a new mobile radio platform we call the MPad (M for Mini). It contains a 640x480 backlit display with a Scriptel tablet, a 68302 processor at a higher clock rate on an improved bus, 4MB of memory expandable to 16MB, sound in and out, and possibly other multimedia capabilities. The MPad uses a 5MHz near-field radio, providing cells that are approximately 6 meters in diameter. The first prototype MPad sent its first radio packet on December 20, 1991. We have built 15 MPads and deployed them in meeting areas and offices.

The ParcTab consists of a 64x128 monochrome display with a pressure-sensitive surface and three external buttons in a palm-fitting shape, communicating at 19,200 baud to an infrared ceiling system developed by Olivetti Research Ltd. and proposed by them as a standard for low-speed infrared communication.
Inside the tab is an 8051-compatible processor, 16k EPROM, 8k NVRAM, a speaker, and an I2C bus for peripherals. We have built approximately 50 tabs and are deploying them to members of CSL. Our experience with Tabs is described in two companion papers [Adams93a, Adams93b].

In this paper, we concentrate on our experiences with the MPad system, since the Tabs do not run X and the Liveboards act largely as workstations.

Network environment

Physical layer

MPads communicate on a wireless local area network consisting of extremely small "nanocells" about six meters in diameter. It is intended to support in-building use of hand-held wireless computing devices. It uses inexpensive, low-power, low-frequency near-field radios of our own design, which are considerably different from the 900MHz spread-spectrum technology being investigated elsewhere. The radios operate at 256 Kb/sec, with a range of 3 to 4 meters. Because they are near-field devices, power falls off very rapidly with distance, giving us very sharply defined cell boundaries. The low operating frequency eliminates multipath effects. The relatively low data rate is offset to some extent by the extremely small cell size, which ensures that the number of mobile stations sharing any given cell is low; in other words, the bandwidth density of our system is comparatively high. Finally, the small cells make it possible to use routing information to provide applications with reasonably precise location sensing.

Basic Media Access Protocol (MACA)

In a wireless LAN, carrier and collision are not global properties as they are in a traditional broadcast LAN such as Ethernet. Because stations are physically separate and have limited range, the existence of carrier or a collision can be defined only with respect to an intended receiver. Thus, carrier sense and collision detection at the transmitting station don't really make sense, and we needn't be too embarrassed that our hardware doesn't provide them.
The stations work around the absence of carrier sense and collision detection by exchanging a pair of (short) control packets before sending a (long) data packet. The control packets, RTS(n) ("Request To Send n bytes") and CTS(n) ("Clear To Send n bytes"), are vulnerable to collision. After a successful exchange of control packets, with high probability all relevant stations know that a data packet is about to be sent and avoid colliding with it. This strategy is also used in the Apple Localtalk protocols [Sidhu90]. To our knowledge it was first suggested for wireless media access control by Karn [Karn90], where it is called MACA, for "Media Access with Collision Avoidance."

Stations not directly participating in the exchange (called "bystanders" below) must refrain from sending in order not to disrupt the exchange. They do so according to the following defer rules:

D1: A bystander hearing an RTS(n) packet must defer for the time required to send the expected CTS(n) packet.

D2: A bystander hearing a CTS(n) packet must defer for the time required to send the expected n-byte DATA packet.

Intuitively, these rules are justified as follows. For each pair of stations A and B, we make the "reciprocity assumption" that A can hear transmissions from B iff B can hear transmissions from A; i.e., there is a symmetric notion of A being in range of B. It follows that packets sent by A can collide with packets being received by B iff A is in range of B iff A can hear transmissions from B. Thus, if A hears an RTS sent by B, A should defer to avoid colliding with the expected CTS at B (rule D1). Similarly, if A hears a CTS(n) sent by B, A should defer to avoid colliding with the expected n-byte DATA packet at B (rule D2). However, if A hears an RTS sent from B to C but does not hear an answering CTS, then with high probability C either has failed to respond or is out of range of A.
In either case A may legitimately send packets without disrupting communication between B and C; thus, there is no requirement that a bystander hearing an RTS(n) packet defer for the expected DATA packet.

In our environment the reciprocity assumption holds "most of the time," but it occasionally fails. For example, a nearby source of low-level noise can completely disable a station's receiver while having little effect on its transmitter. We don't yet know how important this effect will be in practice.

Stations contend for the medium by sending RTS packets until one of them successfully elicits a CTS response. Given the above defer rules, it is easy to see that nearly all collisions occur between RTS packets or between an RTS and a CTS packet; a successful CTS indicates the medium has been acquired.

We use a fairly standard exponential backoff strategy. Each station maintains a backoff parameter b. A station wishing to send data chooses a number w uniformly at random in [1..b]. After waiting w contention slots (control packet times), if no activity has been seen the station sends an RTS and looks for a CTS in reply. If a CTS is received, the backoff parameter is reduced by an additive term beta,

    b := MAX(b - beta, MINBACKOFF)

and the data is sent. If no CTS is received, a collision is assumed to have occurred. The backoff parameter is increased by a multiplicative factor alpha,

    b := MIN(b * alpha, MAXBACKOFF)

and contention is retried.

Fairness

In practice we cannot assume every mobile station will hear every successful exchange to adjust its backoff parameter; thus, the stations' backoff parameters will not track one another precisely. This can lead to asymmetric situations in which a successful sender gets to reduce its backoff and contend more aggressively than other stations, allowing it to dominate the channel for long periods.
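The backoff adjustment that produces this asymmetry can be sketched in a few lines. The constants here are assumed values for illustration; the paper does not specify the alpha, beta, or bound values we used.

```python
import random

# Sketch of the MACA backoff strategy described above.
# All constants are assumed, in contention slots.
MINBACKOFF = 2
MAXBACKOFF = 256
ALPHA = 1.5   # assumed multiplicative increase factor
BETA = 1      # assumed additive decrease term

def contention_wait(b):
    """Choose w uniformly at random in [1..b] contention slots."""
    return random.randint(1, int(b))

def on_cts_received(b):
    """Successful RTS/CTS exchange: reduce backoff by the additive term."""
    return max(b - BETA, MINBACKOFF)

def on_cts_timeout(b):
    """No CTS heard: assume a collision and increase backoff by alpha."""
    return min(b * ALPHA, MAXBACKOFF)
```

A repeatedly successful station walks its parameter down toward MINBACKOFF one step at a time, while a colliding station backs off geometrically, which is exactly the imbalance the backoff-copying technique below the Fairness heading addresses.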
Although this behavior keeps channel utilization high, it is particularly undesirable for our system, in which most communication involves interactive use of hand-held devices. Our solution to this problem is to reserve a byte in the header of each MACA packet for the current backoff value. Whenever a packet is received, the receiving station adopts the backoff value from that packet. Since (by definition) every station in a cell is within range of the cell base station, and (in our current architecture) all communication involves the cell base station, it follows that the most recently successful backoff value is used with high probability by every station in the cell, avoiding the unfair behavior described above. This has been verified by simulation (up to 100 stations per cell) and experiment (5 stations per cell).

The technique of copying backoff values achieves fairness only in the sense that every station wishing to acquire the channel has an equal chance of doing so. This definition of fairness is clearly inadequate for most cellular networks, in which about half the traffic in each cell originates at a single site (the cell's base station). In our current architecture, virtually all traffic comprises a single bidirectional TCP stream per portable machine, connecting the halves of the X server. To deal with this problem, our MACA code maintains a separate output queue for each destination, and behaves as if it were running a separate MACA output process for each destination. If one of the simulated output processes is successful, it governs the behavior of the real station. If two simulated output processes collide, and no activity is heard before the time of the simulated collision, a JAM packet is sent. Receiving stations interpret a JAM packet as a collision, so that all stations' backoff parameters are updated consistently.

MACA acknowledgements

We were initially concerned that our radios would have a high error rate.
Although this has not turned out to be as serious as was feared, it is true that the most likely place for a packet to be dropped is the wireless link [Richley93]. Moreover, the protection afforded to a DATA packet by the preceding RTS-CTS exchange is considerably reduced if there are multiple, overlapping cells. For these reasons we took the obvious step of adding acknowledgements to MACA. Now data is sent between stations A and B by a 4-way exchange:

1. Station A sends RTS(n) to station B, indicating its desire to send n bytes of data.
2. Station B replies CTS(n), indicating its willingness to accept the data.
3. Station A sends n bytes of data.
4. Station B sends ACK.

Of course the use of ACK packets requires sequence numbers. It also has a fairly subtle interaction with the defer rules. We have introduced the obvious defer rule for DATA packets:

D3: A bystander hearing a DATA packet must defer for the expected ACK packet.

This rule does not protect ACK packets to the extent that rules D1 and D2 protect CTS and DATA packets. To see this, consider three stations A, B and C, where A is within range of B and B is within range of C, but A and C are out of range. Suppose B sends an RTS to C (which is also heard by A) and B receives an answering CTS (not heard by A). While B sends its DATA, A may legitimately engage in communication with some other station; thus, a packet sent by A can collide with the ACK eventually sent by C to B.

Fortunately, such a collision is not especially costly. After a link-layer timeout (on the order of a single contention slot), B resends the RTS. Since the data has already been transferred successfully, C responds with an ACK, which (like a CTS) is protected by rule D1. Thus, while some additional contention is introduced, the data is not resent, and the extra overhead is independent of the size of the data packet.
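The defer rules D1-D3 amount to a small table of defer durations. A sketch, with timing constants that are assumptions (expressed in byte times) rather than measured values from our implementation:

```python
# Sketch of defer rules D1-D3: how long a bystander defers after
# overhearing each packet type. CONTROL_LEN and TURNAROUND are assumed
# values; n is the data length announced in an RTS(n) or CTS(n).
CONTROL_LEN = 24   # assumed length of RTS/CTS/ACK packets, in bytes
TURNAROUND = 8     # assumed receive-to-transmit turnaround, in byte times

def defer_byte_times(packet_type, n=0):
    if packet_type == "RTS":    # D1: defer for the expected CTS(n)
        return TURNAROUND + CONTROL_LEN
    if packet_type == "CTS":    # D2: defer for the expected n-byte DATA
        return TURNAROUND + n
    if packet_type == "DATA":   # D3: defer for the expected ACK
        return TURNAROUND + CONTROL_LEN
    return 0                    # ACK, JAM: no defer required
```

Note that the defer after an overheard DATA packet is short and fixed, which is why a lost ACK (as in the A-B-C scenario above) costs only an extra contention round, not a retransmission of the data.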
Our MACA implementation provides a callback interface, which a sending client can use to learn when a queued packet has been delivered over the radio and acknowledged (or discarded due to excessive retries). As discussed below, this information is used in two ways: it enables our TCP header compression algorithm to maintain consistent per-connection state, and it enables our TCP implementation to coalesce short data packets in many cases.

Unsolicited Data and TCP Header Compression

Our X server sends pen events over the radio. This means that much of our radio traffic is very small data packets, which makes the overhead of a 4-way exchange particularly high and suggests the optimistic strategy of sending (and acknowledging) unsolicited data packets if the packets are sufficiently short [Karn90]. The potential gain of sending unsolicited DATA rather than an RTS is the elimination of the RTS/CTS exchange. Unfortunately (but not surprisingly), if the DATA packet is significantly longer than a single contention slot this potential gain is completely offset by the increased collision cross section of the unsolicited DATA packet compared to an RTS; a DATA packet containing a 40-byte TCP/IP header is too long to be sent this way.

We have solved this problem by implementing a version of Van Jacobson's TCP header compression algorithm [Jacobson90] for MACA. This algorithm typically compresses TCP/IP headers to 3 or 4 bytes, resulting in a total of about 20 bytes of data in the MACA packet associated with a pen event. The measurements below show that these packets are small enough to benefit from being sent as unsolicited data.

As originally described, Jacobson's header compression requires the link layer to report receive errors to the decompressor, which then enters "toss mode" until receiving an undamaged packet containing an explicit connection number.
Because collisions, which happen regularly, look to the hardware like receive errors, the MACA layer is unable to provide this information reliably. Fortunately, we can rely on the acknowledgements provided by MACA to ensure that the compressor is notified when a compressed packet may not have been delivered. It is straightforward to modify the algorithm so that connection state is discarded at the compressor end (rather than the decompressor end) in this case.

Coalescing TCP writes into larger packets

Even with optimizations such as TCP header compression and the MACA 2-way exchange, the packet overhead for a small packet is quite large. This is a particular problem for traffic such as pen events, since they tend to be short (16 bytes) and spaced out in time (typically 50 events/second). To get adequate user response, it is important to keep the delay in delivering pen events as small as possible. However, delivering each in its own TCP packet uses up a great deal of channel bandwidth.

One technique for avoiding small data packets was suggested by Nagle [Nagle84]. This scheme avoids sending less than a full-sized TCP segment whenever there is already some unacknowledged data outstanding. Unfortunately, it does not work very well for a low-bandwidth continuous data stream such as pen events, as it adds roughly a full round trip of delay to all the events, and this seriously degrades responsiveness.

Instead of using Nagle's technique, our system takes advantage of the callback mechanism provided by the MACA driver. It leaves all the data queued at the TCP layer for as long as possible, creating a new packet and handing it to MACA only when the last packet has been sent on its way. Holding onto the data at the TCP layer does not introduce any extra delay, as packets are made available as fast as MACA can deliver them. In fact, since the MACA queue is kept small, it reduces the average delay.
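A sketch of this callback-driven coalescing, assuming a MACA driver whose send call accepts a delivery callback; the class and method names here are illustrative, not our actual driver API:

```python
# Sketch of coalescing TCP writes using the MACA delivery callback:
# data stays queued until the driver reports the previous packet sent,
# then everything accumulated goes out as one larger packet.

class CoalescingSender:
    def __init__(self, maca_send):
        self.maca_send = maca_send   # maca_send(payload, callback)
        self.pending = bytearray()   # data still held at the TCP layer
        self.in_flight = False

    def write(self, data):
        self.pending.extend(data)
        if not self.in_flight:
            self._push()             # idle link: send immediately

    def _push(self):
        payload = bytes(self.pending)
        self.pending.clear()
        self.in_flight = True
        self.maca_send(payload, self._delivered)

    def _delivered(self):
        # MACA callback: the last packet is on its way; send whatever
        # accumulated in the meantime as one larger packet.
        self.in_flight = False
        if self.pending:
            self._push()
```

When the link is idle a write goes out at once, so no delay is added; when the link is busy, successive pen events pile up into a single packet, which is where the bandwidth saving comes from.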
In addition, the larger packets which result improve overall throughput.

Network performance

Basic TCP throughput

Using 4-way exchanges and no header compression, and sending 512-byte data packets, we achieve a TCP throughput of 117.6 kbps. This represents a little less than half the bandwidth of our 256 kbps radio link. To put this number in perspective, we note that our MPad MACA implementation requires approximately 4 ms for an RTS-CTS exchange, of which about 1.5 ms is accounted for by packet transmission times. The time required for a 0-byte DATA-ACK exchange plus a random backoff computation (in a lightly occupied cell with the backoff parameter at its minimum value) is approximately 6 ms, again including 1.5 ms of packet transmission times. Thus, the 4-way exchange adds 10 ms (320 byte times at 256 kbps) to the cost of sending a data packet.

Using 512-byte packets we get an acknowledgement for every second TCP packet. Thus, sending 1024 bytes of data requires three 4-way exchanges (two data packets and a TCP ack) and sends three 40-byte TCP/IP headers. The total overhead is 960 byte times for the three 4-way exchanges, plus 120 byte times for the three TCP/IP headers, or a total of 1080 byte times of overhead to send 1024 bytes of data.

The effect of the fixed 10 ms overhead is proportionally greater as the packet size goes down: again using 4-way exchanges and no header compression, but sending 16-byte data packets (to simulate pen events), we achieve a TCP throughput of only 10.2 kbps.

Header compression

Adding TCP header compression and 2-way exchanges of unsolicited DATA packets as described above yields a worthwhile performance improvement for short packets. Using 512-byte data packets, so that TCP ack packets can be sent by 2-way exchange but TCP data packets require a full 4-way exchange, our measured TCP throughput is 121.6 kbps, an improvement of only 3.4% over the 117.6 kbps figure obtained without compression.
However, using 16-byte data packets, so that TCP data and ack packets can both be sent by 2-way exchange, our measured TCP throughput is 14.0 kbps, an improvement of 37% over the 10.2 kbps figure obtained without compression.

Fair bandwidth division

To show that MACA provides fair distribution of bandwidth among multiple stations, we ran repeated experiments in which 4 MPads sent data as fast as possible to the same base station using 512-byte TCP data packets with compression and 2-way exchange enabled. Data rates achieved by individual MPads ranged from a low of 28 kbps to a high of 37 kbps. Interestingly, there was no performance penalty as a result of sharing the channel - in all experiments the aggregate throughput was around 125 kbps, actually slightly higher than the maximum throughput achievable by a single MPad sending alone.

TCP write coalescing

To test the performance of the TCP write coalescing algorithm, we created an artificial source of "pen traffic." It generated 16-byte chunks of data on a TCP stream at a rate of 50 events/second. The resulting stream needed 800 bytes/second, or 6.4 kbps of bandwidth. Based on the above numbers for 16-byte data packets, a stream such as this would use up roughly half of the channel if no data was coalesced. However, with coalescing, four pads in a single cell were able to simultaneously get exactly the 6.4 kbps they asked for.

Standard X traffic would involve not only pen events on the channel, but also some traffic in the reverse direction carrying drawing requests. While this reverse traffic is typically blocked into fairly large packets, the bandwidth available for it can be greatly reduced by the pen traffic. The available bandwidth for such a reverse stream without coalescing was only 53.1 kbps, or 43.7% of the bandwidth available on an otherwise idle channel. With coalescing, the available bandwidth rose to 85.6 kbps.
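The arithmetic behind these figures can be reproduced in a few lines. This is an idealized check: it ignores backoff and processing time, which account for the gap between the efficiency bound and the measured 117.6 kbps.

```python
# Reproduce the overhead arithmetic from the throughput discussion above.
LINK_KBPS = 256

# Sending 1024 bytes: three 4-way exchanges (320 byte times each) plus
# three 40-byte TCP/IP headers.
overhead_byte_times = 3 * 320 + 3 * 40        # 1080 byte times
upper_bound_kbps = LINK_KBPS * 1024 / (1024 + overhead_byte_times)
# about 124.6 kbps; the measured 117.6 kbps sits a little below this.

# Pen traffic: 16-byte events at 50 events/second.
pen_stream_kbps = 16 * 8 * 50 / 1000          # 6.4 kbps of payload
fraction_of_channel = pen_stream_kbps / 14.0  # vs. the 14.0 kbps ceiling
                                              # for uncoalesced 16-byte
                                              # packets with compression
# roughly 0.46 -- "about half the channel", as observed.
```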
X servers for non-wired devices

For the purposes of discussing our X server work, there are two classes of devices - wired and non-wired. The distinction is really driven by the available bandwidth and latency of the connection between the device and the rest of the computing infrastructure.

The Liveboard and Scratchpad are driven by fairly straightforward ports of the MIT sample server. To the software, the Liveboard looks like a monochrome frame buffer with a slightly odd input device - we have made small changes to simultaneously support the standard mouse as well as the Liveboard pen. There is one Liveboard that has provisions for multiple pens, and we have used the X Input Extension to support these, but have little experience with applications that can actually make use of them. Similarly, the Scratchpad is just another monochrome frame buffer with a one-button input device (the pen).

The various incarnations of Pads and the Tabs have been more challenging. The pad radios have all had a basic data rate of 256 kbps, of which we expected to get 120 kbps after adding in the overhead of media access, IP and TCP. Practical experience with X over leased 56 kbps lines led us to believe that this is adequate, and our experience has shown this to be so, with the possible exception of response time for stylus input events.

Porting decisions

We wanted to get the hardware deployed quickly in order to allow developers to begin experimenting with new code. Thus, we chose a porting strategy that got applications running against the Pads quickly, but was not intended to give optimal performance.

The first such attempt was a simple bitmap difference program. This program runs on a workstation, opens the frame buffer memory, and continually reads the upper left-hand section that corresponds to the size of the pad display. Each successive "frame" is compared bitwise to the previous one, and the differences are compressed and sent to the pad across the radio.
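The difference step of this program can be sketched as follows; the frame representation (a list of row byte strings) and the omission of the compression step are simplifications of what the real program did:

```python
# Sketch of the bitmap-difference step: compare successive frames row by
# row and keep only the rows that changed, to be compressed and sent to
# the pad over the radio.
def changed_rows(prev_frame, cur_frame):
    """Return (row_index, row_bytes) for each row that differs."""
    return [(i, cur) for i, (old, cur) in
            enumerate(zip(prev_frame, cur_frame)) if old != cur]
```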
Input events from a mouse attached to the pad are fed back into the server by the same program, which warps the workstation's cursor and uses SendEvent to synthesize button presses.

There were two big problems with this approach, both having to do with the latency involved. The most obvious was that cursor tracking in this scheme was almost impossible. The latency of the path from moving the mouse to getting the feedback of the cursor having moved was almost half a second, which was most disturbing and essentially useless. The second was that the bitmap differences could get as far as 10 seconds behind, exacerbated by media contention with traffic from subsequent cursor movement and updates. Another problem was that the constant polling of the frame buffer memory was a load on the system.

So the first lesson that we learned is to implement cursor tracking as close as possible to the input device, something that many hardware vendors still haven't learned, since they are using the "software cursor" code from the sample server.

Our next choice was to do a partial port of the X server. We didn't wish to port completely to the pad, for two reasons: the MIT sample server is strongly dependent on a fairly heavyweight operating system environment (in particular, it counts on having a file system to store much of its information), and we did not wish to route inter-client communication that uses the X server as a rendezvous point over the slow radio.

The sample server is made up of several "layers" in an attempt to provide the server implementer or porter great flexibility in trading off implementation effort for ultimate performance [Angebranndt88, Angebranndt90]. The lowest, device-dependent layer ("ddx") can be implemented with highly tuned code that has extremely intimate knowledge of the device.
Or it can be implemented with the supplied very simple code ("mi") that counts on little about the device except how to paint or fetch a horizontal line of pixels from the display. This set of minimal routines is called the Pixblt layer.

We devised a "split server" architecture, in which the Pixblt routines run on the pad and the rest of the server runs on a workstation, with a network connection between them. The Pixblt layer, consisting only of routines to handle the input device and draw and fetch one-bit-high bitmaps, is very straightforward to implement, and the layering of the rest of the server allows a quick and dirty port by implementing just these simple routines.

We built this first with two simulations - a network throughput simulator and a pad simulator. The throughput simulator allowed us to control the available network bandwidth to simulate behavior over the radio, and the pad simulator allowed the back-end pad code to be debugged on a workstation without worrying about infant hardware environment problems. When this was all working, we moved the code to the actual hardware in short order.

Unfortunately, the throughput simulator simulated a slow, but perfect, TCP link. As detailed in the previous section, we discovered that building a reliable radio device and media access protocol is not a straightforward task. Not only was reliability of transmission a problem; the effects of small packet sizes on available channel capacity were unforeseen.

This approach allowed us to get some of our applications up and running, albeit slowly. The first thing that we noticed was how slow everything seemed. We had to re-learn the lesson about cursor tracking from the XPad, and quickly implemented tracking of the cursor and changing its shape on the pad side. We then began to analyze what protocol traffic was being sent and took steps to improve perceived performance, both by tuning our code and adding to the protocol.
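As an illustration of how little the pad side needs to understand, a solid span fill over a 1-bit-per-pixel framebuffer can be sketched in a few lines. The framebuffer model (one bytearray per scanline) and the function name are simplifications, not our actual pad code:

```python
# Sketch of the kind of minimal span routine the pad-side Pixblt layer
# provides: set or clear a horizontal run of pixels in a 1-bpp
# framebuffer, modeled here as one bytearray per scanline.
def fill_span_solid(framebuffer, x, y, width, value=1):
    """Set (value=1) or clear (value=0) width pixels starting at (x, y)."""
    row = framebuffer[y]
    for px in range(x, x + width):
        byte, bit = px // 8, 7 - (px % 8)   # MSB-first pixel order
        if value:
            row[byte] |= (1 << bit)
        else:
            row[byte] &= ~(1 << bit)
```

Everything above this routine (fill rules, tiling, clipping) can run on the workstation side, which is what makes the split practical.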
Rectangles are the most important

There are three routines in the Pixblt layer: FillSpans, which fills a set of horizontal, one-bit-high bitmaps (spans) according to the current fill rule (a solid color or a pattern tile); SetSpans, which fills a set of spans from a source bitmap; and GetSpans, which reads a set of spans out of the frame buffer.

In order to make the code as simple as possible, FillSpans was implemented without requiring any state on the pad side. This meant that the pattern tile, if any, had to be transmitted with each request. We quickly learned what a horrible decision this was - most FillSpans requests that were user feedback (as opposed to those involved in displaying a window for the first time) involved very small spans, and the time spent sending the tile dominated the time required to satisfy the painting request. By adding a small cache of pattern tiles on the pad side and a few protocol requests to manage it, we were able to paint simple 10x10 rectangles over ten times as fast.

We learned this lesson again when we added a protocol request to fill a rectangle, rather than a series of spans, from a tile. This allowed a rectangle to be described by four 16-bit integers, instead of two 16-bit integers and a 32-bit width per pixel of height. This change painted 10x10 rectangles 60 times faster. Moreover, it provided a subjective shift in the "snappiness" of the system - when a window is first mapped, it is painted as a main rectangle with many smaller rectangles for borders and window manager-supplied decoration. These rectangles were all now painted much more quickly, giving the perception of much greater speed.

We learned a similar lesson when we switched the code that implemented SetSpans from sending one request per span to batching a TCP window's worth of spans before sending them.
This only doubled the measured copying speed, but the visual effect was much more striking: there was a longer pause while the data was received, but then painting proceeded as fast as the CPU could execute the inner loop.

Each of these reductions in protocol overhead had a side benefit - the cursor tracking code removes the cursor from the screen just before processing a request and replaces it immediately after. Fewer protocol requests mean less time wasted repainting the cursor and even snappier response.

We currently have complaints about two areas of performance: painting text, and copying between Pixmaps (bitmaps stored in the server) and the screen. Making fonts go faster requires adding font support to the pad, which means either adding network file system access to the pad's operating system, or porting the X font-server client code to the pad. A "quick hack" might be to download fonts on the fly, but this looks like as much work as doing it right.

Pixmaps are stored on the workstation side of the split server, mostly for expediency of implementation. (Moving the pixmaps to the pad means that the server has to manage the remote memory, and painting into pixmaps suddenly becomes slow because it has to go across the wire, or else a lot of the drawing code has to be moved to the pad so that painting requests can execute there.) As a result, displaying a pixmap on the screen results in a lot of SetSpans requests going over the wire, filled with bits to be painted.

We have implemented G3 compression algorithms for the SetSpans code, to try to minimize the amount of data sent. This code is currently turned off. Most of the SetSpans requests send small enough amounts of data that the effort required to decompress the bits is more than the time saved by reducing the bandwidth. The protocol overhead of sending the request is the dominant factor.
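The encoding savings behind the rectangle request described earlier can be made concrete. The sizes below count only the geometry, not request headers, following the encodings given in the text:

```python
# Compare the wire cost of describing a filled rectangle as spans
# (two 16-bit coordinates plus a 32-bit width per one-pixel-high row)
# versus as a single rectangle request (four 16-bit integers).
def spans_geometry_bytes(height):
    return height * (2 * 2 + 4)   # per span: x, y (16 bits each) + width

def rect_geometry_bytes():
    return 4 * 2                  # x, y, width, height as 16-bit ints

# A 10x10 rectangle: 80 bytes of span geometry versus 8 bytes for the
# rectangle form, a tenfold reduction before any header overhead.
```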
Next steps

At this point, we determined that doing window copies local to the pad, and providing direct font support (text is currently painted using mi code that breaks the glyphs into spans and sends them), are next in importance. The code and protocol changes to implement these are fairly substantial, especially since the pad has no access to a file system (to get the fonts). As the design of this next phase proceeded, we came to the realization that we were implementing more and more of the sample server on the pad, in a fairly undisciplined manner. Thus, we have decided to do a full port of the server to the pad, in order to measure the actual cost of the inter-client traffic that we worried about. We expect to be aided in this by the X Consortium's Low Bandwidth X (LBX) effort, which is attempting to address the bandwidth requirements of the core X protocol [Fulton93]. In particular, LBX intends to decrease the cost of the empty space in the X protocol encoding, and to use caching techniques to reduce inter-client communication where possible. Performance of text-based applications is quite satisfactory at this point. But our target applications, which mostly involve inking of stylus input, are still rather slow. This is because the ink is being drawn by the client program, and it takes a fairly long time for the input event to reach the client program and the feedback request to reach the display. One solution to this problem is to provide an extension that does local inking [Rhyn91, Kempf93]. We have so far resisted this option, because it means a non-standard change to the server that we have to maintain and track across releases, both in the Mpad server and our workstation servers (so that application developers can work at their desktops, one of the original motivations for using X). We prefer to exhaust tuning opportunities at all other points in the system.
Another solution is to build local clients - applications that run "native" on the pad and do not incur the overhead of radio communications before providing feedback. This is a large undertaking. Our operating system does not currently support any form of dynamic loading. Local clients must share screen real estate with remote clients: they must either be X clients, or we must provide a lower-level real estate manager that both local clients and the X server manipulate.

Sharing, migration and applications

Part of the ubicomp vision is that there are enough devices around that it no longer becomes necessary to carry them; wherever you go, there will be one that you can just pick up and start using, much as we treat pads of paper today. This implies that it should be easy to "whistle up" your state from display device to display device. Doing this with X is problematic at best. When a client program first contacts a display server, it must learn a fair bit about the display hardware and tailor itself to the display. In particular, an application will determine the density of pixels on the screen and how many significant bits of color information are available, allocate some pixels out of the display's colormap, and set up some pattern tiles. These resources are identified by 32-bit identifiers unique to a connection, and are used in future protocol requests. If the client program wishes to start drawing on a different X display, it must be prepared to recreate all this state and possibly alter its operation (if, for example, there are more or fewer bits of color display available). If the client wishes to draw on two or more displays at once, it must keep track of multiple sets of resource IDs and color models. Few existing clients are prepared to do this - if the connection goes away, the client exits. There are basically two options available to the user who wishes to migrate or share a client between several displays.
The first is to use a program known as a pseudoserver, which sits between the application and the X server driving the display(s) [Gust88, Patterson90, Menges93]. It looks like a display server to the application, and like a client to the display server. The pseudoserver may perform arbitrary transformations on the protocol requests, including remapping resource IDs and multicasting protocol requests. The second is to provide an underlying toolkit for the application which is prepared to manage multiple displays. The primary advantage of using pseudoservers is that they work with any unmodified X application. Since the pseudoserver deals directly with raw X protocol, it need not know anything about what the application does or how it works. There are several disadvantages, however. Because applications generally connect to a server for their lifetime, a user who ever wishes to share or migrate must connect through the pseudoserver when first starting the application. If she decides she wants to share an application that was not started with a pseudoserver, there is no way to do it without restarting. If all applications are run with a pseudoserver, then the cost of the pseudoserver is paid even if nothing is ever shared. There is also a performance cost to using a pseudoserver because it introduces an extra communication hop in the X protocol, so roundtrip requests to the server require four context switches instead of two. X requires clients to have a fairly detailed understanding of the display hardware's color model. Most clients do not make provisions for the color model to change during the connection lifetime. Thus a pseudoserver must choose carefully what color model it will provide to clients, and must be able to implement some reasonable facsimile of that color model on every display it encounters. A simple pseudoserver will force clients to run in monochrome mode, but this is not very satisfactory.
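The resource ID remapping that a pseudoserver performs can be sketched as a per-display translation table. All names here are hypothetical; a real pseudoserver (for instance one built with the X engine library [Menges93]) tracks far more state than this:

```c
#include <stdint.h>

typedef uint32_t XID;            /* X resource identifier */

#define MAX_IDS 256

/* One translation table per downstream display connection. */
typedef struct {
    XID client_id[MAX_IDS];      /* ID the application believes it owns   */
    XID server_id[MAX_IDS];      /* ID actually allocated on this display */
    int count;
} IdMap;

/* Record a translation when the pseudoserver creates the real resource. */
void idmap_add(IdMap *m, XID cid, XID sid) {
    if (m->count < MAX_IDS) {
        m->client_id[m->count] = cid;
        m->server_id[m->count] = sid;
        m->count++;
    }
}

/* Rewrite a resource field in an outgoing request for one display. */
XID idmap_translate(const IdMap *m, XID cid) {
    for (int i = 0; i < m->count; i++)
        if (m->client_id[i] == cid)
            return m->server_id[i];
    return cid;                  /* unknown IDs pass through unchanged */
}
```

Multicasting a request to two displays means running it through two such tables, one per server connection; this is precisely the bookkeeping that a client drawing on multiple displays would otherwise have to do itself.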
An ambitious pseudoserver will offer 24-bit deep true color, and attempt to render it on whatever hardware it finds. This is very slow. Most pseudoservers offer 8-bit color and do the best they can. This is close to the approach taken by NeWS [Gosling89], except that NeWS implements the mapping of the single abstraction to the hardware very close to the hardware; the pseudoserver must translate the X requests from the client in a fairly clumsy manner. Shared window servers handle about 90% of existing applications without too much trouble. The applications that cause trouble are those that understand a lot about the display that they are running on in order to do more "interesting" things. An example of an "interesting" application is one that allocates planes in the colormap in order to do animation by manipulating colors; a pseudoserver cannot guarantee that it will implement this correctly on displays with different color capabilities, or even on different displays with the same color capabilities but different colormap contents. A different class of problem is raised when sharing or migrating among displays with widely differing resolutions. An application that starts out tailored for a 72 dpi display will probably not adapt correctly to a 300 dpi display, and a pseudoserver can only do so much in this case, because distances in the X protocol are expressed in terms of pixels, not a linear measure such as points, inches or millimeters. The second option is to write the client with a toolkit that hides the details of sharing and migration, and takes care of connection startup and shutdown, resource allocation and remapping, and so on. This toolkit will probably not expose as much detail about the hardware as the X protocol allows, and may save as much state about the client program as necessary in order to map the drawing model it provides onto the various display hardware it comes across.
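The resolution problem is concrete: because the protocol carries pixel counts, a client that wants physical sizes must convert them per display, and a pseudoserver forwarding pixel values unchanged cannot repair the mismatch. A minimal sketch of the conversion a resolution-aware client or toolkit has to perform:

```c
/* Convert a physical distance in printer's points (72 per inch) to
   pixels on a display of the given resolution.  The same 18-point
   distance is 18 pixels at 72 dpi but 75 pixels at 300 dpi, so a
   pixel count tuned for one display is wrong on the other. */
int points_to_pixels(double points, double dpi) {
    return (int)(points / 72.0 * dpi + 0.5);   /* round to nearest */
}
```

A toolkit that accepts distances in points (or millimeters) can apply this conversion at render time for each display it drives, which is exactly the kind of detail the toolkit approach described below hides from the application.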
Dealing with these problems at the toolkit layer allows applications to look "right" on several different displays at the same time, since the application defines itself in terms of the model the toolkit provides, instead of the arbitrarily complex model available by using the full X protocol. All applications written with the toolkit gain the ability to share and migrate, and applications can be written which are aware of this and have special behavior. The applications behave in a consistent way and do not require extra effort to begin sharing. Performance is better since there is no pseudoserver involved. We believe that most existing applications are not appropriate for a shared environment, or even for a pen-based environment, so a 90% solution for the simpler applications that we wish to retain is adequate. For the new applications that we will build, we are building the ability to migrate into the underlying toolkit. We have built two such toolkits: one for the Cedar environment [Jacobi92], and one which invisibly adds sharing and migration to the Modula-3 Trestle toolkit [Manasse91]. Both of these toolkits provide a (different) abstract drawing model that allows clients to express an ideal display, which is then rendered as well as possible on the available hardware. There is a third approach to sharing state with X clients: introduce an application-specific protocol for exchanging data between clients that contact a single display. The Tivoli shared drawing program, developed at PARC [Pedersen93], and Van Jacobson's wb program, developed at the Lawrence Berkeley Laboratory [Jacobson92], use this approach. Tivoli uses direct TCP connections between all the clients sharing a drawing surface (including clients that do not use X to display the surface), and wb uses multicast IP to communicate the history of a drawing session.
Conclusions

We are not at all convinced that using X for pen-based mobile/wireless computing is a good idea. The X protocol was designed to be operated on an Ethernet; it was expected that available bandwidth would be high and latency would be low and of low variance. Even on an Ethernet, application programmers must try to minimize the number of round trips from client to server. On a low speed link, this becomes even more of an issue. The X Consortium LBX effort addresses some of these problems, but it only directly deals with protocol encoding issues, not necessarily problems raised by unusual network fabrics. Treating the pads as X terminals forces us into a computing model which makes it expensive to provide snappy input response (there is a long delay between an input event and the client program's response to that event). We can address this by running clients local to the pad, but this means that the pad must support a fairly heavyweight computing environment, complete with access to network and file resources. We can address it by building an extension to the server that does local inking or local input response, but this means that we must support a non-standard body of code across changes to the X server. None of these solutions provides us with a way to give programs running entirely on the pad direct access to the pad's display; another modification to the X server is needed. We will continue down this path for a while longer, because the development environment is familiar to us and allows us to deploy our applications quickly across all our platforms. Developing a new window system and deploying it is an expensive proposition, and one which we are not willing to enter into lightly.

Acknowledgements

Lawrence Butcher developed the networking basestations. Ed Richley designed and built our radios, and Melvin Chan keeps them running. Steve Deering made mobile IP work.
Tom Rodriguez (a summer intern from Georgia Tech) resurrected JoinVBT in the distributed Trestle toolkit and made it work in X across hardware with different color models.

References

[Adams93a] Norman Adams, R. Gold, B.N. Schilit, M. Tso and R. Want. `An infrared network for mobile computers'. Proceedings of the USENIX Symposium on Mobile and Location-Independent Computing, Cambridge, Massachusetts, August 1993.

[Adams93b] Norman Adams, R. Gold, B.N. Schilit, M. Tso and R. Want. The PARCTAB mobile computing system. Xerox Palo Alto Research Center. In preparation.

[Angebranndt88] Susan Angebranndt, R. Drewry, T. Newman and P. Karlton. `Strategies for porting the X11 sample server'. Software Distribution Center, Massachusetts Institute of Technology, Cambridge, MA, 1988.

[Angebranndt90] Susan Angebranndt, P. Karlton, R. Drewry and T. Newman. `Writing tailorable software'. Software-Practice and Experience, 20:S2/109-S2/118, 1990.

[Elrod92] Scott Elrod, R. Bruce, R. Gold et al. Liveboard: a large interactive display supporting group meetings, presentations and remote collaboration. Xerox Palo Alto Research Center. Report CSL-92-6, 1992.

[Fulton93] Jim Fulton and C. Kantarjiev. `An update on low bandwidth X (LBX); a standard for X and serial lines'. Proceedings of the 7th annual X Technical Conference, January 1993. O'Reilly & Associates, 1993.

[Gosling89] James Gosling, D. Rosenthal and M. Arden. The NeWS Book: an introduction to the networked extensible window system. Springer-Verlag, 1989.

[Gust88] Phil Gust. `Shared X: X in a Distributed Group Work Environment'. Unpublished paper presented at the second annual X Technical Conference, January 1988.

[Jacobi92] Christian Jacobi. `Migrating widgets'. Proceedings of the 6th annual X Technical Conference, January 1992. O'Reilly & Associates, 1992.

[Jacobson90] Van Jacobson. `Compressing TCP/IP headers for low-speed serial links'. Arpanet Working Group Requests for Comment, DDN Network Information Center, RFC 1144, February 1990.
[Jacobson92] Van Jacobson. A portable, public domain, network whiteboard. Presentation at the Xerox Palo Alto Research Center, April 28, 1992.

[Karn90] Phil Karn. `MACA - a new channel access method for packet radio'. Proceedings of the ARRL 9th Computer Networking Conference, London, Ontario, Canada, September 22, 1990.

[Kempf93] James Kempf and A. Wilson. `Supporting mobile, pen-based computing with X; mobile information for hi-tech nomads'. Proceedings of the 6th annual X Technical Conference, January 1992. O'Reilly & Associates, 1992.

[Manasse91] Mark S. Manasse and G. Nelson. Trestle reference manual. Digital Systems Research Center. Research Report 68, 1991.

[Menges93] John Menges. `The X engine library: a C++ library for constructing X pseudoservers'. Proceedings of the 7th annual X Technical Conference, January 1993. O'Reilly & Associates, 1993.

[Nagle84] John Nagle. `Congestion control in IP/TCP internetworks'. Arpanet Working Group Requests for Comment, DDN Network Information Center, RFC 896, January 1984.

[Patterson90] John F. Patterson. `The good, the bad and the ugly of window sharing in X'. Unpublished paper presented at the fourth annual X Technical Conference, January 1990.

[Pedersen93] Elin Ronby Pedersen, K. McCall, T. Moran and F. Halasz. `Tivoli: An electronic whiteboard for informal workgroup meetings'. Human Factors in Computing Systems: Proceedings of InterCHI 1993. Association for Computing Machinery, 1993.

[Rhyn91] Jim Rhyne, D. Chow and M. Sacks. `The portable electronic notebook'. Unpublished paper presented at the fifth annual X Technical Conference, Cambridge, MA, January 16, 1991.

[Richley93] Ed Richley and S. Elrod. Experiments with near-field radios. Xerox Palo Alto Research Center. In preparation.

[Scheifler92] Robert W. Scheifler and J. Gettys. X Window System. Digital Press, Bedford, MA, 1992.

[Sidhu90] Gursharan S. Sidhu, R. Andres and A. Oppenheimer. Inside AppleTalk. Addison-Wesley, 1992.

[Weiser91] Mark Weiser.
`The computer for the 21st century'. Scientific American 265(3):93-104, September 1991.

Author Information

Christopher Kent Kantarjiev is a Member of the Research Staff in the Computer Science Lab at the Xerox Palo Alto Research Center (in Palo Alto, CA, natch). He has a B.S. in Physics from Xavier University (1979) and a Ph.D. in Computer Sciences from Purdue University (1986). He is a frequent contributor to the magazine of the Vintage Triumph Register. He can be reached as cak@parc.Xerox.COM.

Alan Demers is a Principal Scientist in the Computer Science Lab at the Xerox Palo Alto Research Center. He has a B.S. in Physics from Boston College (1970) and a Ph.D. in Computer Science from Princeton University (1975). He can be reached as demers@parc.Xerox.COM.

Ron Frederick is a Member of the Research Staff in the Computer Science Lab at the Xerox Palo Alto Research Center. He has a B.S. in Computer and Systems Engineering from Rensselaer Polytechnic Institute (1989) and an M.S. in Computer Science from Stanford University (1991). He can be reached as frederick@parc.Xerox.COM.

Robert T. Krivacic is a Member of the Research Staff in the Computer Science Lab at the Xerox Palo Alto Research Center. He has a B.S. in Computer Science from Bowling Green State University (1974) and an M.S. in Computer Science from The University of Colorado (1978). He can be reached as krivacic@parc.Xerox.COM.

Mark Weiser is the head of the Computer Science Lab at the Xerox Palo Alto Research Center. He has a Ph.D. in Computer and Communication Sciences from the University of Michigan (1979). His hobbies are existential philosophy, sociology of science, and writing computer games; his recent technical work has been on garbage collection and ubiquitous computing. He can be reached as weiser@parc.Xerox.COM.