The following paper was originally published in the Proceedings of the USENIX Mobile & Location-Independent Computing Symposium, Cambridge, Massachusetts, August 2-3, 1993.

For more information about the USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: https://www.usenix.org

Experiences with X in a Wireless Environment

Christopher Kent Kantarjiev, Alan Demers, Ron Frederick, Robert T. Krivacic, Mark Weiser

Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304

ABSTRACT

Wireless computing is all the rage; the X Window System seems to be inescapable. We have been experimenting with the cross-product, and have had mixed results. The network may not be the computer any more, but it certainly influences the way the computer performs, and adding a fairly atypical network layer cannot help but expose some underlying assumptions. We discuss a few that we found and go on to speculate about how best to push X in the direction of mobile and location-independent computing.

Introduction

For the past few years, a number of researchers from the Computer Science Laboratory at Xerox PARC have been investigating the foundational glue for a possible next generation of computing in which each person is continually interacting with hundreds of nearby wirelessly interconnected computers. This is known as Ubiquitous Computing, or ubicomp [Weiser91]. This vision contends that computational devices will disappear into the background, just as the electric motor has disappeared into the background of everyday life.
We believe that these devices, potentially numbering in the hundreds per person, will be mobile, location aware, and in communication with their environment. Thus, they can change their computation based on their current location, environment, and external events.

In order to experiment with these ideas, a series of hardware devices has been built at Xerox PARC. We segregate devices into three categories by size: inch, foot and yard. Inch-sized devices are 6 inches and smaller, including devices such as the personal display assistants that are coming to market as well as those embedded in furniture and light switches. Foot-sized devices are between 6 and 18 inches. Yard-sized devices range from 18 inches to 6 feet.

It is key to the vision that applications interact and flow across the various classes of devices. One way to facilitate this is to provide a common graphical user interface environment. Even though we don't believe it is ultimately the correct solution, we have chosen to experiment with the X Window System [Scheifler92] as this environment, both because it is a standard and because we are familiar with it. We believe (or believed) that being able to use X in this way allows us to develop applications quickly, rather than paying the starting cost of designing something from scratch.

This paper is a chronicle of our efforts to date, and a summary of the lessons we have learned. We begin with a description of the hardware, to set the stage. A fairly detailed discussion of the network layer follows, because our network environment behaves rather differently from the Ethernet for which the X protocol was designed. We follow this with a discussion of the various server implementation strategies we have tried, and some thoughts about what sorts of applications one wants to build in the ubicomp environment, and how best to use X to build them.

Hardware environment

PARC has developed devices in all three categories.
The Liveboard is a 1120x780, 60"x40" back-projected display with a wireless pen interface based on Sparcstation or IBM PC hardware, very useful for meetings and distributed collaboration [Elrod92]. A product version of the Liveboard is now available from a Xerox subsidiary.

Our initial foot-sized platform was called the Scratchpad. This was a 1120x780 LCD display with a Scriptel tablet over the display, connected to a Sparcstation SBus. It allowed us to explore a device that exhibited the power of a workstation while using the tablet computer metaphor.

Our initial mobile radio platform was called the XPad. It had both a 34010 display processor and a 68302 communications processor, 4MB of RAM, a 1024x768 monochrome display, and a Scriptel tablet over the display. Communication was via a 900MHz FM radio of our own design. The XPad was large, heavy, and difficult to program, and there were significant delivery problems with 1024x768 displays, even within Japan. We built three XPads, but none has been operational for more than small demonstrations.

We have now designed a new mobile radio platform we call the MPad (M for Mini). It contains a 640x480 backlit display with a Scriptel tablet, a 68302 processor at a higher clock rate on an improved bus, 4MB of memory expandable to 16MB, sound in and out, and possibly other multimedia capabilities. The MPad uses a 5MHz near-field radio, providing cells that are approximately 6 meters in diameter. The first prototype MPad sent its first radio packet on December 20, 1991. We have built 15 MPads and deployed them in meeting areas and offices.

The ParcTab consists of a 64x128 monochrome display with a pressure-sensitive surface and three external buttons in a palm-fitting shape, communicating at 19,200 baud to an infrared ceiling system developed by Olivetti Research Ltd. and proposed by them as a standard for low-speed infrared communication.
Inside the tab is an 8051-compatible processor, 16k EPROM, 8k NVRAM, a speaker, and an I2C bus for peripherals. We have built approximately 50 tabs and are deploying them to members of CSL. Our experience with Tabs is described in two companion papers [Adams93a, Adams93b].

In this paper, we concentrate on our experiences with the MPad system, since the Tabs do not run X and the Liveboards act largely as workstations.

Network environment

Physical layer

MPads communicate on a wireless local area network consisting of extremely small "nanocells" about six meters in diameter. It is intended to support in-building use of hand-held wireless computing devices. It uses inexpensive, low-power, low-frequency near-field radios of our own design, which are considerably different from the 900MHz spread-spectrum technology being investigated elsewhere. The radios operate at 256 Kb/sec, with a range of 3 to 4 meters. Because they are near-field devices, power falls off very rapidly with distance, giving us very sharply defined cell boundaries. The low operating frequency eliminates multipath effects. The relatively low data rate is offset to some extent by the extremely small cell size, which ensures that the number of mobile stations sharing any given cell is low; in other words, the bandwidth density of our system is comparatively high. Finally, the small cells make it possible to use routing information to provide applications with reasonably precise location sensing.

Basic Media Access Protocol (MACA)

In a wireless LAN, carrier and collision are not global properties as they are in a traditional broadcast LAN such as Ethernet. Because stations are physically separate and have limited range, the existence of carrier or a collision can be defined only with respect to an intended receiver. Thus, carrier sense and collision detection at the transmitting station don't really make sense, and we needn't be too embarrassed that our hardware doesn't provide them.
The stations work around the absence of carrier sense and collision detection by exchanging a pair of (short) control packets before sending a (long) data packet. The control packets, RTS(n) ("Request To Send n bytes") and CTS(n) ("Clear To Send n bytes"), are vulnerable to collision. After a successful exchange of control packets, with high probability all relevant stations know that a data packet is about to be sent and avoid colliding with it. This strategy is also used in the Apple Localtalk protocols [Sidhu90]. To our knowledge it was first suggested for wireless media access control by Karn [Karn90], where it is called MACA, for "Media Access with Collision Avoidance."

Stations not directly participating in the exchange (called "bystanders" below) must refrain from sending in order not to disrupt the exchange. They do so according to the following defer rules:

D1: A bystander hearing an RTS(n) packet must defer for the time required to send the expected CTS(n) packet.

D2: A bystander hearing a CTS(n) packet must defer for the time required to send the expected n-byte DATA packet.

Intuitively, these rules are justified as follows. For each pair of stations A and B, we make the "reciprocity assumption" that A can hear transmissions from B iff B can hear transmissions from A; i.e., there is a symmetric notion of A being in range of B. It follows that packets sent by A can collide with packets being received by B iff A is in range of B iff A can hear transmissions from B. Thus, if A hears an RTS sent by B, A should defer to avoid colliding with the expected CTS at B (rule D1). Similarly, if A hears a CTS(n) sent by B, A should defer to avoid colliding with the expected n-byte DATA packet at B (rule D2). However, if A hears an RTS sent from B to C but does not hear an answering CTS, then with high probability C either has failed to respond or is out of range of A.
In either case A may legitimately send packets without disrupting communication between B and C; thus, there is no requirement that a bystander hearing an RTS(n) packet defer for the expected DATA packet.

In our environment the reciprocity assumption holds "most of the time," but it occasionally fails. For example, a nearby source of low-level noise can completely disable a station's receiver while having little effect on its transmitter. We don't yet know how important this effect will be in practice.

Stations contend for the medium by sending RTS packets until one of them successfully elicits a CTS response. Given the above defer rules, it is easy to see that nearly all collisions occur between RTS packets or between an RTS and a CTS packet; a successful CTS indicates the medium has been acquired.

We use a fairly standard exponential backoff strategy. Each station maintains a backoff parameter b. A station wishing to send data chooses a number w uniformly at random in [1..b]. After waiting w contention slots (control packet times), if no activity has been seen the station sends an RTS and looks for a CTS in reply. If a CTS is received, the backoff parameter is reduced by an additive term beta,

    b := MAX(b - beta, MINBACKOFF)

and the data is sent. If no CTS is received, a collision is assumed to have occurred. The backoff parameter is increased by a multiplicative factor alpha,

    b := MIN(b * alpha, MAXBACKOFF)

and contention is retried.

Fairness

In practice we cannot assume every mobile station will hear every successful exchange to adjust its backoff parameter; thus, the stations' backoff parameters will not track one another precisely. This can lead to asymmetric situations in which a successful sender gets to reduce its backoff and contend more aggressively than other stations, allowing it to dominate the channel for long periods.
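The backoff adjustment that produces this asymmetry can be sketched in a few lines. The constants here are assumed values for illustration; the paper does not specify the alpha, beta, or bound values we used.

```python
import random

# Sketch of the MACA backoff strategy described above.
# All constants are assumed, in contention slots.
MINBACKOFF = 2
MAXBACKOFF = 256
ALPHA = 1.5   # assumed multiplicative increase factor
BETA = 1      # assumed additive decrease term

def contention_wait(b):
    """Choose w uniformly at random in [1..b] contention slots."""
    return random.randint(1, int(b))

def on_cts_received(b):
    """Successful RTS/CTS exchange: reduce backoff by the additive term."""
    return max(b - BETA, MINBACKOFF)

def on_cts_timeout(b):
    """No CTS heard: assume a collision and increase backoff by alpha."""
    return min(b * ALPHA, MAXBACKOFF)
```

A repeatedly successful station walks its parameter down toward MINBACKOFF one step at a time, while a colliding station backs off geometrically, which is exactly the imbalance the backoff-copying technique below the Fairness heading addresses.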
Although this behavior keeps channel utilization high, it is particularly undesirable for our system, in which most communication involves interactive use of hand-held devices. Our solution to this problem is to reserve a byte in the header of each MACA packet for the current backoff value. Whenever a packet is received, the receiving station adopts the backoff value from that packet. Since (by definition) every station in a cell is within range of the cell base station, and (in our current architecture) all communication involves the cell base station, it follows that the most recently successful backoff value is used with high probability by every station in the cell, avoiding the unfair behavior described above. This has been verified by simulation (up to 100 stations per cell) and experiment (5 stations per cell).

The technique of copying backoff values achieves fairness only in the sense that every station wishing to acquire the channel has an equal chance of doing so. This definition of fairness is clearly inadequate for most cellular networks, in which about half the traffic in each cell originates at a single site (the cell's base station). In our current architecture, virtually all traffic comprises a single bidirectional TCP stream per portable machine, connecting the halves of the X server. To deal with this problem, our MACA code maintains a separate output queue for each destination, and behaves as if it were running a separate MACA output process for each destination. If one of the simulated output processes is successful, it governs the behavior of the real station. If two simulated output processes collide, and no activity is heard before the time of the simulated collision, a JAM packet is sent. Receiving stations interpret a JAM packet as a collision, so that all stations' backoff parameters are updated consistently.

MACA acknowledgements

We were initially concerned that our radios would have a high error rate.
Although this has not turned out to be as serious as was feared, it is true that the most likely place for a packet to be dropped is the wireless link [Richley93]. Moreover, the protection afforded to a DATA packet by the preceding RTS-CTS exchange is considerably reduced if there are multiple, overlapping cells. For these reasons we took the obvious step of adding acknowledgements to MACA. Now data is sent between stations A and B by a 4-way exchange:

1. Station A sends RTS(n) to station B, indicating its desire to send n bytes of data.
2. Station B replies CTS(n), indicating its willingness to accept the data.
3. Station A sends n bytes of data.
4. Station B sends ACK.

Of course the use of ACK packets requires sequence numbers. It also has a fairly subtle interaction with the defer rules. We have introduced the obvious defer rule for DATA packets:

D3: A bystander hearing a DATA packet must defer for the expected ACK packet.

This rule does not protect ACK packets to the extent that rules D1 and D2 protect CTS and DATA packets. To see this, consider three stations A, B and C, where A is within range of B and B is within range of C, but A and C are out of range. Suppose B sends an RTS to C (which is also heard by A) and B receives an answering CTS (not heard by A). While B sends its DATA, A may legitimately engage in communication with some other station; thus, a packet sent by A can collide with the ACK eventually sent by C to B.

Fortunately, such a collision is not especially costly. After a link-layer timeout (on the order of a single contention slot), B resends the RTS. Since the data has already been transferred successfully, C responds with an ACK, which (like a CTS) is protected by rule D1. Thus, while some additional contention is introduced, the data is not resent, and the extra overhead is independent of the size of the data packet.
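The defer rules D1-D3 amount to a small table of defer durations. A sketch, with timing constants that are assumptions (expressed in byte times) rather than measured values from our implementation:

```python
# Sketch of defer rules D1-D3: how long a bystander defers after
# overhearing each packet type. CONTROL_LEN and TURNAROUND are assumed
# values; n is the data length announced in an RTS(n) or CTS(n).
CONTROL_LEN = 24   # assumed length of RTS/CTS/ACK packets, in bytes
TURNAROUND = 8     # assumed receive-to-transmit turnaround, in byte times

def defer_byte_times(packet_type, n=0):
    if packet_type == "RTS":    # D1: defer for the expected CTS(n)
        return TURNAROUND + CONTROL_LEN
    if packet_type == "CTS":    # D2: defer for the expected n-byte DATA
        return TURNAROUND + n
    if packet_type == "DATA":   # D3: defer for the expected ACK
        return TURNAROUND + CONTROL_LEN
    return 0                    # ACK, JAM: no defer required
```

Note that the defer after an overheard DATA packet is short and fixed, which is why a lost ACK (as in the A-B-C scenario above) costs only an extra contention round, not a retransmission of the data.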
Our MACA implementation provides a callback interface, which a sending client can use to learn when a queued packet has been delivered over the radio and acknowledged (or discarded due to excessive retries). As discussed below, this information is used in two ways: it enables our TCP header compression algorithm to maintain consistent per-connection state, and it enables our TCP implementation to coalesce short data packets in many cases.

Unsolicited Data and TCP Header Compression

Our X server sends pen events over the radio. This means that much of our radio traffic is very small data packets, which makes the overhead of a 4-way exchange particularly high and suggests the optimistic strategy of sending (and acknowledging) unsolicited data packets if the packets are sufficiently short [Karn90]. The potential gain of sending unsolicited DATA rather than an RTS is the elimination of the RTS/CTS exchange. Unfortunately (but not surprisingly), if the DATA packet is significantly longer than a single contention slot this potential gain is completely offset by the increased collision cross section of the unsolicited DATA packet compared to an RTS; a DATA packet containing a 40-byte TCP/IP header is too long to be sent this way.

We have solved this problem by implementing a version of Van Jacobson's TCP header compression algorithm [Jacobson90] for MACA. This algorithm typically compresses TCP/IP headers to 3 or 4 bytes, resulting in a total of about 20 bytes of data in the MACA packet associated with a pen event. The measurements below show that these packets are small enough to benefit from being sent as unsolicited data.

As originally described, Jacobson's header compression requires the link layer to report receive errors to the decompressor, which then enters "toss mode" until receiving an undamaged packet containing an explicit connection number.
Because collisions, which happen regularly, look to the hardware like receive errors, the MACA layer is unable to provide this information reliably. Fortunately, we can rely on the acknowledgements provided by MACA to ensure that the compressor is notified when a compressed packet may not have been delivered. It is straightforward to modify the algorithm so that connection state is discarded at the compressor end (rather than the decompressor end) in this case.

Coalescing TCP writes into larger packets

Even with optimizations such as TCP header compression and the MACA 2-way exchange, the packet overhead for a small packet is quite large. This is a particular problem for traffic such as pen events, since they tend to be short (16 bytes) and spaced out in time (typically 50 events/second). To get adequate user response, it is important to keep the delay in delivering pen events as small as possible. However, delivering each in its own TCP packet uses up a great deal of channel bandwidth.

One technique for avoiding small data packets was suggested by Nagle [Nagle84]. This scheme avoids sending less than a full-sized TCP segment whenever there is already some unacknowledged data outstanding. Unfortunately, it does not work very well for a low-bandwidth continuous data stream such as pen events, as it adds roughly a full round trip of delay to all the events, and this seriously degrades responsiveness.

Instead of using Nagle's technique, our system takes advantage of the callback mechanism provided by the MACA driver. It leaves all the data queued at the TCP layer for as long as possible, creating a new packet and handing it to MACA only when the last packet has been sent on its way. Holding onto the data at the TCP layer does not introduce any extra delay, as packets are made available as fast as MACA can deliver them. In fact, since the MACA queue is kept small, it reduces the average delay.
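A sketch of this callback-driven coalescing, assuming a MACA driver whose send call accepts a delivery callback; the class and method names here are illustrative, not our actual driver API:

```python
# Sketch of coalescing TCP writes using the MACA delivery callback:
# data stays queued until the driver reports the previous packet sent,
# then everything accumulated goes out as one larger packet.

class CoalescingSender:
    def __init__(self, maca_send):
        self.maca_send = maca_send   # maca_send(payload, callback)
        self.pending = bytearray()   # data still held at the TCP layer
        self.in_flight = False

    def write(self, data):
        self.pending.extend(data)
        if not self.in_flight:
            self._push()             # idle link: send immediately

    def _push(self):
        payload = bytes(self.pending)
        self.pending.clear()
        self.in_flight = True
        self.maca_send(payload, self._delivered)

    def _delivered(self):
        # MACA callback: the last packet is on its way; send whatever
        # accumulated in the meantime as one larger packet.
        self.in_flight = False
        if self.pending:
            self._push()
```

When the link is idle a write goes out at once, so no delay is added; when the link is busy, successive pen events pile up into a single packet, which is where the bandwidth saving comes from.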
In addition, the larger packets which result improve overall throughput.

Network performance

Basic TCP throughput

Using 4-way exchanges and no header compression, and sending 512-byte data packets, we achieve a TCP throughput of 117.6 kbps. This represents a little less than half the bandwidth of our 256 kbps radio link. To put this number in perspective, we note that our MPad MACA implementation requires approximately 4 ms for an RTS-CTS exchange, of which about 1.5 ms is accounted for by packet transmission times. The time required for a 0-byte DATA-ACK exchange plus a random backoff computation (in a lightly occupied cell with the backoff parameter at its minimum value) is approximately 6 ms, again including 1.5 ms of packet transmission times. Thus, the 4-way exchange adds 10 ms (320 byte times at 256 kbps) to the cost of sending a data packet.

Using 512-byte packets we get an acknowledgement for every second TCP packet. Thus, sending 1024 bytes of data requires three 4-way exchanges (two data packets and a TCP ack) and sends three 40-byte TCP/IP headers. The total overhead is 960 byte times for the three 4-way exchanges, plus 120 byte times for the three TCP/IP headers, or a total of 1080 byte times of overhead to send 1024 bytes of data.

The effect of the fixed 10 ms overhead is proportionally greater as the packet size goes down: again using 4-way exchanges and no header compression, but sending 16-byte data packets (to simulate pen events), we achieve a TCP throughput of only 10.2 kbps.

Header compression

Adding TCP header compression and 2-way exchanges of unsolicited DATA packets as described above yields a worthwhile performance improvement for short packets. Using 512-byte data packets, so that TCP ack packets can be sent by 2-way exchange but TCP data packets require a full 4-way exchange, our measured TCP throughput is 121.6 kbps, an improvement of only 3.4% over the 117.6 kbps figure obtained without compression.
However, using 16-byte data packets, so that TCP data and ack packets can both be sent by 2-way exchange, our measured TCP throughput is 14.0 kbps, an improvement of 37% over the 10.2 kbps figure obtained without compression.

Fair bandwidth division

To show that MACA provides fair distribution of bandwidth among multiple stations, we ran repeated experiments in which 4 MPads sent data as fast as possible to the same base station using 512-byte TCP data packets with compression and 2-way exchange enabled. Data rates achieved by individual MPads ranged from a low of 28 kbps to a high of 37 kbps. Interestingly, there was no performance penalty as a result of sharing the channel - in all experiments the aggregate throughput was around 125 kbps, actually slightly higher than the maximum throughput achievable by a single MPad sending alone.

TCP write coalescing

To test the performance of the TCP write coalescing algorithm, we created an artificial source of "pen traffic." It generated 16-byte chunks of data on a TCP stream at a rate of 50 events/second. The resulting stream needed 800 bytes/second, or 6.4 kbps of bandwidth. Based on the above numbers for 16-byte data packets, a stream such as this would use up roughly half of the channel if no data was coalesced. However, with coalescing, four pads in a single cell were able to simultaneously get exactly the 6.4 kbps they asked for.

Standard X traffic would involve not only pen events on the channel, but also some traffic in the reverse direction carrying drawing requests. While this reverse traffic is typically blocked into fairly large packets, the bandwidth available for it can be greatly reduced by the pen traffic. The available bandwidth for such a reverse stream without coalescing was only 53.1 kbps, or 43.7% of the bandwidth available on an otherwise idle channel. With coalescing, the available bandwidth rose to 85.6 kbps.
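The arithmetic behind these figures can be reproduced in a few lines. This is an idealized check: it ignores backoff and processing time, which account for the gap between the efficiency bound and the measured 117.6 kbps.

```python
# Reproduce the overhead arithmetic from the throughput discussion above.
LINK_KBPS = 256

# Sending 1024 bytes: three 4-way exchanges (320 byte times each) plus
# three 40-byte TCP/IP headers.
overhead_byte_times = 3 * 320 + 3 * 40        # 1080 byte times
upper_bound_kbps = LINK_KBPS * 1024 / (1024 + overhead_byte_times)
# about 124.6 kbps; the measured 117.6 kbps sits a little below this.

# Pen traffic: 16-byte events at 50 events/second.
pen_stream_kbps = 16 * 8 * 50 / 1000          # 6.4 kbps of payload
fraction_of_channel = pen_stream_kbps / 14.0  # vs. the 14.0 kbps ceiling
                                              # for uncoalesced 16-byte
                                              # packets with compression
# roughly 0.46 -- "about half the channel", as observed.
```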
X servers for non-wired devices

For the purposes of discussing our X server work, there are two classes of devices - wired and non-wired. The distinction is really driven by the available bandwidth and latency of the connection between the device and the rest of the computing infrastructure.

The Liveboard and Scratchpad are driven by fairly straightforward ports of the MIT sample server. To the software, the Liveboard looks like a monochrome frame buffer with a slightly odd input device - we have made small changes to simultaneously support the standard mouse as well as the Liveboard pen. There is one Liveboard that has provisions for multiple pens, and we have used the X Input Extension to support these, but have little experience with applications that can actually make use of them. Similarly, the Scratchpad is just another monochrome frame buffer with a one-button input device (the pen).

The various incarnations of Pads and the Tabs have been more challenging. The pad radios have all had a basic data rate of 256 kbps, of which we expected to get 120 kbps after adding in the overhead of media access, IP and TCP. Practical experience with X over leased 56 kbps lines led us to believe that this is adequate, and our experience has shown this to be so, with the possible exception of response time for stylus input events.

Porting decisions

We wanted to get the hardware deployed quickly in order to allow developers to begin experimenting with new code. Thus, we chose a porting strategy that got applications running against the Pads quickly, but was not intended to give optimal performance.

The first such attempt was a simple bitmap difference program. This program runs on a workstation, opens the frame buffer memory, and continually reads the upper left-hand section that corresponds to the size of the pad display. Each successive "frame" is compared bitwise to the previous one, and the differences are compressed and sent to the pad across the radio.
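The difference step of this program can be sketched as follows; the frame representation (a list of row byte strings) and the omission of the compression step are simplifications of what the real program did:

```python
# Sketch of the bitmap-difference step: compare successive frames row by
# row and keep only the rows that changed, to be compressed and sent to
# the pad over the radio.
def changed_rows(prev_frame, cur_frame):
    """Return (row_index, row_bytes) for each row that differs."""
    return [(i, cur) for i, (old, cur) in
            enumerate(zip(prev_frame, cur_frame)) if old != cur]
```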
Input events from a mouse attached to the pad are fed back into the server by the same program, which warps the workstation's cursor and uses SendEvent to synthesize button presses.

There were two big problems with this approach, both having to do with the latency involved. The most obvious was that cursor tracking in this scheme was almost impossible. The latency of the path from moving the mouse to getting the feedback of the cursor having moved was almost half a second, which was most disturbing and essentially useless. The second was that the bitmap differences could get as far as 10 seconds behind, exacerbated by media contention with traffic from subsequent cursor movement and updates. Another problem was that the constant polling of the frame buffer memory was a load on the system.

So the first lesson that we learned is to implement cursor tracking as close as possible to the input device, something that many hardware vendors still haven't learned, since they are using the "software cursor" code from the sample server.

Our next choice was to do a partial port of the X server. We didn't wish to port completely to the pad, for two reasons: the MIT sample server is strongly dependent on a fairly heavyweight operating system environment (in particular, it counts on having a file system to store much of its information), and we did not wish to route inter-client communication that uses the X server as a rendezvous point over the slow radio.

The sample server is made up of several "layers" in an attempt to provide the server implementer or porter great flexibility in trading off implementation effort for ultimate performance [Angebranndt88, Angebranndt90]. The lowest, device-dependent layer ("ddx") can be implemented with highly tuned code that has extremely intimate knowledge of the device.
Or it can be implemented with the supplied very simple code ("mi") that counts on little about the device except how to paint or fetch a horizontal line of pixels from the display. This set of minimal routines is called the Pixblt layer.

We devised a "split server" architecture, in which the Pixblt routines run on the pad and the rest of the server runs on a workstation, with a network connection between them. The Pixblt layer, consisting only of routines to handle the input device and draw and fetch one-bit-high bitmaps, is very straightforward to implement, and the layering of the rest of the server allows a quick and dirty port by implementing just these simple routines.

We built this first with two simulations - a network throughput simulator and a pad simulator. The throughput simulator allowed us to control the available network bandwidth to simulate behavior over the radio, and the pad simulator allowed the back-end pad code to be debugged on a workstation without worrying about infant hardware environment problems. When this was all working, we moved the code to the actual hardware in short order.

Unfortunately, the throughput simulator simulated a slow, but perfect, TCP link. As detailed in the previous section, we discovered that building a reliable radio device and media access protocol is not a straightforward task. Not only was reliability of transmission a problem; the effects of small packet sizes on available channel capacity were unforeseen.

This approach allowed us to get some of our applications up and running, albeit slowly. The first thing that we noticed was how slow everything seemed. We had to re-learn the lesson about cursor tracking from the XPad, and quickly implemented tracking of the cursor and changing its shape on the pad side. We then began to analyze what protocol traffic was being sent and took steps to improve perceived performance, both by tuning our code and adding to the protocol.
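As an illustration of how little the pad side needs to understand, a solid span fill over a 1-bit-per-pixel framebuffer can be sketched in a few lines. The framebuffer model (one bytearray per scanline) and the function name are simplifications, not our actual pad code:

```python
# Sketch of the kind of minimal span routine the pad-side Pixblt layer
# provides: set or clear a horizontal run of pixels in a 1-bpp
# framebuffer, modeled here as one bytearray per scanline.
def fill_span_solid(framebuffer, x, y, width, value=1):
    """Set (value=1) or clear (value=0) width pixels starting at (x, y)."""
    row = framebuffer[y]
    for px in range(x, x + width):
        byte, bit = px // 8, 7 - (px % 8)   # MSB-first pixel order
        if value:
            row[byte] |= (1 << bit)
        else:
            row[byte] &= ~(1 << bit)
```

Everything above this routine (fill rules, tiling, clipping) can run on the workstation side, which is what makes the split practical.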
Rectangles are the most important

There are three routines in the Pixblt layer: FillSpans, which fills a set of horizontal, one-bit-high bitmaps (spans) according to the current fill rule (a solid color or a pattern tile); SetSpans, which fills a set of spans from a source bitmap; and GetSpans, which reads a set of spans out of the frame buffer.

In order to make the code as simple as possible, FillSpans was implemented without requiring any state on the pad side. This meant that the pattern tile, if any, had to be transmitted with each request. We quickly learned what a horrible decision this was - most FillSpans requests that were user feedback (as opposed to those involved in displaying a window for the first time) involved very small spans, and the time spent sending the tile dominated the time required to satisfy the painting request. By adding a small cache of pattern tiles on the pad side and a few protocol requests to manage it, we were able to paint simple 10x10 rectangles over ten times as fast.

We learned this lesson again when we added a protocol request to fill a rectangle, rather than a series of spans, from a tile. This allowed a rectangle to be described by four 16-bit integers, instead of two 16-bit integers and a 32-bit width per pixel of height. This change painted 10x10 rectangles 60 times faster. Moreover, it provided a subjective shift in the "snappiness" of the system - when a window is first mapped, it is painted as a main rectangle with many smaller rectangles for borders and window manager-supplied decoration. These rectangles were all now painted much more quickly, giving the perception of much greater speed.

We learned a similar lesson when we switched the code that implemented SetSpans from sending one request per span to batching a TCP window's worth of spans before sending them.
This only doubled the measured copying speed, but the visual effect was much more striking: there was a longer pause while the data was received, but then painting proceeded as fast as the CPU could execute the inner loop.

Each of these reductions in protocol overhead had a side benefit - the cursor tracking code removes the cursor from the screen just before processing a request and replaces it immediately after. Fewer protocol requests mean less time wasted repainting the cursor and even snappier response.

We currently have complaints about two areas of performance: painting text, and copying between Pixmaps (bitmaps stored in the server) and the screen. Making fonts go faster requires adding font support to the pad, which means either adding network file system access to the pad's operating system, or porting the X font-server client code to the pad. A "quick hack" might be to download fonts on the fly, but this looks like as much work as doing it right.

Pixmaps are stored on the workstation side of the split server, mostly for expediency of implementation. (Moving the pixmaps to the pad means that the server has to manage the remote memory, and painting into pixmaps suddenly becomes slow because it has to go across the wire, or else a lot of the drawing code has to be moved to the pad so that painting requests can execute there.) As a result, displaying a pixmap on the screen results in a lot of SetSpans requests going over the wire, filled with bits to be painted.

We have implemented G3 compression algorithms for the SetSpans code, to try to minimize the amount of data sent. This code is currently turned off. Most of the SetSpans requests send small enough amounts of data that the effort required to decompress the bits is more than the time saved by reducing the bandwidth. The protocol overhead of sending the request is the dominant factor.
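The encoding savings behind the rectangle request described earlier can be made concrete. The sizes below count only the geometry, not request headers, following the encodings given in the text:

```python
# Compare the wire cost of describing a filled rectangle as spans
# (two 16-bit coordinates plus a 32-bit width per one-pixel-high row)
# versus as a single rectangle request (four 16-bit integers).
def spans_geometry_bytes(height):
    return height * (2 * 2 + 4)   # per span: x, y (16 bits each) + width

def rect_geometry_bytes():
    return 4 * 2                  # x, y, width, height as 16-bit ints

# A 10x10 rectangle: 80 bytes of span geometry versus 8 bytes for the
# rectangle form, a tenfold reduction before any header overhead.
```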
Next steps

At this point, we determined that doing window copies local to the pad, and providing direct font support (text is currently painted using mi code that breaks the glyphs into spans and sends them), are next in importance. The code and protocol changes to implement these are fairly substantial, especially since the pad has no access to a file system (to get the fonts). As the design of this next phase proceeded, we came to the realization that we were implementing more and more of the sample server on the pad, in a fairly undisciplined manner. Thus, we have decided to do a full port of the server to the pad, in order to measure the actual cost of the inter-client traffic that we worried about. We expect to be aided in this by the X Consortium's Low Bandwidth X (LBX) effort, which is attempting to address the bandwidth requirements of the core X protocol [Fulton93]. In particular, LBX intends to decrease the cost of the empty space in the X protocol encoding, and to use caching techniques to reduce inter-client communication where possible. Performance of text-based applications is quite satisfactory at this point. But our target applications, which mostly involve inking of stylus input, are still rather slow. This is because the ink is being drawn by the client program, and it takes a fairly long time for the input event to reach the client program and the feedback request to reach the display. One solution to this problem is to provide an extension that does local inking [Rhyn91, Kempf93]. We have so far resisted this option, because it means a non-standard change to the server that we have to maintain and track across releases, both in the Mpad server and our workstation servers (so that application developers can work at their desktops, one of the original motivations for using X). We prefer to exhaust tuning opportunities at all other points in the system.
Another solution is to build local clients - applications that run "native" on the pad and do not incur the overhead of radio communications before providing feedback. This is a large undertaking. Our operating system does not currently support any form of dynamic loading. Local clients must share screen real estate with remote clients: they must either be X clients, or we must provide a lower-level real estate manager that both local clients and the X server manipulate.

Sharing, migration and applications

Part of the ubicomp vision is that there are enough devices around that it no longer becomes necessary to carry them; wherever you go, there will be one that you can just pick up and start using, much as we treat pads of paper today. This implies that it should be easy to "whistle up" your state from display device to display device. Doing this with X is problematic at best. When a client program first contacts a display server, it must learn a fair bit about the display hardware and tailor itself to the display. In particular, an application will determine the density of pixels on the screen and how many significant bits of color information are available, allocate some pixels out of the display's colormap, and set up some pattern tiles. These resources are identified by 32-bit identifiers unique to a connection, and are used in future protocol requests. If the client program wishes to start drawing on a different X display, it must be prepared to recreate all this state and possibly alter its operation (if, for example, there are more or fewer bits of color display available). If the client wishes to draw on two or more displays at once, it must keep track of multiple sets of resource IDs and color models. Few existing clients are prepared to do this - if the connection goes away, the client exits. There are basically two options available to the user who wishes to migrate or share a client between several displays.
The first is to use a program known as a pseudoserver, which sits between the application and the X server driving the display(s) [Gust88, Patterson90, Menges93]. It looks like a display server to the application, and like a client to the display server. The pseudoserver may perform arbitrary transformations on the protocol requests, including remapping resource IDs and multicasting protocol requests. The second is to provide an underlying toolkit for the application which is prepared to manage multiple displays. The primary advantage of using pseudoservers is that they work with any unmodified X application. Since the pseudoserver deals directly with raw X protocol, it need not know anything about what the application does or how it works. There are several disadvantages, however. Because applications generally connect to a server for their lifetime, a user who ever wishes to share or migrate must connect through the pseudoserver when first starting the application. If she decides she wants to share an application that was not started with a pseudoserver, there is no way to do it without restarting. If all applications are run with a pseudoserver, then the cost of the pseudoserver is paid even if nothing is ever shared. There is also a performance cost to using a pseudoserver because it introduces an extra communication hop in the X protocol, so roundtrip requests to the server require four context switches instead of two. X requires clients to have a fairly detailed understanding of the display hardware's color model. Most clients do not make provisions for the color model to change during the connection lifetime. Thus a pseudoserver must choose carefully what color model it will provide to clients, and must be able to implement some reasonable facsimile of that color model on every display it encounters. A simple pseudoserver will force clients to run in monochrome mode, but this is not very satisfactory.
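The resource ID remapping that a pseudoserver performs can be sketched as a per-display translation table. All names here are hypothetical; a real pseudoserver (for instance one built with the X engine library [Menges93]) tracks far more state than this:

```c
#include <stdint.h>

typedef uint32_t XID;            /* X resource identifier */

#define MAX_IDS 256

/* One translation table per downstream display connection. */
typedef struct {
    XID client_id[MAX_IDS];      /* ID the application believes it owns   */
    XID server_id[MAX_IDS];      /* ID actually allocated on this display */
    int count;
} IdMap;

/* Record a translation when the pseudoserver creates the real resource. */
void idmap_add(IdMap *m, XID cid, XID sid) {
    if (m->count < MAX_IDS) {
        m->client_id[m->count] = cid;
        m->server_id[m->count] = sid;
        m->count++;
    }
}

/* Rewrite a resource field in an outgoing request for one display. */
XID idmap_translate(const IdMap *m, XID cid) {
    for (int i = 0; i < m->count; i++)
        if (m->client_id[i] == cid)
            return m->server_id[i];
    return cid;                  /* unknown IDs pass through unchanged */
}
```

Multicasting a request to two displays means running it through two such tables, one per server connection; this is precisely the bookkeeping that a client drawing on multiple displays would otherwise have to do itself.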
An ambitious pseudoserver will offer 24-bit deep true color, and attempt to render it on whatever hardware it finds. This is very slow. Most pseudoservers offer 8-bit color and do the best they can. This is close to the approach taken by NeWS [Gosling89], except that NeWS implements the mapping of the single abstraction to the hardware very close to the hardware; the pseudoserver must translate the X requests from the client in a fairly clumsy manner. Shared window servers handle about 90% of existing applications without too much trouble. The applications that cause trouble are those that understand a lot about the display that they are running on in order to do more "interesting" things. An example of an "interesting" application is one that allocates planes in the colormap in order to do animation by manipulating colors; a pseudoserver cannot guarantee that it will implement this correctly on displays with different color capabilities, or even on different displays with the same color capabilities but different colormap contents. A different class of problem is raised when sharing or migrating among displays with widely differing resolutions. An application that starts out tailored for a 72 dpi display will probably not adapt correctly to a 300 dpi display, and a pseudoserver can only do so much in this case, because distances in the X protocol are expressed in terms of pixels, not a linear measure such as points, inches or millimeters. The second option is to write the client with a toolkit that hides the details of sharing and migration, and takes care of connection startup and shutdown, resource allocation and remapping, and so on. This toolkit will probably not expose as much detail about the hardware as the X protocol allows, and may save as much state about the client program as necessary in order to map the drawing model it provides onto the various display hardware it comes across.
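The resolution problem is concrete: because the protocol carries pixel counts, a client that wants physical sizes must convert them per display, and a pseudoserver forwarding pixel values unchanged cannot repair the mismatch. A minimal sketch of the conversion a resolution-aware client or toolkit has to perform:

```c
/* Convert a physical distance in printer's points (72 per inch) to
   pixels on a display of the given resolution.  The same 18-point
   distance is 18 pixels at 72 dpi but 75 pixels at 300 dpi, so a
   pixel count tuned for one display is wrong on the other. */
int points_to_pixels(double points, double dpi) {
    return (int)(points / 72.0 * dpi + 0.5);   /* round to nearest */
}
```

A toolkit that accepts distances in points (or millimeters) can apply this conversion at render time for each display it drives, which is exactly the kind of detail the toolkit approach described below hides from the application.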
Dealing with these problems at the toolkit layer allows applications to look "right" on several different displays at the same time, since the application defines itself in terms of the model the toolkit provides, instead of the arbitrarily complex model available by using the full X protocol. All applications written with the toolkit gain the ability to share and migrate, and applications can be written which are aware of this and have special behavior. The applications behave in a consistent way and do not require extra effort to begin sharing. Performance is better since there is no pseudoserver involved. We believe that most existing applications are not appropriate for a shared environment, or even for a pen-based environment, so a 90% solution for the simpler applications that we wish to retain is adequate. For the new applications that we will build, we are building the ability to migrate into the underlying toolkit. We have built two such toolkits: one for the Cedar environment [Jacobi92], and one which invisibly adds sharing and migration to the Modula-3 Trestle toolkit [Manasse91]. Both of these toolkits provide a (different) abstract drawing model that allows clients to express an ideal display, which is then rendered as well as possible on the available hardware. There is a third approach to sharing state with X clients: introduce an application-specific protocol for exchanging data between clients that contact a single display. The Tivoli shared drawing program, developed at PARC [Pedersen93], and Van Jacobson's wb program, developed at the Lawrence Berkeley Laboratory [Jacobson92], use this approach. Tivoli uses direct TCP connections between all the clients sharing a drawing surface (including clients that do not use X to display the surface), and wb uses multicast IP to communicate the history of a drawing session.
Conclusions

We are not at all convinced that using X for pen-based mobile/wireless computing is a good idea. The X protocol was designed to be operated on an Ethernet; it was expected that available bandwidth would be high and latency would be low and of low variance. Even on an Ethernet, application programmers must try to minimize the number of round trips from client to server. On a low speed link, this becomes even more of an issue. The X Consortium LBX effort addresses some of these problems, but it only directly deals with protocol encoding issues, not necessarily problems raised by unusual network fabrics. Treating the pads as X terminals forces us into a computing model which makes it expensive to provide snappy input response (there is a long delay between an input event and the client program's response to that event). We can address this by running clients local to the pad, but this means that the pad must support a fairly heavyweight computing environment, complete with access to network and file resources. We can address it by building an extension to the server that does local inking or local input response, but this means that we must support a non-standard body of code across changes to the X server. None of these solutions provides us with a way to give programs running entirely on the pad direct access to the pad's display; another modification to the X server is needed. We will continue down this path for a while longer, because the development environment is familiar to us and allows us to deploy our applications quickly across all our platforms. Developing a new window system and deploying it is an expensive proposition, and one which we are not willing to enter into lightly.

Acknowledgements

Lawrence Butcher developed the networking basestations. Ed Richley designed and built our radios, and Melvin Chan keeps them running. Steve Deering made mobile IP work.
Tom Rodriguez (a summer intern from Georgia Tech) resurrected JoinVBT in the distributed Trestle toolkit and made it work in X across hardware with different color models.

References

[Adams93a] Norman Adams, R. Gold, B.N. Schilit, M. Tso and R. Want. `An infrared network for mobile computers'. Proceedings of the USENIX Symposium on Mobile and Location-Independent Computing, Cambridge, Massachusetts, August 1993.

[Adams93b] Norman Adams, R. Gold, B.N. Schilit, M. Tso and R. Want. The PARCTAB mobile computing system. Xerox Palo Alto Research Center. In preparation.

[Angebranndt88] Susan Angebranndt, R. Drewry, T. Newman and P. Karlton. `Strategies for porting the X11 sample server'. Software Distribution Center, Massachusetts Institute of Technology, Cambridge, MA, 1988.

[Angebranndt90] Susan Angebranndt, P. Karlton, R. Drewry and T. Newman. `Writing tailorable software'. Software-Practice and Experience, 20:S2/109-S2/118, 1990.

[Elrod92] Scott Elrod, R. Bruce, R. Gold et al. Liveboard: a large interactive display supporting group meetings, presentations and remote collaboration. Xerox Palo Alto Research Center. Report CSL-92-6, 1992.

[Fulton93] Jim Fulton and C. Kantarjiev. `An update on low bandwidth X (LBX); a standard for X and serial lines'. Proceedings of the 7th annual X Technical Conference, January 1993. O'Reilly & Associates, 1993.

[Gosling89] James Gosling, D. Rosenthal and M. Arden. The NeWS Book: an introduction to the networked extensible window system. Springer-Verlag, 1989.

[Gust88] Phil Gust. `Shared X: X in a Distributed Group Work Environment'. Unpublished paper presented at the second annual X Technical Conference, January 1988.

[Jacobi92] Christian Jacobi. `Migrating widgets'. Proceedings of the 6th annual X Technical Conference, January 1992. O'Reilly & Associates, 1992.

[Jacobson90] Van Jacobson. `Compressing TCP/IP headers for low-speed serial links'. Arpanet Working Group Requests for Comment, DDN Network Information Center, RFC 1144, February 1990.
[Jacobson92] Van Jacobson. A portable, public domain, network whiteboard. Presentation at the Xerox Palo Alto Research Center, April 28, 1992.

[Karn90] Phil Karn. `MACA - a new channel access method for packet radio'. Proceedings of the ARRL 9th Computer Networking Conference, London, Ontario, Canada, September 22, 1990.

[Kempf93] James Kempf and A. Wilson. `Supporting mobile, pen-based computing with X; mobile information for hi-tech nomads'. Proceedings of the 6th annual X Technical Conference, January 1992. O'Reilly & Associates, 1992.

[Manasse91] Mark S. Manasse and G. Nelson. Trestle reference manual. Digital Systems Research Center. Research Report 68, 1991.

[Menges93] John Menges. `The X engine library: a C++ library for constructing X pseudoservers'. Proceedings of the 7th annual X Technical Conference, January 1993. O'Reilly & Associates, 1993.

[Nagle84] John Nagle. `Congestion control in IP/TCP internetworks'. Arpanet Working Group Requests for Comment, DDN Network Information Center, RFC 896, January 1984.

[Patterson90] John F. Patterson. `The good, the bad and the ugly of window sharing in X'. Unpublished paper presented at the fourth annual X Technical Conference, January 1990.

[Pedersen93] Elin Ronby Pedersen, K. McCall, T. Moran and F. Halasz. `Tivoli: An electronic whiteboard for informal workgroup meetings'. Human Factors in Computing Systems: Proceedings of InterCHI 1993. Association for Computing Machinery, 1993.

[Rhyn91] Jim Rhyne, D. Chow and M. Sacks. `The portable electronic notebook'. Unpublished paper presented at the fifth annual X Technical Conference, Cambridge, MA, January 16, 1991.

[Richley93] Ed Richley and S. Elrod. Experiments with near-field radios. Xerox Palo Alto Research Center. In preparation.

[Scheifler92] Robert W. Scheifler and J. Gettys. X Window System. Digital Press, Bedford, MA, 1992.

[Sidhu90] Gursharan S. Sidhu, R. Andres and A. Oppenheimer. Inside AppleTalk. Addison-Wesley, 1992.

[Weiser91] Mark Weiser.
`The computer for the 21st century'. Scientific American 265(3):93-104, September 1991.

Author Information

Christopher Kent Kantarjiev is a Member of the Research Staff in the Computer Science Lab at the Xerox Palo Alto Research Center (in Palo Alto, CA, natch). He has a B.S. in Physics from Xavier University (1979) and a Ph.D. in Computer Sciences from Purdue University (1986). He is a frequent contributor to the magazine of the Vintage Triumph Register. He can be reached as cak@parc.Xerox.COM.

Alan Demers is a Principal Scientist in the Computer Science Lab at the Xerox Palo Alto Research Center. He has a B.S. in Physics from Boston College (1970) and a Ph.D. in Computer Science from Princeton University (1975). He can be reached as demers@parc.Xerox.COM.

Ron Frederick is a Member of the Research Staff in the Computer Science Lab at the Xerox Palo Alto Research Center. He has a B.S. in Computer and Systems Engineering from Rensselaer Polytechnic Institute (1989) and an M.S. in Computer Science from Stanford University (1991). He can be reached as frederick@parc.Xerox.COM.

Robert T. Krivacic is a Member of the Research Staff in the Computer Science Lab at the Xerox Palo Alto Research Center. He has a B.S. in Computer Science from Bowling Green State University (1974) and an M.S. in Computer Science from The University of Colorado (1978). He can be reached as krivacic@parc.Xerox.COM.

Mark Weiser is the head of the Computer Science Lab at the Xerox Palo Alto Research Center. He has a Ph.D. in Computer and Communication Sciences from the University of Michigan (1979). His hobbies are existential philosophy, sociology of science, and writing computer games; his recent technical work has been on garbage collection and ubiquitous computing. He can be reached as weiser@parc.Xerox.COM.