An Analysis of Process and Memory Models to Support High-Speed
Networking in a UNIX Environment
BJ Murphy
Computer Laboratory, University of Cambridge, UK
S Zeadally, CJ Adams
Department of Computer Science, University of Buckingham, UK
Abstract
In order to reap the benefits of high-speed networks, the performance
of the host operating system must at least match that of the
underlying network. A barrier to achieving high throughput is the
cost of copying data within current host architectures. We present a
performance comparison of three styles of network device driver
designed for a conventional monolithic UNIX kernel. Each driver
performs a different number of copies. The zero-copy driver works by
allowing the memory on the network adapter to be mapped directly into
user address space. This maximises performance at the cost of: 1)
breaking the semantics of existing network APIs such as BSD sockets
and SVR4 TLI; 2) pushing responsibility for network buffer management
up from the kernel into the application layer. The single-copy driver
works by copying data directly between user space and adapter memory,
obviating the need for an intermediate copy into kernel buffers in
main memory. This approach can be made transparent to existing
application code but, like the zero-copy case, relies on an adapter
with a generous quantity of on-board memory for buffering network
data. The two-copy driver is a conventional STREAMS driver. The
two-copy approach sacrifices performance for generality. We observe
that the STREAMS overhead for small packets is significant. We report
on the benefit of the hardware cache in ameliorating the effect of the
second copy, although we note that streaming network data through the
cache reduces the level of cache residency seen by the rest of the
system.
A barrier to achieving low jitter is the non-deterministic nature of
many operating system schedulers. We describe the implementation and
report on the performance of a kernel streaming driver that allows
data to be copied between a network adapter and another I/O device
without involving the process scheduler. This increases throughput and
CPU availability while reducing jitter.