################################################
	   #                                              #
	   # ##   ## ###### ####### ##    ## ## ##     ## #
	   # ##   ## ##  ## ##      ###   ## ##  ##   ##  #
	   # ##   ## ##     ##      ####  ## ##   ## ##   #
	   # ##   ## ###### ######  ## ## ## ##    ###    #
	   # ##   ##     ## ##      ##  #### ##   ## ##   #
	   # ##   ## ##  ## ##      ##   ### ##  ##   ##  #
	   # ####### ###### ####### ##    ## ## ##     ## #
	   #                                              #
	   ################################################


	 The following paper was originally published in the
      Proceedings of the USENIX Summer 1993 Technical Conference
		  Cincinnati, Ohio  June 21-25, 1993


	For more information about USENIX Association contact:

		   1. Phone:	510 528-8649
		   2. FAX:	510 548-5738
		   3. Email:	office@usenix.org
		   4. WWW URL:  https://www.usenix.org


Design And Implementation of a Multimedia Protocol Suite in a BSD
Unix  Kernel

This  research  is  supported  in  part  by the
National Science Foundation Grant No. NCR-9111323 and  Grant  no.
STI-9108764.

Lakshman K, Giri Kuthethoor, Raj Yavatkar Department of  Computer
Science University of Kentucky, Lexington, KY 40506


Abstract

Development  of  distributed  multimedia   applications
requires     support   for   coordination   and   temporal/causal
synchronization of traffic  over  related  streams.  Our  current
research   involves   investigation   of   appropriate   OS   and
communication abstractions to support such applications.  Towards
this  goal,  we  have  designed  and  implemented MCP, a suite of
transport and session layer protocols,  in  the  framework  of  a
standard  BSD  Unix  networking  platform.   MCP contains two new
abstractions. First, MCP contains  a  token-based  mechanism  for
coordination  of  traffic  over a multipoint connection.  Second,
MCP includes an abstraction  called  a   multi-flow  conversation
that  enforces  both  temporal  and  causal synchronization among
related  data  streams.   This  paper   discusses   Unix   kernel
implementation  of MCP and describes our experience in using MCP.
abstract

Introduction


With the increasing  use  of  high  speed  networks,  it  is  now
possible   to   build   multimedia   applications   that  involve
geographically  dispersed  group  of  users.  These  applications
involve  concurrent  communication  of text, voice, graphics, and
video. Our current research involves investigation of appropriate
OS   and   communication   abstractions  to  support  distributed
multimedia applications. Towards this goal, we  have  designed  a
suite  of  transport  and  session  layer protocols [mcp-dcs] and
implemented the protocols in the framework of a standard BSD Unix
networking platform.

This paper describes our experience in design and  implementation
of   transport  and  session  layer  protocols   for  distributed
multimedia applications.

Development  of  distributed  multimedia  applications   requires
resolution  of  several  issues  including media coordination and
synchronization, media mixing, OS scheduling, and the problem  of
network resource allocation  to provide performance guarantees in
terms of delay bounds  and  bandwidth  reservation.  However,  we
must  emphasize  that  this  paper only addresses the question of
providing   suitable   transport   and   upper   level   protocol
abstractions  to  meet  the  temporal/causal  synchronization and
coordination needs of such applications. In particular, the paper
addresses   the   following   aspects   of  building  distributed
multimedia systems:


 Unique   communication   requirements   posed   by   distributed
multimedia  applications  and  need for appropriate communication
support to facilitate development of such applications.


 Design of suitable transport  and  session  layer  communication
mechanisms to build necessary communication framework.


 Implementation of a transport and session layer  protocol  suite
using  the  SUNOS  4.1.1  version  of  Unix  as  the experimental
platform.

 Investigation of suitability and extensibility of the  UNIX  IPC
facilities,   protocol software structure, and user interface  in
accommodating needs of multimedia applications.


 User interface requirements to support multipoint, collaborative
applications.

Rest of this paper is organized as follows. Sections~motive]  and
back   provide  the  motivation  and  background  for  our  work.
Section~approach describes our approach including design  of  MCP
and discusses the design alternatives and tradeoffs. Section~impl
describes our implementation of MCP suite in SunOs 4.1.1  kernel.
We  have  conducted  a  systematic  performance evaluation of MCP
kernel implementation and have also compared the  performance  of
user  applications  with and without MCP. Sections~experience and
perf  discuss  the  results  of   our   performance   evaluation,
Section~related  compared  our  work  with  work  by  others, and
Section~concl provides a summary of our work.

Motivation motive


Our goal is to facilitate development of  multimedia  distributed
applications   that  involve  a geographically dispersed group of
users. An example of such  an  application  is  a  collaborative,
software    engineering    environment[ellis-gibbs-89,colab-from-
xerox].  In such an environment, a group of designers located  at
different sites  collaborate  on a design document (or a program)
using   interactive tools to edit and test parts  of  the  design
under development.  Interaction may involve a group editor (based
on  shared windows),  an image display (that  displays  resulting
design),  and  a voice channel that allows them to view, discuss,
and edit the suggestions made by each other.

Our main  focus  here  is  on  the  issues  of  coordination  and
synchronization  of  traffic  over  related  data  streams. These
issues arise in following forms:

  A single multipoint communication involving multiple  users  on
multiple,  remote  sites  requires causal synchronization so that
all  participants ``see'' all the  communication  events  in  the
same  or  ``correct''  order.   For  instance, a voice conference
channel  that  spans  multiple  participants  requires   such   a
synchronization.

Apart from  voice/video  interactions,  both  communication   and
application-level  software  must  also  enforce  some  degree of
concurrency control when a group of users may simultaneously view
and update shared text or image data  to maintain consistency.
 The  communication  software  must  allow  an   application   to
coordinate  multiple information channels that carry traffic from
multiple media such  as  text,  voice,  video  (or  image).   The
collaborative,   software  development  group  effort   described
earlier is an  example  of  such  an  interaction.  For  such  an
application,   the  communication  software  must  allow  related
streams to be  grouped  together  and  recognize  the  order  and
sequencing  of  traffic  sent  over  them.   For  example, when a
participant scrolls through an image browser (or a shared  editor
window)  and  says,  ``look  at  the middle of the display'', the
statement should be heard at the same time (or  just  after)  the
scrolling  is  completed.  Such  temporal   relationships must be
captured  in  the  delivery  of  traffic  over  related  streams.
Another  example  of  such a synchronization is  ``lip-synching''
when voice and video are transmitted over separate connections in
a digital network.


Existing  transport   and/or   session   layer   protocols   lack
communication  abstractions  that provide the necessary semantics
for  coordination  and  temporal  synchronization  in  multimedia
applications.  We  are currently  investigating such abstractions
in our research and    are using them in building bf PolySchmues,
a  distributed  interactive  environment  for collaboration using
multiple media.

In  the  following,  we  first  elaborate  on  the   problem   of
coordination   and  temporal  synchronization  and  then  discuss
proposed solution.

Coordination and Temporal Synchronization Problem back


Hereafter, we will refer to a single or a  multipoint  connection
as a
 flow.   We  assume  that  the  network  layer  provides  a  flow
abstraction  that may have performance guarantees associated with
it to provide predictable performance needed by  real-time  voice
or video channels[ferrari-verma].  The question of how to satisfy
the real-time performance requirements of  a  flow  is   not  the
focus  of  discussion  here;  that question has been addressed by
many researchers[ferrari-verma,rsvp,mythesis]. Instead, the focus
here  is  on research issues in using and combining such flows to
build multimedia, collaborative, distributed  applications.   For
the  design  of multimedia systems such as PolySchmues, one  must
address two  aspects  of  coordination,  namely,   what  sort  of
coordination  is necessary and how much control must be exercised
at the  communication  substrate.  In  our  view,  two  kinds  of
coordination are necessary:

description [Intra-Flow Coordination] Given a
 flow spanning a group of users, one must consider the problem of
concurrency  control  when more than one user may be transmitting
data at the same time. A common example of such  coordination  is
avoiding  interruptions  and  crosstalk  in  a  multipoint  voice
channel.  Such coordination is also necessary when  communication
involves a shared text document or an image.

[Inter-Flow Coordination] A multimedia application  may  need  to
synchronize  traffic  over  multiple flows, each carrying traffic
from a different   medium  (text,  voice,  video,  or  image).  A
collaborative,   software  development  group  effort   described
earlier is an example of such an interaction.


description

Degree Of Coordination

The amount of coordination  needed  varies  in  both  inter-  and
intra-flow cases depending on the application.

In  the case of a   shared  window  package[colab-from-xerox],  a
single  window  is  displayed  on the display screens of multiple
users, and each user gets  an  identical  image  of  the  window.
Shared  window  systems   provide   no  concurrency  control  for
simultaneous actions  by  multiple  users.  Instead,  they  allow
users  to  constantly  see  the  actions  of  other users who are
responsible for manually ensuring that there are no conflicts.


At the  other  extreme  lie  teleconferencing  systems  in  which
interference  is avoided by strict control based on the notion of
a ``floor'', where only the  current  ``speaker''  that  has  the
floor is allowed to ``speak'' (or transmit data) at any time.


Strict coordination is also necessary among separate flows when a
group  of  users  are  pointing  at  a  shared  text  window  and
discussing parts of text over a  voice channel.

Both the extremes have their limitations.  On one hand,  lack  of
any   concurrency   control  puts  additional  responsibility  on
application designers to resolve conflicts. On  the  other  hand,
strict floor-based control does not allow applications to exploit
inherent concurrency and  may  sometimes  unnecessarily  increase
latency   due  to  the waiting involved.  In addition, there is a
continuum between these two  extremes  where  varying  degree  of
coordination  may  be  necessary  or   desirable for  present and
future applications.

For instance, users  in   a  conferencing  system  may  sometimes
decide to enter smaller discussion groups that may hold multiple,
concurrent conversations[colab-from-xerox].  A  ``brainstorming''
session based on the ``chalkboard'' metaphor[colab-from-xerox] in
a cooperative environment is another example where less stringent
coordination  is  appropriate.  In the case of a multimedia-based
group editor, strict coordination is not necessary  when  a  user
is  browsing through a part of text while another is annotating a
different part of the text.

Thus, the degree of coordination needed varies from time to  time
within an application and across different applications.

subsubsectionCausal Synchronization


Both intra-flow and inter-flow coordination also  require  causal
synchronization  when  more  than  two communicating entities are
involved.  For instance, if sender $S_2$ sends a message  over  a
connection  $C_1$  after ``hearing'' from sender $S_1$ over (say)
another connection $C_2$, all  other  participants  must  ``see''
messages from $S_1$ and $S_2$ in the proper causal order.

Enforcing causality requires that the   underlying  communication
system  buffer  out-of-order messages and  enforce the same total
ordering on  all related messages at  all  the  sites.   However,
traffic  delivery  over multimedia  connections (such as voice or
video)  usually  has  delay  constraints  and  the  messages  not
delivered  within their delay bounds are rendered useless.  Thus,
causally ordered delivery mechanism must also take  into  account
the real-time delivery requirements of multimedia traffic.


Our Approach approach

MCP  (Multi-Flow  Conversation  Protocol)  is  a  protocol  suite
consisting of transport and session layer protocols to facilitate
multimedia communication.


MCP provides two methods of  coordination:   token-based  control
and  an  abstraction called   multi-flow conversation]. Together,
these two methods yield a flexible  and   adaptable  coordination
mechanism.


Token-based Concurrency Control

When a  multipoint flow is created, a token is assigned  to  that
flow  that  acts  as  an  authorization  for data transmission. A
sender must hold a token to be able to send traffic over a  flow.
However,  token  management primitives are provided so that other
participants   can    obtain   transmission   privileges.   Thus,
applications  are  free to transfer, replicate, and delete tokens
to govern the degree of concurrency control needed.

This type of  concurrency control is entirely different from  the
token-based   synchronization   provided   by  OSI  session-layer
protocol. The latter method  of  synchronization  allows  session
participants  to insert resynchronization points (or checkpoints)
in the data stream to allow rollback  and to reduce the amount of
retransmitted data  in case of a transmission error.

We  use  the  token  mechanism  to  allow  several   levels    of
concurrency  control.  The following control hierarchy is derived
based  on  schemes   proposed   in   the   literature[colab-from-
xerox,ellis-gibbs-89,lantz-cscw86,swinehart-system-support].

description  [Floor  Control]  As  described  earlier,  real-time
teleconferencing   systems   employ  such  a  strict  concurrency
control. A single token enforces such control over  a  multipoint
connection.  The  token  will  be  passed  on from one speaker to
another whenever the floor is transferred. To obtain  control  of
the  floor, a participant must explicitly request transfer of the
token by invoking a token management primitive.

[Brainstorming] This form of coordination  is  common  in  shared
window  systems  where  there  is  no   concurrency  control  for
simultaneous actions  by multiple users. For  such  applications,
the    token   is    replicated   and   distributed  to  all  the
participants.

[Chalkboard Interaction] Applications based on the ``chalkboard''
metaphor in a cooperative environment  interact in two phases. In
the first phase, a speaker addresses a group of listeners with no
interruptions. The application may switch to the second phase any
time. In the second phase, a group  of  questioners  may  address
questions to the speaker.

The token-based method can accommodate both types of interaction.
Only  the  speaker  may  hold  the  token during the first phase,
whereas the token  may  be  replicated  and  distributed  to  the
questioners  during  the  second phase. Replicated tokens will be
destroyed at  the end of the second phase.

[Discussion  Groups]  Some   environments   such   as   real-time
conferencing  systems envisage a session breaking up into smaller
discussion  groups  and   thus   holding   multiple,   concurrent
conversations.  In  such  a system, initially only a single token
would be created and passed on from one  speaker  to  another  to
achieve strict ``floor-based'' control. However, the token can be
replicated and transferred to  each  discussion group,  and  each
group may then  use the token independently as it sees fit.


description


Multi-flow Conversations To allow temporal synchronization  among
traffic   over  multiple  flows,  we  introduce  a  communication
abstraction called a  multi-flow conversation.  A conversation is
a  logical  entity   and  consists  of one or more  (two-party or
multipoint) flows. An application  will  typically  first  create
individual      flows      with      appropriate      performance
requirements[ferrari-verma]. It will then create  a  conversation
that includes one or more related flows  to achieve the necessary
temporal synchronization.


A conversation (Figure conv) may consist of  one  or  more  (two-
party   or   multipoint)   connections,   MCP  enforces  temporal
synchronization  in  delivery   of   traffic   over   participant
connections.  For instance, consider a conversation consisting of
a multipoint  voice connection $V_1$ and a data (text) connection
$T_2$.   If  a  sender  S, sends a message on $T_2$ followed by a
message over $V_1$;  all  the  conversation  participants  should
receive  those  messages  in  the  same  sequence. In addition, a
conversation provides limited causal ordering among messages sent
by  multiple  sources.   For  instance,  if $S_2$ sends a message
$M_2$ over a constituent connection after ``hearing'' $M_1$  from
$S_1$,  all other participants should ``see'' messages from $S_1$
and $S_2$ in the proper causal order.

However, enforcement of such a causal order among  messages  sent
over a conversation has performance penalties (including delaying
message delivery for a  long time) and such penalties    are  not
acceptable to real-time flows  involving voice or video that have
better-never-than-late semantics. For instance,  real-time  voice
flows must deliver traffic within 100 milliseconds before quality
of  service  degrades  and  traffic  delivered  after  about  300
milliseconds is useless for voice interaction.

In fact, delivery of certain messages  might  be  meaningless  if
delayed  beyond  the  delay  constraint  of  the  flow  involved.
Therefore, we have decided to discard strict causal  ordering  in
favor  of  a notion of causality called bf $Delta$-causality that
takes  into  account  the  delay  constraints   associated   with
constituent flows.


Additional features of  MCP  include   an  out-of-band  signaling
channel  for  exchanging  control information and an asynchronous
interface  to  applications.  Out-of-band  signaling  facilitates
real-time  protocol  processing and simplifies processing of user
data. The asynchronous interface allows an application to specify
event-triggered actions to facilitate fast processing of periodic
data and changes to the status of multipoint connections.


Delta-Causality


Informally, $Delta$-causality works as follows. Given  a  set  of
flows  and  a  set  of  participants  in a  conversation, message
delivery to an  application  at  a  recipient  site  is  causally
ordered  provided  an  upper bound $Delta$ on end-to-end delay is
not violated for any of the related messages.  For  instance,  in
the  example  described  above,  the  receiver $S_3$ will see the
proper causal order provided $M_1$ and $M_2$ were  generated  and
sent  in  such  a  way that their arrival at $S_3$ is not delayed
beyond  an  upper  bound  $Delta$.  $Delta$  is  a  function   of
individual delay constraints for flows $Flow_1$ and $Flow_2$.

We  assume  that  each  flow  has   performance   characteristics
associated with it that are specified when a flow is established.
Apart from the packet rate and error rate parameters,  there  are
two delay constraints associated with each flow $F_i$:

description
 [Desired delay $delta_i$] is the maximum end-to-end  delay  that
the   flow   $F_i$   can   tolerate  before  quality  of  service
deteriorates.  Example of such a constraint is  100  milliseconds
delay bound in packet voice.

 [Loss delay $lambda_i$] is an upper limit  on  end-to-end  delay
for  flow traffic beyond which the delivered traffic delivery  is
useless. For example, packet voice or  video  have  better-never-
than-late  delivery  semantics  and  specify  such  a constraint.
Packet voice traffic with desired delay constraint of 100 ms  has
a loss delay constraint in the range of 200 to 300 ms.


description

Consider a  conversation C consisting of  flows  $F_1  ldots  F_i
ldots  F_n$  with  respective  delay  constraints $delta_i$'s and
$lambda_i$'s. We compute two conversation-wide desired delay  and
loss delay constraints:

quote
     $Delta_1 = max delta_i, i=1, ldots n $ hspace*10mm
     $Delta_2 = min lambda_i, i=1, ldots n $ quote


Based on these two constraints, we define the causality  interval
$Delta$ for each conversation. $Delta$  is computed as $Delta_1 <
Delta < Delta_2$ and defines  a  window  of  causality  for  each
message sent in a conversation.


It must be noted that our notion of real-time assumes large-grain
clock  synchronization  among  the  participants.  This is not an
unrealistic assumption as  fault-tolerant  clock  synchronization
algorithms  exist  that  achieve  such synchronization. In TCP/IP
Internet, Network  Time  Protocol  (NTP)  achieves  global  clock
synchronization    across    the    country    within    a    few
milliseconds[rfc1128].

The value of $Delta$ is chosen based on the  following  pragmatic
considerations. First, $Delta$ must at least be equal to $Delta_1
+  2 ast epsilon$ ($epsilon$ is the maximum possible clock skew).
Second, to allow some flexibility in the presence of fluctuations
in network conditions and delays, MCP provider may  choose  value
of  $Delta$  to be higher than $Delta_1 + 2 ast epsilon$  without
compromising individual flow semantics as long as $Delta$ remains
much below the upper bound $Delta_2$.


Design Issues

Given the need  for  flexible  coordination  and  synchronization
mechanisms,   we    carefully  considered  the  following  design
alternatives:

description

 [ Policy vs. Mechanism:] One of the questions we had to  address
was   whether   to   implement  the  necessary  policies  in  the
communication  substrate  or  to   build   only   the   necessary
communication  mechanisms   that  are flexible enough to allow an
application to choose and enforce desired policies.


We believe that the right way for  accommodating  a wide range of
coordination  and  synchronization  requirements  is   to provide
communication  mechanisms  that allow specification and selection
of  the  degree  of  synchronization needed and leave the task of
choosing the appropriate policy and degree of synchronization  to
each individual application.  Such a division of functionality is
appropriate because an application programmer  has  the  relevant
knowledge to specify the necessary conditions for causality.

 [ Kernel vs. Library:] Given the availability and popularity  of
existing  transport  protocols such as TCP or UDP, we also had to
grapple  with  the  issue  of  whether  we  could  implement  MCP
functionality   in   an  application  layer  library  on  top  of
conventional transport protocols. Advantage of such  an  approach
is  two fold. First, no modifications are necessary to the kernel
allowing  easy  portability  to  different   systems.  Second,  a
library-level  implementation  can always hide unpleasant details
of coordination and synchronization from an application and  thus
let   application   writer  concentrate  on  application-specific
details. Library level implementation is also  easier  to  modify
allowing experimentation with alternative designs.

However, library-level implementation approach  suffers from some
drawbacks.  A  distributed multimedia application must handle and
coordinate transmission and reception of  traffic  over  multiple
data  streams  and  must  have the ability to handle asynchronous
events that occur on any one of  the  streams.  If  you  consider
details  of rate-based flow control, buffer and timer management,
and asynchronous token management, even a multi-threaded  library
implementation   becomes   quite   unwieldy.    Some   degree  of
inefficiency also results  as the library must duplicate some  of
the  kernel  functions  such as event scheduling and buffer/timer
management.

Based on these arguments  and  our  previous  experience  with  a
library-based   implementation   of   an  experimental  transport
protocol[rgp-paper], we decided  in  favor  of  the  kernel-based
implementation.

description

 MCP Implementation impl

One of the goals of our research is to integrate MCP abstractions
into  a  general purpose operating system such as Unix so that we
can utilize  a  large  collection  of  toolkits  and  application
software   libraries[suite]   to   build  interactive  multimedia
applications. Towards that goal, we have implemented MCP in   the
framework of Unix 4.3 BSD networking software.

Implementation Issues

Unlike the implementation of TCP and  UDP  protocols  in  a  Unix
environment,  the MCP protocol makes   some additional demands on
networking software:


 MCP  must  support  both  point-to-point  (or   two-party)   and
multipoint (or  N-party) flows.  In the case of multipoint flows,
a flow is identified by a group address and the protocol software
must  be  careful  in  multiplexing and demultiplexing of packets
addressed to the group address.  For instance, if three transport
level  entities at a host may participate in the multipoint flow,
each endpoint must be separately identified  and  a  copy  of  an
incoming  packet   must  be  delivered to each local participant.
MCP must also  clearly  define  the  loopback  semantics  for  an
outgoing packet over such a  flow. For instance, the loopback may
mean that that all the local participants  other than the  sender
of the packet receive a copy of each   outgoing packet.

 MCP application-layer interface must make  provisions  to  allow
specification  of  QOS  parameters  such  as  data  rates,  error
tolerance,  and delay bounds for individual flows and to  specify
control  operations such as token request/transfer and joining or
leaving an  ongoing conversation.

Given the  QOS specifications, MCP must exercise rate-based  flow
control   according   to   the  QOS  specification  and   include
appropriate timer and buffer management  functions  to  implement
$Delta$-causality.


 MCP must also make provisions for reporting asynchronous  events
and  exceptional  conditions  to  user applications. For example,
arrival of a token or token transfer request can occur any  time.
Because  MCP  assumes  that  a  user  application  implements the
desired policy for  token  management,  MCP  must  asynchronously
report such events to the application.


 The size of  data units transmitted   varies  depending  on  the
media  used.  For  example,  a  voice sample may consist of a few
bytes whereas a video frame typically  consists  of  hundreds  of
bytes.  To  ensure  proper  temporal  synchronization,  MCP  must
preserve message boundaries  over  individual  flows  and  insert
synchronization  markers to ensure appropriate playout of related
media units at a receiver.

 Because the transmission and delivery of data over a  multipoint
flow  is  controlled  by  transfer of tokens and relevant control
information, processing of normal data is complicated if one uses
``in-band  signaling'' as in traditional transport protocols.  To
facilitate  real-time  protocol  processing   and   to   simplify
processing of user data, MCP must establish  an auxiliary control
connection for communicating out-of-band control messages related
to conversation and token management.


Unix Networking Software unix

Networking in the BSD  kernel  is  divided  into  three  software
layers (Figure softlayers): The socket layer, the protocol layer,
and the network  layer.  A   socket   acts  as   an  endpoint  of
communication  and  is  accessible  to  processes as a  Unix file
descriptor.  The socket layer is usually invoked by the C library
using  a  set  of  system  calls.  The protocol layer consists of
protocol suits or domains such as TCP/IP, Xerox NS,  and   Decnet
domains. This layer handles all protocol specific processing. The
network interface layer manages  the  physical  network  such  as
ethernet  or  token ring. Figure~softlayers shows the MCP's place
vis-a-vis other protocol modules and additional details  on  Unix
network software design can be found in [4.3-bsd-book].


In addition to the standard Unix software, two  extensions   that
support   network  layer  connections   (ST-II  and  COIP-K)  are
available in  public  domain[stii-partridge,coip-k].  Because  we
want to provide delay-constrained delivery of multimedia data, we
needed a network layer service that would  reserve  resources  to
guarantee  performance.   From that point of view, both ST-II and
COIP provide necessary service. We decided to use  COIP  protocol
because  COIP  protocol  implementation  consists  of   a toolkit
called COIP-K that  provides  a  nice  modular  structure  and  a
flexible platform to implement connection oriented protocols.


Socket Library and Programming Interface

MCP retains  the standard 4.3 BSD  socket interface  as  much  as
possible.   We  have  extended  the  socket  library  to  provide
additional  functionality   needed  by   MCP   applications.   In
particular,   we   have    extended    connect   call   to  allow
specification of multiple destination addresses for a  multipoint
connection.   We have added new options to the setsockopt call so
that an  application  can  specify  a  flow's  QOS  requirements.
setsockopt  call  is also used to request a token,  to explicitly
transfer a token to  another  participant,  or  to  join/leave  a
conversation.

We have also extended  getsockopt to  implement  an  asynchronous
event    notification    mechanism    as   described   later   in
Section~asynch-sect.


MCP retains the  conventional  TCP/IP  programming  interface  in
Unix.  Thus,  applications involving multipoint communication are
also written  using  client-server  paradigm.  In  a   multipoint
application,  a single participant acts as a client and all other
participants act as servers (or passive entities). An example  of
such  a   program  is shown in Figure~api-fig. Each server uses a
combination of  socket,
 bind,  listen, and  accept calls to indicate its willingness  to
join a multipoint flow.

After one or more flows are established, participants may add the
flows to a conversation.  Once a conversation is established, the
data transfer begins  using  the  usual  Unix   send/recv  calls.
Before  sending  data,  A  participant  must typically  request a
token using the  setsockopt call and a token holder may  transfer
the  token   using  another   setsockopt  call (see Figure~token-
xfer).


                 CLIENT                                    SERVER
s    =    socket(PF_COIP,    SOCK_MCP,   0)                 s   =
socket(PF_COIP,   SOCK_MCP,   0)   s   =    setsockopt(s,    MCP,
MCP_SETFLSPEC,                   .....
       &flowspec,  sizeof(flowspec)   )           bind(s,   addr,
addrlen);
                                               listen(s,5);
connect(s,  addr,  addrlen);                     snew = accept(s,
addr, addrlen);
             /* Establish a conversatioon */  setsockopt(s,  MCP,
MCP_INITCONP, sockarray, sizeof(sockarray));

caption typical multipoint MCP application consists of a   single
client  and  one  or  more servers. Servers are passive listeners
waiting  for  the  client  to  establish  a  connection.  Once  a
connection  is established, the token holder gets to transmit and
a server (or the client) must first obtain a token before sending
any data.


MCP Transport and Session Layer Implementation

MCP transport and session layers reside below the  socket  layer.
The   following   discussion   on  transport  and  session  layer
implementation assumes  that  readers  are  familiar  with  basic
structure  of  Unix  network  software including the two commonly
used data structures, Protocol Control Block (or PCB)  and   mbuf
(Memory  Buffer).  PCB  is  used  to hold protocol-specific state
information and  mbuf is another useful data structure  used   to
shepherd  an  incoming or outgoing packet through protocol layers
without repeated copying. More details can be found in  [4.3-bsd-
book,coip-k].

Internal  structure   of   MCP   implementation   is   shown   in
Figure~interact.  For  simplicity,  only  selected  functions are
shown.  The function  mcp_userreq handles all user requests  such
as  socket  creation, bind, connect, and read/write and passes on
these  requests  to  lower   layer   functions.   For   instance,
cin_output  at network layer transmits data to other endpoints of
a  flow.  Other  functions   implement   token   management   and
conversation abstraction as described below.


On input side, the routine  cinintr handles  packet  arrivals  at
the network interface and passes data to the transport layer. The
routine   mcp_ctxext  enforces  the  causality  requirement.  The
routine   mcp_ctlinput  handles  data transfer across the control
connection.


figure[tbp]

centerlinepsfigfigure=coiplay.ps captionInternal structure of MCP
implementation interact figure

subsubsectionTransport Layer Implementation

MCP transport layer is responsible for  establishment  of  point-
to-point  and  multipoint  data  and  control  flows  and for the
transfer of  data  to  and  from  the  MCP  session  layer.   MCP
transport layer identifies each endpoint of a flow using a unique
transport level address consisting of $<ip_addr, port_no>$.

When a client issues the  socket  call  followed  by  a   connect
call, MCP transport layer traces the following steps to establish
a multipoint flowfootnoteEstablishment  of  a  point-to-point  or
two-party   flow   is   similar   except   packet   delivery   is
straightforward as in TCP or UDP. Separate discussion  of  point-
to-point flows is omitted here.:

figure[tbp]                                      minipage[t]3.2in
psfigfigure=token.ps,height=2.8in,width=2.8in minipagehspace0.2in
minipage[t]2.9in     psfigfigure=mpcb.ps,height=2.8in,width=2.4in
minipage  captionExample  of  token   request  and transfer using
setsockopt call and unix signal is shown above. MCP PCB (Protocol
Control Block) is shown on right hand side.  token-xfer figure


 The transport layer creates a MCP PCB (see Figure~token-xfer) to
maintain  the  state information about the flow and fills in flow
specifications, list of flow participants, and initial status  of
token.


 MCP transport layer assigns the flow a unique  flow  identifier.
A  unique  flow  identifier  is  constructed by appending a local
sequence number to the IP address of the client's  site.

 The transport layer constructs an MCP open packet, fills in flow
identifier  and  flow  specifications,  and  invokes  the  COIP-K
network layer to establish  a multipoint flow  passing  the  open
packet as an argument.


Under the first approach, COIP-K  builds a tree

with client (active entity) as a root  that  connects  the  group
participants using point-to-point connections. Every  participant
forwards a message addressed to the group to the  entity  at  the
root  and the network-level entity at the root delivers a copy of
the message to the local participants and also forwards copies of
the  message  to  other  leaves  across a separate point-to-point
connection.


Under  the  second  approach,  MCP  uses  IP   multicast[deering-
multicast  to  send  and  receive packets over the flow. Thus, in
this approach, there is no  master-slave  relationship.  However,
additional  network  layer  functionality[rsvp]  is  necessary to
provide  performance  guarantees.   Given  a  destination   group
address,  the  IP  multicast  routing  module  (added to  COIP-k)
ensures that a packet reaches all the members of  the   multicast
group.  Current  implementation does not include error control to
ensure reliable multicast  delivery  and  is  the  focus  of  our
current research.

 Given a conversation, MCP also establishes  a separate   control
flow among the transport-layer peers at participating sites.  The
control  flow  is  devoted  to  exchange  of  state  and  control
information   related  to  the  conversation  and  typically  has
different  packet  delivery  semantics.   Communication  over   a
control  flow  is  typically only two-party and requires reliable
delivery.  When a conversation is created,  MCP  transport  layer
uses  the  COIP-K  layer to establish a control flow and the open
packet sent  to  participating  sites  carries  a  list  of  flow
identifiers for the constituent flows of the conversation.


MCP Session Layer Implementation

MCP session layer implements  two  important  functions,  namely,
token  management  across  a  multipoint flow and temporal/causal
synchronization among messages sent across constituent flows in a
conversation.

Data transmission  across a flow at each  sender  consists  of  a
sequence  of  messages.  Data  sent  using each  write call by an
application constitutes a single message. The MCP  session  layer
preserves  the  message  boundaries  in  delivery of data at each
participant   and   also   uses   the   message   boundaries   as
synchronization  markers (or points) for both temporal and causal
synchronization.

subsubsectionTemporal Synchronization

Given a flow, MCP session layer at each sender assigns  a  unique
sequence  number  to  each  message  sent  across  the flow. When
multiple flows are  part  of  the  same  conversation,   sequence
numbers  for  messages sent by the same sender across constituent
flows  are drawn from a single sequence number space.  Thus,  MCP
session layer can deduce the message ordering among messages sent
by the same sender across constituent flows of a conversation.

When a sender $S_1$ sends two messages $M_1$ and  $M_2$  in  that
order   across   two  constituent  flows  $F_1$  and  $F_2$,  MCP
implements temporal synchronization by  delivering  them  in  the
same  order  at  all  the  participants. Because message sequence
numbers capture the necessary temporal order,  implementation  of
temporal   synchronization  is  straightforward.  However,  if  a
preceding message $M_1$ is   lost  or  arrives   out-of-sequence,
MCP  session  layer at the destination must buffer the subsequent
message $M_2$ and decide when to deliver it to  the  application.
Because  each  flow  QOS specification specifies the delay bounds
for individual  messages  and  the  packet  rate,  MCP  uses  the
information to derive a delivery deadline for each message. Thus,
$M_2$  is buffered only until its deadline and  is  delivered  to
the  application  (with an exception reported)  if $M_1$ fails to
arrive  by the deadline.

subsubsectionToken Management

For each participant of a flow, MCP allocates a separate MCP  PCB
that  maintains   the  state  information about the token for the
participant. MCP enforces token semantics in the following way:


 When a sender tries to send a message across a flow  using  MCP,
the  MCP  session  layer transmits the message only if the sender
PCB holds a token. Otherwise, MCP  returns  an  error  indicating
that  the  token is not present and it is left to the application
to explicitly request transfer of a token from the current  token
holder.


 When a token request is made, the MCP session  layer  broadcasts
the  request  across  the  multipoint  flow.  When the request is
received  by  MCP  layer  at  a  token  holder,  it   immediately
acknowledges  receipt  of  a  request  so that the sender may not
repeatedly broadcast its request. MCP layer at the  token  holder
then   notifies  its  application  using  an  asynchronous  event
notification   mechanism    that   identifies   the    requesting
participant.

 MCP does not implement any particular  token  management  policy
and  it  is left to the application to decide whether and when to
transfer the token to the requesting entity.


 Token replication  requests  are  handled  locally  by  updating
appropriate  fields in the MCP PCB for the participant.  MCP also
implements  other  token  management  calls  such  as  copy_token
delete_token and  distribute_token.


subsubsection$Delta$-Causal Synchronization

When multiple  senders  are  involved  in  a  conversation,   MCP
session  layer  implements  a  limited notion of causality called
$Delta$-causality.


Whenever a conversation participant $A$ sends  a  message  $M_1$,
MCP  session  layer  at $A$ includes some  context information in
the message header.  The context information included  with  each
message   is   a   pair   verb+<sender_id,   mesg_id>+  for  each
participant (or sender) in the conversation. That is, the context
includes  the sequence number of the last message received at $A$
from each participant before sending $M_1$ (see  Figure~mcp-buff-
mgmt).

figure[tbp]

captionAt the top, figure shows the context information  included
by A in its message number 18. Context information in the message
specifies that the message should be delivered ONLY after message
number  17  (from  A),  message  number  2 (from C),  and message
number 12 (from D) have been  delivered.   However,  the  context
information  at  B  indicates that B has received so far first 15
messages from A, first 2 messages from C and  first  10  messages
from  D.   Therefore, message 18 from A is queued up and added to
the dependency lists.

mcp-buff-mgmt figure

Thus, when $M_1$ arrives at  another  participant  $B$,  the  MCP
session layer does the following:


 MCP at B checks to see whether or not it  has  already  received
all  the messages specified in the context of $M_1$.  If yes, MCP
schedules $M_1$ for delivery.

 If not, MCP can build the dependency list of messages for $M_1$.
Figure~mcp-buff-mgmt shows an example.

 Pending reception of  previous  messages  in  the  context,  MCP
computes  the delivery deadline of $M_1$ using the $Delta$ value,
sets a timer to go off at the deadline, and buffers $M_1$.

$M_1$ is buffered in a message dependency list according  to  its
position with respect to pending messages.

 Whenever another message $M_2$ arrives  over  the  conversation,
MCP  checks  the head of the dependency list to see whether there
are any messages waiting for the arrival of $M_2$. If  so,  those
packets  are  dequeued  and are scheduled for delivery along with
$M_2$ according to their delivery deadlines.

If message $M_2$ itself contains some unresolved dependencies, it
is also queued up (see Figure ~mcp-advance-buff).

 When a timer goes off for a message $M$, MCP  dequeues  $M$  and
any  messages that were queued behind $M$ according to dependency
information and queues them up for delivery  to  the  application
layer.


figure[t] centerlinepsfigfigure=mcpadvbuff.ps captionFigure shows
MCP actions when another message (seq. no. 4) arrives from C. The
new message depends on the messages  18  and  12  from  A  and  D
respectively.  mcp-advance-buff figure

subsubsectionAsynchronous Event Notification asynch-sect

One of the interesting MCP  implementation  issues  concerned  an
user   interface  concern.  A  multimedia  application  involving
multiple data streams must simultaneously handle  several  events
including  arrival  of  traffic  over individual flows, mouse and
keyboard events, and asynchronous events  such as  token  arrival
or  a  token transfer request. Conventional approach to  building
such applications in Unix is to use the  select system call which
is clumsy at best.

It is our belief that system designers  must   allow  an  upcall-
based  user  interface  to  allow  notification  of asynchronous,
event-triggered actions and MCP assumes such an interface.  Under
such an interface, an MCP application can specify an action to be
taken when a particular  event  occurs.   For  example,  the  MCP
service    primitive   verb+flow_status(<status   change   type>,
<procname>)+ specifies the upcall procedure to be called when one
of the pre-defined status changes or events occurs on a flow. The
arguments to the  verb+procname+   include  flow  identifier,  an
integer  code  specifying  the kind of change, and optionally the
user data received.  However, Unix does not support  upcalls.  We
are   currently  investigating  alternate  ways  of  implementing
upcalls in Unix.  Currently, such a facility is implemented in  a
MCP  library routine. The library routine in conjunction with MCP
kernel  simulates  an  upcall  as   follows.   MCP   reports   an
asynchronous  event  by  sending  a signal ( SIG_URG) to the user
process and setting the  outofband flag in the socket  structure.
The  upcall_handler library routine first uses getsockopt call to
retrieve the cause of the event and  any  additional  information
from  the kernel, and then calls the corresponding upcall routine
that is previously registered by the application.

Experience with MCP experience The experimental set  up  for  MCP
prototype  consists  of  four SUN Sparc ELC workstations equipped
with audio devices on an  ethernet  in  our  Distributed  Systems
Laboratory.  The  laboratory  is  mainly  used  for  experimental
research and consists of a dozen HP  and  Sun  workstations  with
three Network file servers.

We have implemented many multi-party,  interactive   applications
to  test/debug  MCP  kernel, to evaluate the effectiveness of MCP
abstractions,  and   to   measure   performance   of   multimedia
applications  with  and  without using MCP.  In the following, we
describe our experience using MCP and Section~perf discusses  the
results of our performance evaluation.

Simple Tests

To help us test and debug various parts of MCP implementation, we
wrote two simple applications, namely,  mshell and  mtalk. Mshell
is an interactive shell that  provides  command-level  access  to
various  MCP  primitives  such  as  flow  creation,  conversation
creation,   joining/leaving    a    conversation,    and    token
request/transfer.  Thus,  Mshell  was  useful in testing each MCP
service primitive separately and  in  various  combinations  with
other primitives.

Mtalk (for Multi-user talk) is a multi-party conferencing program
that  allows  a conversation among multiple users[coip-k. At each
user, the program divides the screen into  two  parts:  an  input
region  and  display  region. Lines of text typed in input region
are transmitted over a MCP flow to all the other users  and  text
from  other  users  is  displayed  in  the  display region of the
screen.  In particular, we used mtalk to test our  implementation
of multi-point flows using IP multicast.

Nevot

NEVOT (NEtwork VOice Terminal) is a  network voice terminal  that
supports  multiple  concurrent  conferences  (both  two-party and
multi-party) on top of a variety of transport protocols and using
a  range  of  audio  encodings   from vocoder to multi-channel CD
quality. It is developed  as  an  experimental  tool  by  Henning
Schulzrinne  of Univ. of Massachusetts and   has been implemented
on top of TCP, UDP and ST-II.

We have ported Nevot to run on top of MCP  and used Nevot/MCP  in
two  ways.  First,  Nevot  allowed  us to gain some experience in
using MCP token management primitives in  implementing  different
conference control styles. Second, we evaluated the cost of using
MCP  by  comparing  the  performance   of    audio   conferencing
applications  implemented  on  top  of  Nevot/MCP, Nevot/UDP, and
Nevot/MCP.

Using Nevot, we  built  an  audio  conferencing  application  and
experimented with the following conference control styles:

description
 [ Floor Control:] This is a user driven conference control style
used  commonly in teleconferencing applications. Under this model
at most one speaker holds ``floor'' at any time and is allowed to
talk  at  any  time.  Other   users must request control of floor
before speaking. MCP supports such a  user-driven  control  using
tokens.  For  each  flow,  there  is  only  one  token  allocated
initially and  the  applications  can  transfer  token  from  one
speaker  to  another  based  on  a  policy  that is chosen by the
application. In addition,  MCP  allows  an  application  to  make
multiple  copies  of  a  token  and  distribute  copies among few
participants.   In  Nevot,  for  instance,  we  can  implement  a
hierarchical  conference control where a subgroup of participants
have access to the floor at any time and can  separately  control
arbitration of speaking rights among the members of the subgroup.


Our overall experience so far has been that token  management  in
MCP  works  very  well  in  implementing a range of floor control
policies.  However, users must be made aware of token  management
because a speaker must typically be told to wait until a token is
requested and obtained from another speaker.

 [ Activity  Sensing:]  Floor-control  based  conference  control
using  explicit  transfer  of  tokens  is  not  suitable for many
collaborative multimedia applications involving  shared  windows,
image  browsers,  and group text editors. In such an environment,
strict control is desired to ensure that  at  most  one  user  is
allowed  to  manipulate   parts  of  a shared workspace. However,
transparent  transfer  of  transmission  rights   is    desirable
whenever  a  user  finishes  her update (idle for some time),  or
current sender of data  moves mouse (focus) away  from  a  shared
window.

An example of transparent floor control is a  distributed,  voice
activated,  collision  sensing algorithm proposed in[jj-86paper].
Under an activity  sensing  algorithm,  the  underlying  software
transfers  token  to  another  requestor when it notices that the
token holder has been idle  for  some  time.   However,  such  an
algorithm  requires  choosing  a  correct time interval to decide
whether or not a  speaker  is  in  idle  (quiescent)  state.  Our
experience   showed   that   an  idle  interval  of  200  to  400
milliseconds gives good results in terms of detecting quiescence.

The algorithm works well only in a local area environment and may
not  work  so  well when propagation delays are large over a wide
area network (WAN). We are now trying out several  variations  on
activity sensing algorithms.

 [ Brain Storming:]  An extreme alternative to strict floor-based
control  is  free for all in which no explicit conference control
is exercised and all participants are free to  transmit  whenever
they  wish.  We have found this to be a useful and efficient form
of conferncing especially when a conference enters  a  discussion
phase where there is no obvious speaker at any time.

description


Tests involving $Delta$-causality

To test $Delta$-causality,  we  wrote  another  application  that
created  a MCP conversation consisting of three multipoint flows:
a background flow, $F_file$ that repeatedly transferred  a  large
file,  a  background  audio  flow $F_au$ that continuously played
back file  containing  pre-recorded  music,  and  an  interactive
mtalk   flow   in   the   foreground  involving  at  least  three
participants. The tests involved simulating variable  delays  and
random packet loss inside the kernel. In the experiments, about 1
packets were chosen at random for  drop  and,  in  addition,  the
end-to-end  delay  was  varied  between  20  milliseconds  to 500
milliseconds.  The  parameter  $Delta$  was  chosen  to  be   200
milliseconds  which  happens to be an acceptable delay for packet
voice before delays cause  noticeable breaks in audio reception.

The tests served two  purposes.  First,  they  helped  us  verify
correctness   of   our  implementation  of  causal  and  temporal
synchronization. For instance,  MCP kernel estimates the  maximum
buffering necessary to buffer packets delayed until their $Delta$
deadlines and also computes the rate of packet delivery based  on
QOS  specifications.   Tests  allowed  us  to  verify that buffer
estimates were sufficient and packet rates could be maintained in
a  non real-time operating system such as Unix. Second, we wanted
to see how does enforcement of $Delta$-causality affect the audio
playback  at  participants  in  the  presence  of background file
transfer and bursty interactive traffic.  Our experience is  that
MCP  can  easily  maintain the deadlines associated with $Delta$-
causality in the presence of occasional packet loss. Packets  are
lost  either  due  to  random  drop  (network  errors) or because
packets that arrive beyond the delivery deadline  are  discarded.
Assuming resource reservation at the network layer, we expect the
packet loss and delays in future high-speed  Internet  to  be  of
similar  nature.   Tests  also  showed  that  loss  and delays in
delivery of  audio  packets  did  not  cause  easily  perceptible
degradation  in  playback  quality.  However, more systematic and
quantitative measures are necessary to evaluate  impact  of  such
losses.


Performance Evaluation perf

Goals of our performance  evaluation  were  two-fold.  First,  we
wanted  to  determine  the  impact  of  using MCP on a multimedia
application.  Specifically, we were interested in  comparing  the
performance  of  an application implemented using MCP against the
performance of  the  same  application  implemented  using  other
transport  protocols  such  as  UDP and TCP. Second, we wanted to
compare the performance  of  MCP  kernel  implementation  against
implementation of  MCP as a run-time library.

In the following, we describe  the  results  of  our  performance
evaluation.    Whenever    end-to-end   measured  values  for  an
operation were too small for clock resolution,  we  repeated  the
operation several times and computed the average value.


                TCP    & UDP    & COIP-RAW & MCP-kernel
Loopback (ms)  1.68    & 1.28   &  1.26    &  .96
Ether (ms)     1.80    & 1.47   &  1.36    &  1.07

caption  The  columns  show the  cost of transferring a token from
one participant to another
located across the ethernet or on the  same  machine  (loopback).
The  values  represent  an  average  computed  across  100  token
transfers and experiment was repeated  several  times  to  verify
that the results are consistent. nevot-token-transfer table

TCP & UDP  &  COIP-RAW  &  MCP
1.9   & 1.9   &  1.9 &  46

The  columns
show   the   cost   of  replicating   a  token  in  microseconds.
Replicating a  token only involves updating a field  in  MCP  PCB
and  no  communication.  Cost of token replication is higher with
MCP kernel implementation because it involves a  system  call  as
compared  to simply updating a variable in MCP library in case of
first three implementations.

nevot-token-replicate table


         TCP  & TCP (NO_DELAY) &  UDP     &  COIP-RAW & MCP-kernel
usertime 45.3 & 45             & 41.8     &  35       &  35/45
system   29.6 & 41               & 33.6   & 34.3      & 35/43

caption  Statistics at the Nevot sender side  for  20000  packets
sent  at  the rate of 50 packets per second. Each packet contains
an audio sample worth 164 bytes. The  second  (larger)  time  for
nevot  on  the  MCP kernel take into account the cost of activity
sensing.

nevot-sender table

        TCP    & TCP(NO_DELAY)   &  UDP     &  COIP-RAW  & MCP-kernel
usertime 38.2  & 38.5            &  37.9    & 38.7       & 38.8
system   29.2  & 40.8            & 29.9     & 33         & 33.35
latepkt  21-11 & 3               &  3       &  5         &  5

caption   Statistics  at the Nevot receiver side. At the
receiver side, row labeled  late  packets  shows  the  number  of
packets that arrive too late for playout.

nevot-receiver table

Comparison with Other Protocols

We have implemented MCP services as part of a run-time library on
top  of  three different protocols: TCP, UDP, and COIP-RAW. COIP-
RAW provides direct access to COIP network interface through Unix
sockets.   We  ported  Nevot  to  run  on each of the MCP library
implementations and on top of MCP kernel.

Tables~nevot-token-transfer  through  nevot-receiver   show   the
results   involving   token   operations   and   use   of  Nevot.
Tables~nevot-sender and nevot-receiver show results  of  a  Nevot
experiment  involving a sender and a receiver. Sender transmitted
contents of an audio file  (audio  playback)  in  a  loop.  Nevot
prints  statistics  collected  using  Unix   getrusage call after
every 20000 packets and statistics  include  application's   user
and  system  times.  Packets  arriving  late  at the receiver are
discarded.  MCP kernel performance is comparable  to  that  using
other  protocols.  On the receiving side, Nevot/TCP loses packets
due to late arrivals even across an ethernet because  TCP  queues
up  data  at  sending  side to send in chunks. Using  TCP_NODELAY
option  prevents TCP from queuing up data and  that  reduces  the
number  of  late  packet  arrivals, but  the total times are then
much worse as shown in Tables.


Kernel vs. Library Implementation


Type of Connection  &  Implementation level  

                    & MCP Library & Kernel
Point to Point      &  35.6       & 6.7
Multipoint          &  57.6       &   4.5

caption   The  Conversation  create time (in  milliseconds) shows
the amount of time needed to establish a single-flow conversation
involving   two   participants  in  each  case.   Establishing  a
conversation consists of creating a control flow and updating MCP
state   to   add   the  flow  to  a  conversation.  Typically,  a
conversation consists of more than one  flow,  but  the  cost  of
creating a multi-flow conversation is similar to that of creating
a single-flow conversation.

conv-create table


Type  of  Connection   &    Implementation  Method

              &  MCP Library & MCP Library & Kernel & (COIP-RAW)  & (TCP)
Point to Point&  257         &  400        & 78
Multipoint    &  441  & --   &   hline

caption
Data Transfer time for 100  packets each containing one  byte  of
information.  All the timings are in milliseconds. giri-data-xfer
table


To evaluate  the impact of implementing MCP in a  kernel, we also
implemented  MCP  kernel as a run-time library on top of COIP-RAW
interface.  A  multimedia  application  using  MCP   library   is
implemented  as  a multi-process application  where one thread is
responsible for handling   data  transfer  across  a  flow  in  a
conversation. MCP state common among threads is maintained  using
shared memory.

Thus, creating a conversation requires starting a new process  to
manage   the   control   flow   and   communication  with  remote
participants to establish a  conversation.  Implementing  library
routines  through  multiple  threads of an application simplifies
programming and makes it easy to hide  the  details  of  handling
data  transfer  and  asynchronous  events  over individual flows.
However,  as Tables~conv-create and giri-data-xfer shows, cost of
operations  such  as creating a conversation and data transfer is
significantly higher using the library as compared  to the kernel
implementation.

Related Work
related

The work described in this paper is related to  current  research
in   two   main  areas:   user  interface  for  coordination  and
conference control and OS and protocol support for enforcement of
QOS and temporal/causal synchronization.

In  the  area   of    conference   control   and   collaboration,
CECED[ceced]  system  is  a  notable example of a software system
that supports collaboration involving multiple users with minimal
intrusion of software and user styles. For coordination and floor
control, CECED uses an activity sensing  algorithm  as  described
earlier.  We  have  used  the  algorithm  as  a basis for further
experimentation with alternate activity sensing policies.

Schulzerrinne[henning-ietf-draft]   discusses   several    issues
involved  in  designing  a  real-time transport protocol (RTP) to
support audio conferencing across the Internet. RTP's main  focus
on   achieving   playout   synchronization.  MCP  design[mcp-dcs]
precedes RTP design and differs from RTP  because  MCP  considers
causal    synchronization    in    addition   to   the   temporal
synchronization.

Recent work at  Tenet  group  in  Berkeley[tenet-report-mim]  has
identified   some   fundamental   requirements   of    multimedia
collaborative applications  including conference management,  QOS
mechanisms,  support for $N X M$ (N senders and M receivers) data
flows, and so on. MCP provides functionality to meet  several  of
those  requirements  and  although  MCP  currently  supports only
symmetric $N X N$ flows, it can easily be modified to support  $N
X  M$ flows.


MMCC,   Multimedia  Conference  Control  system,   developed   at
USC/ISI[mmcc-joi]  concentrates  only  on providing session level
conference management functions such as conference establishment,
inviting  parties to join a conversation, merging conferences and
starting  a  ``side-conference''  through  an  application  level
connection  manager.   MCP  includes most of such support and, in
addition, provides communication mechanisms  to  enforce  several
varieties  of  conference  control and to achieve synchronization
among several flows.


A system developed by Angebranndt and others[louds]  shares  some
of the goals of MCP. Their system contains an audio protocol that
supports  audio connections and virtual devices.  Synchronization
in  that system refers to synchronizing beginning of playout   at
several audio devices and achieving  lip-syncing.  MCP  does  not
contain  mechanisms  to  synchronize  playback  starts at several
sites,  but  provides  a  general  purpose  temporal  and  causal
synchronization  facility  irrespective  of whether media sources
are of fixed rate or totally asynchronous variety.  Another  nice
aspect   of   Angebranndt's  system  is  that  their  programming
interface supports  a  set of command queues  (or  scripts)  that
allow specification of synchronization among  audio devices.  The
system also reports several types of   events   including  device
operation  completions, arrival of synchronization points, and so
on.   MCP,  on  the  other  hand,  only  considers  a  subset  of
communication   events  related to  token passing  and changes in
membership of a flow or a conversation.

Summary concl

We have described design and implementation  of MCP  session  and
transport  layer  protocols that provide communication mechanisms
and  abstractions  useful  for  building  distributed  multimedia
applications.  We have implemented MCP in SUN OS 4.1.1 kernel and
evaluated its performance and suitability of its  mechanisms  for
building multimedia applications.

Our work is significant in following  respects.   First,  to  the
best  of  our  knowledge,  this  is  one  of the first Unix-based
implementation of transport and session level  services  designed
explicitly   to   meet   the   coordination  and  temporal/causal
synchronization needs of multimedia applications. Second, we have
shown  that  coordination  and  synchronization mechanisms can be
efficiently implemented in a general purpose  OS  such  as  Unix.
Third,  we  have  identified  alternate conference control styles
that should be supported to meet a  wide  range  of  applications
including   audio   conferencing,   group   editors,  and  shared
workspace.  We plan  to  continue  further  research  in  various
conference  control  styles and user interfaces for collaboration
using multiple media.

During our work, we have also discovered some of the  limitations
of  the  process  scheduling and event notification mechanisms in
Unix-like operating systems. For instance, Unix lacks  an  upcall
mechanism  and  it is not easy to promptly notify a process of an
asynchronous event such as arrival of a token or a token request.
We  have  identified  some possible extensions to Unix scheduling
mechanism to overcome such a problem and plan to  pursue  further
research in that direction.

Acknowledgments

Authors would like to thank Charles Kranor and Guru  Parulkar  of
Washington  University  in  St.~Louis  for  making COIP-K sources
available to us and for cheerfully putting up with our  questions
about   COIP-k   software   during  the  initial  stages  of  MCP
implementation.


[j-86paper] L.Aguilar,  J.J  Garcia-Luna-Aceves,  D.Morgan,  E.J.
Craighill, and
  R.Brungardt.       Architecture   for   a    multimedia
teleconfrencing  systsm.   In  Proceedings of ACM SIGCOMM
'86 Symposium, 1986.

[clouds] Susan Angebranndt, RichardL.  hyde,  DaphneHuetu  Luong,
and  Nagendra Siravara.   Integrating audio and telephony
in a distributed workstion
  environment.   In  USENIX '91  Conference  Proceedings,
pages 419--435. USENIX,
  June 1991.

[ceced] E.Craighill, R.Lang, K.Skinner, M.Fong, and J.J.  Garcia-
Luna-Aceves.     CECED: A Ssystem for Informal Multimedia
Collaboration.   unpublished report,  SRI  International,
CA, January 1993.

[coip-k]  C.Cranor.     An   Implementation   Model   for
Connection-Oriented Internet
  Protocols.   Technical report, Department  of  Computer
Science, Washington
  University, May 1992.   MS Thesis.

[deering-multicast] S.E. Deering.   Multicast Routing  in
Internetworks and Extended LANs.   In  Proceedings of ACM
SIGCOMM '88 Symposium, August 1988.

[suite] Prasun Dewan.   A Guide to  Suite:  Version  1.0.
  Technical  Report  SERC-TR-60-P,  Software  Engineering
Research Center,
  Purdue University, February 1990.

[ellis-gibbs-89]  C.Ellis,  S.Gibbs,  and  G.L.  Rein.   
Design  and  Use  of  a  Group  Editor.    In  IFIP WG2.7
Working Conference on Enginerring for Human
  Computer Interaction. North Holland, August 1989.

[ferrari-verma] D.Ferrari and D.Verma.     A  scheme  for
real-time  channel establishment in wide-area networks.  
IEEE Journal on Selected Areas in Communications, 8(3), April
  1990.

[lamport-timeout] Leslie Lamport.   Using Time Instead of
Timeouts.      ACM  Transactions on Programming Languages
and Systems,
  6(2):254--280, April 1984.

[lantz-cscw86]  KeithA.  Lantz.      An   Experiment   in
Integrated  Multimedia Conferencing.   In  Proceedings of
Conference on Computer-Supported Cooperative
  Work, pages 267--275, December 1986.

[4.3-bsd-book] SamuelJ Leffler, MarshallKirk  McKusick,  MichaelJ
Karels, and JohnS
  Quatarman.    The Design and Implementation of the  4.3
BSD UNIX Operating
  System.   Addsion Wesly, 1989.

[rfc1128] DaveL. Mills.    Measured  performance  of  the
network time protocol in the internet
  system.   Network Working Group Request  for  Comments:
1128, October 1989.

[stii-partridge] Craig Partridge and Stephen Pink.     An
Implementation of the Revised Internet Stream Protocol
  (st-2).     Journal  of  Internetworking  Research  and
Experience, March 1992.

[henning-ietf-draft] H.Schulzrinne.   Issues in designing
a transport protocol for audio and video
  conferences and other multiparticipant real-time  applications.
   INTERNET-DRAFT,  IETF  Audio-Video  Transport  Working
Group, December
  1992.

[nevot] H.Schulzrinne.   Voice Communication  Across  the
Internet: A Network Voice
  Terminal.   Technical report,  Department  of  Computer
Science, University of
  Massachusetts, Amherst, MA, August 1992.    unpublished
report.

[mmcc-joi]  EveM.  Schooler.     Case  study:  Multimedia
conference control in a packet-switched
  teleconferncing system.    Journal of  Internetworking:
Research and Experience (1993),
  1993.

[colab-from-xerox] Mark  Stefik,  Gregg  Foster,  Daniel  Bobrow,
Kenneth Kahn, Stuan Lanning, and Lucy
  Suchmann.   Beyond the Chalkboard: Computer Support for
Collaboration and
  Problem Solving in Meetings.    Communications  of  the
ACM, 30(1):32--47, January 1987.

[tenet-report-mim]  Clemens   Szyperski   and   Giorgio   Ventre.
 A characterization of multi-party interactive multimedia
  applications.     Technical  report,  The  Tenet  Group
International Computer Science
  Institute and Computer Science Division UC  Berkeley,  February
1993.

[swinehart-system-support] DanielC. Swinehart.     System
Support  Requirements  for Multi-media Workstations.   In
Proccedings of the SpeechTech '88 Conference, pages 82--83,
  New York, April 1988. Media Dimensions, Inc.

[rajthesis] R.S. Yavatkar.    An Architecture  for  High-
Speed Packet Switched
  Networks.   PhD thesis, Purdue University, August 1989.
 Also available as TR-898.

[mcp-dcs]  Raj  Yavatkar.       MCP:   A   Protocol   for
Coordination and Temporal
  Synchronization  in  Multi-media  Collaborative   Applications.
 In  Proceedings of the 12th International Conference on
  Distributed Computing Systems. IEEE, June 1992.

[rgp-paper] R.Yavatkar and V.Dumasia.    Reliagram  --  A
Communication Abstraction for Distributed
  Processing.   In  Proceedings of the IEEE Symposium  on
Parallel and Distributed
  Processing, December 1991.

[rsvp] L.Zhang, S.Deering, D.Estrin, S.Schenker,  and  D.Zappala.
  RSVP:  A  New  Resource ReSerVation Protocol.  
submitted to ACM SIGCOMM '93, March 1993.


document