
Experience Implementing an IP Address Closure

Ning Wu and Alva Couch - Computer Science Department, Tufts University

Pp. 119-130 of the Proceedings of LISA '06: 20th Large Installation System Administration Conference
(Washington, DC: USENIX Association, December 3-8, 2006).

Abstract

Most autonomic systems require large amounts of human labor and configuration before they become autonomous. We study the management problem for autonomic systems, and consider the actions needed before a system becomes self-managing, as well as the tasks a system administrator must still perform to keep so-called ``self-managing systems'' operating properly. To understand the problem, we implemented a prototype self-managing ``IP address closure'' that implements integrated DNS and DHCP. We conclude that the system administrator is far from obsolete, but that the administrator of the future will have a different skill set from that of the present, focused on effective interaction with closures rather than management of individual machines.

Introduction

Imagine that you are asked to set up a new DHCP/DNS infrastructure. You proceed to collect a pocketful of ``Ethernet keys'' that look like USB keys, but each contains a micro-controller and an Ethernet interface, where power is drawn from the Ethernet plug. You plug one of these keys into a test network and give it a specification of your network architecture in the form of an operating policy. Then you plug the other keys into the same network, and each copies the policy from the first key. Finally, you unplug some of the keys, plug one or more keys into each Ethernet subnet, and voila, you have a self-managing IP address infrastructure that is self-healing, and in which telling any key about a policy change causes that change to propagate to the whole infrastructure of keys. If a key dies, you replace it with another key that has - for a while - been plugged into any subnet containing a working key. No backups are necessary; the infrastructure is completely self-managing and self-healing.

Are we dreaming? Not really, as this paper will show. It is possible to implement such devices and infrastructure. But there is a larger question that we have not addressed: what happens when something goes wrong? The subtlety in managing such an infrastructure lies in the interface between the keys and the human administrator. When things go wrong, a human is still required to intervene and repair problems.

Closures

Our implementation of the above-mentioned behavior is based upon the theory of closures. A closure [3] is a self-managing component that protects one part of an IT infrastructure while making its needs known to other closures [20]. The closure model of a managed system expresses the system as a composition of communicating components. Previous experiments on closures [20] demonstrated that any high-level closure needs support from other, lower-level closures. For example, consider a web service. The service itself can be encapsulated within a closure, but that closure does not handle IP address assignment and DNS [15, 16]. These functions must be handled via one or more lower-level closures in a self-managing infrastructure.

In this paper, we describe experience and lessons learned in building and testing an ``IP address closure.'' This closure is a self-managing ``fabric'' of distributed nodes that handles DNS and DHCP for an infrastructure. The IP address closure handles address assignment based upon three inputs: requests for addresses, a policy on address assignment, and architectural information about routing and gateways within the network. The IP address closure sits between the web service closure and a routing closure (which may be implemented by a human being), accepting inputs from both (Figure 1).



Figure 1: Interaction between closures.

A New Management Style

Managing the IP address closure is very different from managing the web service closure. The web service closure is managed via ``commands'' that change the state of a single server or web farm. By contrast, the IP address closure fabric is composed of small, movable ``black boxes'' that can serve as DHCP and/or DNS servers. These are configured by a process of seeding: each box is initialized by physically plugging it into the same subnet as an already-seeded box. The new box discovers the existing seeded box, clones its configuration, and learns the network topology, policy, and locations of peers from the existing box. After this, it is moved to its final physical location, where it can in turn seed other boxes.

Simple Hardware Components

An element of the IP address closure is an extremely simple device with an Ethernet connection and some form of persistent storage. It is conceivable that a closure node could be implemented in hardware using only non-moving parts such as flash and regular memory (no hard disk would be required), thus leading to extremely low hardware cost for the self-managing fabric. It is even possible to power a node from the Ethernet connection, so that it can be a completely self-contained device similar to a USB key (an ``Ethernet key''). We foresee a time in which IP management could literally be accomplished by a pocket full of keyring-sized devices, carried from room to room as needed. A similar approach, using the same environmental discovery and arbitration algorithms, could be used to create closures for other tasks such as distributed monitoring, intrusion detection, troubleshooting, web caching, file system caching, and secure remote access.

Backup and Recovery

There is a dramatic difference in how one provides failover and recovery in the IP address closure fabric, compared with managing current DNS and DHCP servers. To establish redundancy on a subnet, one simply plugs another box into the subnet, and the new box makes itself a clone of the boxes it discovers, becoming a backup server. If one unplugs a box, any backup servers automatically start serving requests. If a box fails, one simply unplugs it and plugs in another. The boxes serve as their own backups; any box is interchangeable with any other in case of failure. Each box discovers where it is operating, and how many neighbors it has, before deciding whether to provide services or act as a backup. Thus backup is as easy as keeping a spare box that one plugs into a subnet periodically to keep it up to date, and recovery is a matter of plugging the backup node back into the network so that its changes can be propagated to other nodes.

Low-level Last

It may seem to the reader that we have gone about the problem of building closures ``backwards'': previous authors studied ``high-level'' closures that - to operate properly - require low-level closures like this one, which we tend to implement after the closures that utilize them. The reason for this backward implementation order is that many of the challenges in building a closure arise at the lowest levels, where the interface between the system administrator and the closure is most complex. At the lowest level, closures are limited by the fact that software cannot accomplish physical changes in hardware or network configuration. When configuring a web server [20], this is not much of a concern, but at the IP level, it is a central issue.

Related Systems

In large-scale systems, manual procedures for maintaining static and dynamic IP address assignment are both tedious and error-prone. IP management tools have been developed to help administrators manage the IP space in an enterprise; Cisco Network Registrar [2], INS IPControl [6], and Lucent VitalQIP [13] are examples of current products. Common features of IP management software include integrated DHCP and DNS service, centralized policy management, and failover mechanisms for high availability. These products require crafting of detailed IP assignment policies, as well as manual configuration of all nodes included in the service. Melcher and Mitchell [14] mention the need for an autonomic solution for DHCP, DNS, LDAP, and other services. It is also highly desirable to minimize the amount of human input necessary to configure the system, avoiding the ``incidental complexity'' of making policy decisions that have no externally observable behavioral consequences [3].

Goals

Our goals in creating the IP address closure were to help administrators by:
  • Encapsulating a reusable design of the IP assignment plan in a policy.
  • Reducing incidental complexity by automating unimportant decision-making.
  • Automating the process of implementing changes in policy.
  • Providing autonomic features such as self-configuration, self-backup, and self-healing.
  • Simplifying day-to-day management of the IP address (DHCP/DNS) infrastructure.

The IP address closure can be seen as an effort to implement autonomic features [4, 5, 10] in the IP layer.

Paper Organization

In this paper, we will use the IP address closure as an example of the potential impact of autonomic systems upon system administrators, and show that system administrators can benefit from it and similar systems. Far from threatening the jobs of system administrators, the IP address closure is instead a ``partner'' that requires ongoing management, in return for offloading some common management tasks.

This paper is organized as follows. We begin by describing the overall design and function of the IP address closure. We then discuss the design and implementation details for the IP address closure and critique our prototype. We subsequently discuss the relationship between autonomic systems and system administration and then discuss the issue of exception handling. Finally, we conclude this paper and discuss future work.

Closure Design

The design of our IP address closure is so unlike that of any prior work that a detailed discussion of its theory of operation is necessary. In this section, we present that theory in enough detail to convince the reader that the closure will work as described. The theory is somewhat subtle, and readers uninterested in implementation details can skip this section without loss of continuity.

Peer-Peer Architecture

Unlike prior closures, which resided primarily on one machine, the IP address closure resides within a peer-peer ``fabric'' of distributed ``black boxes'' that manage the state of the IP layer for an enterprise. These ``Peered IP'' management nodes, or ``PIPs,'' manage themselves based upon a high-level policy and environmental factors that the PIPs discover through direct probing of their environments. PIPs can be implemented as small and cheap ``Ethernet appliances'' that support each other and implement both self-healing and self-replication features.

A peer-to-peer solution is more robust and easier to use; there is no need to manage a centralized database. The distributed nodes have a better view of the environment than a central probe; they can see through firewalls and other protections, and can acquire environmental information [12] that is more accurate than relying upon human input. If we tell one peer about a new policy, it distributes the policy to all of its known peers, which continue relaying the policy until it is present on all nodes. However, control of information distribution is more difficult than in the centralized case. For example, at any particular point in time, there can be conflicts between the policy information in replicas. A policy change must be broadcast to all the PIPs.

It would have been nice to utilize an existing peer-peer scheme for implementing our closure. The drawback of existing peer-peer schemes is that their bootstrap protocols require the prior existence of a stable IP layer. Also, their complexity, and their goal of distributing large amounts of information, are far more ambitious than we need. We instead utilize a simple pull-only gossiping protocol to communicate relatively brief policy information among PIPs, after a (rather complex) bootstrap and environment discovery protocol that is necessary because no IP layer exists before the closure becomes functional.

In practice, using a complete peer-to-peer environment poses problems in network design. A site that maintains separate subnets may still allow its DHCP servers to talk to a central server, and vice versa; but if administrators choose a peer-to-peer architecture, in which any pair of servers may need to communicate, the complexity of the firewall rules increases. The deployment of web services [23] faces similar issues. If lower-level network closures exist, the task of configuring firewalls can be delegated to them; if not, administrators must configure the firewalls manually.

Bootstrapping the Closure

The most innovative feature of our IP address closure is how it bootstraps itself into a functional state on a previously unmanaged network. Unlike other closures and autonomic computing solutions, our closure must be able to cope with a network where the IP layer is not yet functional. This leads to a rather unique bootstrapping process, based upon policy seeding and environmental discovery. There are three types of hosts in an IP address closure: regular hosts, PIPs, and delta boxes (Figure 3). Regular hosts are clients of the DHCP service provided by PIPs. PIPs are the management nodes that provide DHCP and DNS services. A delta box is a special type of PIP that potentially contains different information from other PIPs on the same subnet; otherwise, it behaves like a generic PIP. A delta box can deliver information to another subnet by being physically connected to that subnet. This feature is very useful for distributing policies to networks that are physically segregated from the rest of the infrastructure, e.g., by firewalls.

The bootstrapping process for the IP address closure is different from that for a normal system. A PIP referred to as a ``delta PIP'' can be moved physically through the network to propagate information about changes. Bootstrapping of the closure is done by a combination of logical specifications and physical moves of devices. To bootstrap the closure, one starts with a policy file and loads it into a single delta PIP. Connecting this box to each segment of the network in turn lets it discover the other peers on that segment and communicate the policy to them. The delta box also records and distributes the addresses of other peers. At the end of the seeding process, every node knows about every peer, and the delta box is no longer needed. The bootstrapping process is depicted in Figure 2.



Figure 2: Bootstrapping a new subnet from a delta PIP.

Figure 3 shows an example of the deployment of PIPs. The IP network could contain many firewalls between routers, and the subnets can even be disconnected with no ability to communicate, provided that physical moves are utilized to propagate policy changes. Assume subnet one is separated from subnet two and subnet three. Subnet one has three PIPs deployed (two PIPs and one newly arrived delta box). Subnet two has only one PIP, so there is no failover backup on subnet two. Subnet three has two PIPs that will form a failover pair.



Figure 3: An example of how PIPs can be deployed.

The one command that every PIP must understand is `dump.' Once a PIP receives a dump request, it adds the requesting PIP to its known-PIP list and dumps its knowledge base to the requester. It is the job of the requester to analyze the content and change its own configuration accordingly. Each PIP periodically probes selected PIPs in the known-PIP list; the PIPs probed are chosen according to the structure of the P2P network, with one box per subnet chosen arbitrarily. If a PIP cannot be contacted within a specified period of time, it is removed from the neighbor/known-PIP list.

The freshness of information is controlled by versions consisting of time stamps. Each piece of information is stamped with the clock time of the change observer (which might not be a local peer).
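
To make the exchange concrete, the following Python sketch shows the shape of the `dump' request and the timestamp-based merge. All names here (KnowledgeBase, PIP, pull_from, and so forth) are hypothetical illustrations; the prototype's actual protocol runs over TCP with XML-encoded content, as described under Implementation Details.

import time

class KnowledgeBase:
    """Facts are keyed by name and stamped with the clock time of the
    observer that recorded them; the fresher version wins on merge."""
    def __init__(self):
        self.facts = {}                      # key -> (timestamp, value)

    def record(self, key, value, ts=None):
        self.facts[key] = (ts if ts is not None else time.time(), value)

    def merge(self, dumped_facts):
        """Fold a peer's dump into ours, keeping the fresher version."""
        for key, (ts, value) in dumped_facts.items():
            if key not in self.facts or ts > self.facts[key][0]:
                self.facts[key] = (ts, value)

class PIP:
    def __init__(self, address):
        self.address = address
        self.kb = KnowledgeBase()
        self.known_pips = set()              # addresses; one probed per subnet

    def handle_dump(self, requester):
        """The one command every PIP must understand: note the
        requester as a peer, then dump the whole knowledge base."""
        self.known_pips.add(requester)
        return dict(self.kb.facts)

    def pull_from(self, peer):
        """Periodic pull from a selected peer; unreachable peers would
        be dropped from known_pips after repeated failures (not shown)."""
        self.kb.merge(peer.handle_dump(self.address))
        self.known_pips.add(peer.address)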

Policy Change Planning

Each PIP records its own decisions in a low-level operational policy file. When another PIP appears on the same subnet, it might take over some tasks for performance or other reasons, and mark the low-level operational policy accordingly. The functional states of PIPs on a particular subnet are managed by a ``boss'' PIP, whose identity is determined by a race at boot time. Only the ``boss'' of a subnet can change the low-level behavioral attributes related to that subnet; the ``boss'' effectively acts as a coordinator that prevents write conflicts.

We must ensure that policy changes propagate to every PIP, and that the PIPs converge globally on one coherent overall policy. This means that all the active PIPs in the IP address closure either accept or reject a high-level policy together. Before a high-level policy is used, a policy proposal is published to the closure; the PIPs then decide whether the proposal is feasible. Our research does not focus on how to reach consensus quickly in a distributed environment; we chose a simple two-phase protocol and leave optimization of this protocol to future work. The good news is that because the IP address closure operates in a controlled environment, the complexity of the consensus problem is significantly reduced.

A high-level policy proposal is accepted only when all active PIPs vote `yes,' indicating that all the preconditions in the policy that relate to each particular PIP are satisfied. A PIP votes `no,' and the change is rejected, when the physical constraints of the policy are violated. For example, one fixed mapping in the policy file might be impossible because the host is physically on a different subnet than the policy expects.
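
The accept/reject decision can be sketched in a few lines of Python. The dictionary layout and the single precondition checked below (fixed MAC-to-IP mappings versus physical subnets) are stand-ins for the richer checks a real PIP performs; none of these names come from the prototype.

import ipaddress

def precondition_vote(pip, proposal):
    """Vote yes only if every fixed MAC-to-IP mapping for a host on
    this PIP's subnet is physically realizable there (illustrative;
    a real PIP checks every precondition it is responsible for)."""
    subnet = ipaddress.ip_network(pip["subnet"])
    for mac, ip in proposal["fixed_mappings"].items():
        if mac in pip["local_macs"] and \
                ipaddress.ip_address(ip) not in subnet:
            return False   # host sits on a different subnet than policy expects
    return True

def propose_policy(pips, proposal):
    """Phase 1: publish the proposal and collect the votes.
    Phase 2: install the policy only on a unanimous yes."""
    if all(precondition_vote(p, proposal) for p in pips):
        for p in pips:
            p["policy"] = proposal
        return True
    return False

# A mapping that is feasible on every subnet passes unanimously:
pips = [{"subnet": "192.168.3.0/24",
         "local_macs": {"00:02:3f:1f:9c:88"}},
        {"subnet": "192.168.5.0/24", "local_macs": set()}]
proposal = {"fixed_mappings": {"00:02:3f:1f:9c:88": "192.168.3.10"}}
assert propose_policy(pips, proposal)

A proposal is rejected the moment any single PIP observes that its physical constraints are violated, which is exactly the condition described above.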

Self-healing and Failover

While the self-healing features are implemented by redundancy, there are some special considerations for self-healing of DHCP and DNS servers. In particular, any failover should allow for:

  • Seamless management of leases granted by the failed peer.
  • Address spoofing of failed DNS servers during failover.
In our prototype, we handled the first condition but not the second; it is reserved for future work.

In order to provide failover DHCP service from one server to another server, IP leases must be cached somewhere else so they can be managed on a new server. One way to do this is to store the leases in a P2P infrastructure (for example, openDHT [18]). In this way, every IP assignment is recorded in the network, and transitioning from one server to another is easy, because the information is not stored in each individual server alone; replicas are stored in the P2P network. We chose to use the existing DHCP Failover protocol [17], implemented by ISC DHCP. This failover protocol meets most of our goals but has a constraint that it only supports failover in pairs. This constraint limits the number of backup servers to one at any given time. Redundant backup servers are on standby, awaiting future need.

Failures may occur in hardware, the network, software, and so on. The goal of redundancy is to keep the DHCP and DNS services running whenever possible. If a server starts functioning again after a failure, it should be able to recover; if it fails permanently, service should still be provided if possible. In the current failover protocol, if the members of a failover pair cannot communicate with each other, they split the IP pool between them for new IP assignments until communication recovers, because a PIP cannot tell whether the failure is due to network failure or node failure. If the network partitions and both primary and secondary DHCP servers assign IP addresses without splitting the pool, there may be conflicts when a PIP rejoins the network after a long absence. Currently, we are satisfied with the solution of notifying system administrators when the failover mechanism is invoked. If human administrators determine that one of the servers has indeed failed, a backup server can be added to the subnet.
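
The pool split can be pictured with a short sketch: each client hashes to a bucket, and while the pair is out of contact the primary answers clients on one side of a threshold while the secondary answers the rest, so the two cannot hand out the same address. The hash and the function names below are illustrative stand-ins, not ISC's actual mechanism.

def split_bucket(client_mac: str) -> int:
    """Map a client to a bucket in 0..255 (illustrative hash only;
    ISC DHCP uses its own hash over the client identifier)."""
    return sum(int(octet, 16) for octet in client_mac.split(":")) % 256

def should_answer(client_mac: str, am_primary: bool,
                  partner_reachable: bool, split: int = 128) -> bool:
    """While the pair is in contact the servers coordinate normally;
    once contact is lost, each answers only its own share of clients."""
    if partner_reachable:
        return True                  # coordinated via the failover protocol
    bucket = split_bucket(client_mac)
    return bucket < split if am_primary else bucket >= split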

Bootstrapping a PIP

Each PIP acts as a primary DHCP server, a secondary DHCP server, or a backup DHCP server. A booting state diagram (Figure 4) shows the states of a PIP as it boots. The booted PIP can end up in several states, depending on the network environment. If it obtains an IP address from a DHCP server, it enters the `cloning' state, in which its policies are kept dynamically synchronized with the current segment server. If it receives no DHCP response but can discover its own IP address from its environment and policy, it assumes that it is alone on the network segment and goes into active service as a DHCP server. Otherwise, if a PIP cannot determine its IP address by any means, the boot process fails.



Figure 4: The booting state diagram.

During bootstrapping, a PIP must determine the segment into which it has been plugged. It first sends a DHCP request message on the segment, hoping a DHCP server will respond and assign it an IP address. If not, it probes the network to determine its location, and then assigns itself an IP address based upon that probe.
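
The decision sequence reduces to a few lines. In this Python sketch the three helpers are injected parameters standing in for machinery described elsewhere in this section (DHCP discovery, the ARP-based segment probe, and the bootstrap-IP claim); the names are ours, not the prototype's.

def boot_pip(dhcp_discover, probe_segment, claim_bootstrap_ip):
    """Boot decision flow of Figure 4, with injected helpers.

    dhcp_discover()          -> lease or None (is a PIP already serving?)
    probe_segment()          -> segment or None (ARP probe, described below)
    claim_bootstrap_ip(seg)  -> True if we win the claiming-ARP race
    """
    lease = dhcp_discover()
    if lease is not None:
        # A seeded PIP answered: clone it and keep policies synchronized.
        return ("cloning", lease)
    segment = probe_segment()
    if segment is not None and claim_bootstrap_ip(segment):
        # Nobody is serving this segment: self-assign and go active.
        return ("active", segment)
    # No DHCP answer and no identifiable segment: boot fails.
    raise RuntimeError("cannot determine an IP address; boot failed")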

How does a PIP determine its own IP address if DHCP is not yet running and it is the first potential server? Ideally, we should be able to obtain this location information from lower-level closures - for example, through a broadcast-based protocol. Without such a luxury, we must probe for an IP address that we can use to exchange information with other nodes. We implemented the probe mode in our prototype. For example, we have the following definition for segment 192.168.3.0/24 in the seed file:

<seed>
   <segment>
     <network>192.168.3.0</network>
     <cidr>24</cidr>
     <router>192.168.3.1</router>
     <bootip>192.168.3.2</bootip>
   </segment> ...
</seed>

Each node actively probes to determine which segment in the list of possible segments is directly connected to it. The seed file contains a list of primary routers, with one unused IP address (called the bootstrap IP) for each segment. A PIP iterates through the segments and tries to ARP the corresponding primary router. If it receives an ARP reply from the router within a specified period of time, it concludes that it is connected to the corresponding subnet.
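
A sketch of this probe follows, assuming the third-party scapy packet library (and root privileges, since it sends raw ARP frames). The seed-file parsing matches the <seed> format shown above; the function names are illustrative.

import xml.etree.ElementTree as ET
from scapy.all import ARP, Ether, srp1   # scapy: third-party packet library

def load_segments(seed_path):
    """Parse the <segment> entries from a seed file like the one above."""
    root = ET.parse(seed_path).getroot()
    for seg in root.iter("segment"):
        yield {"network": seg.findtext("network"),
               "cidr": int(seg.findtext("cidr")),
               "router": seg.findtext("router"),
               "bootip": seg.findtext("bootip")}

def discover_segment(seed_path, timeout=2):
    """ARP each segment's primary router; the first router that
    answers tells us which subnet we are physically plugged into."""
    for seg in load_segments(seed_path):
        reply = srp1(Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=seg["router"]),
                     timeout=timeout, verbose=False)
        if reply is not None:
            return seg        # router answered: we are on this segment
    return None               # no router answered: unknown location

Only a router on the directly connected link will answer an ARP who-has for its own address, which is what makes this a reliable location test.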

In using the probe protocol, a race condition can occur when two PIPs bootstrap on the same segment simultaneously: both PIPs try to use the same IP address. To avoid the race condition, each node sends an ARP request to resolve the bootstrap IP address. We refer to this kind of ARP request as a claiming ARP, because its goal is to claim that a node is going to use the bootstrap IP address. If this IP address is already in use, the node will receive an ARP reply from the bootstrap IP address, indicating that the address is held by another host; the booting node then simply aborts the bootstrapping process.

If, after a period of time, no other claiming ARP request for the bootstrap IP address is received, the PIP assigns itself that address (we call this state `committed'). If more than one PIP boots before this timeout (commit) event, each will receive the others' claiming ARP requests at roughly the same time; the winner is determined by MAC address. The PIP with the highest MAC address proceeds, while PIPs with lower MAC addresses yield quietly. However, if one PIP has already committed its IP address, it will send an ARP reply claiming the IP/MAC mapping, as if it were already using that IP address. Any PIP, even one with a higher MAC address, will yield when it receives such an ARP reply, because that means the IP has been taken.

Before the IP address on the PIP is committed, the bootstrapping program is responsible for sending ARP responses so other nodes will yield. After the IP address is set, the ARP response will be generated by the OS. In this bootstrap protocol, the timeout period must be long enough to guarantee the ARP response is received if there is another host using the same IP address. Note that in this protocol, no incorrect ARP replies are sent to the network, so no ARP poisoning is caused by our protocol. Figure 5 shows a sequence diagram of three PIPs trying to boot at about the same time. A state diagram (Figure 6) shows the state transitions for this protocol.
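
The arbitration can be captured as a small state machine mirroring Figure 6. This Python reconstruction is illustrative, not the authors' code: it consumes the events a booting PIP observes on the wire and ends in either the committed or the yielded state.

class ClaimState:
    CLAIMING, COMMITTED, YIELDED = "claiming", "committed", "yielded"

class BootstrapClaim:
    def __init__(self, my_mac, boot_ip):
        self.my_mac = my_mac.lower()
        self.boot_ip = boot_ip
        self.state = ClaimState.CLAIMING   # our claiming ARP has been sent

    def on_arp_reply(self, sender_ip):
        """An ARP *reply* from the bootstrap IP: some PIP has already
        committed, so we yield regardless of our MAC address."""
        if sender_ip == self.boot_ip and self.state == ClaimState.CLAIMING:
            self.state = ClaimState.YIELDED

    def on_claiming_arp(self, sender_mac):
        """A competing claiming ARP *request*: the higher MAC wins.
        (Colon-hex strings of equal length compare correctly.)"""
        if self.state == ClaimState.CLAIMING and sender_mac.lower() > self.my_mac:
            self.state = ClaimState.YIELDED

    def on_timeout(self):
        """No committed owner and no stronger claimant appeared within
        the timeout: commit the bootstrap IP to ourselves."""
        if self.state == ClaimState.CLAIMING:
            self.state = ClaimState.COMMITTED

Fed the observed events (competing claiming ARPs, replies from a committed owner, and the local timeout), exactly one PIP per segment ends in the committed state.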



Figure 5: The bootstrapping sequence diagram. PIP1 wins in the bootstrap competition.


Figure 6: The bootstrapping state diagram.

Seed File Distribution

To minimize the work of the system administrator, we designed a mechanism to help with distribution of the seed file. We achieve this goal via a seed PIP, a delta PIP that moves between subnets to gather information from the PIPs in each one. The seed PIP first self-bootstraps, then provides DHCP service. When a second PIP is plugged into the network, it gets an IP address via DHCP from the seed PIP and then configures itself as a failover for the seed PIP. In turn, the seed PIP can be removed without affecting the service, and moved to another subnet, where the process repeats.

Once the seed files are copied, a seed PIP is no different from other PIPs. The administrator can unplug any of the PIPs on the current net and use it as a seed PIP on a different subnet. We intend for the PIP eventually to be lightweight and small, so that it can easily be carried around to seed other PIPs, e.g., behind firewalls.

The State Transition Problem

A system is rarely static. During its lifecycle, humans will request many changes in system behavior. System administrators need to be able to move the system from one operational state to another. This is called the state transition problem.

Traditionally, humans have been in charge of state transitions. The human administrator manipulates each device (in some way) into a new state. When systems become self-managing, however, it is possible for the systems themselves to take an active role in changing states. The ideal situation occurs when the system being self-managed knows the best possible way to change state, so that it serves as a ``co-pilot'' or ``partner'' to the system administrator requesting the change.

In autonomic computing, several change planning systems have been developed - for example, CHAMP [9]. CHAMP is a task scheduling system that tries to optimize task scheduling based on cost and time constraints. CHAMP differs markedly from the IP address closure: it solves the scheduling problem for changes so that downtime and disruption are minimized, distributes tasks for parallelism, and performs its calculations centrally on a single master host. By contrast, the IP address closure does not compute change schedules centrally. Change schedules for the IP address closure can be computed locally, because the IP assignments for different subnets do not depend on one another.

Using the Closure

This section describes how one uses the closure. Low-level closures such as this one pose unique challenges for the system administrator. For the IP address closure to be functional, the system administrator must synthesize a description of its operating environment as well as its operating policies. There is an intimate relationship between contents of the closure configuration file and the routing architecture of the site. Thus the human administrator is in no sense obsolete; changes in the environment must be made known to the closure on a continuing basis.

Our IP address closure's input is a policy file describing the desired relationships between IP numbers, network names, MAC addresses, and subnets. For example, it specifies which subnets are present and their numbering ranges. Some of this information would ideally be determined by a lower-level routing closure, e.g., the addresses of subnet gateways; here we (temporarily) encode that information into a seed file instead.

When using the IP address closure, the only thing a system administrator must specify is the intended behavior of the IP space; one is relieved from managing superfluous ``incidental complexity'' with no behavioral impact [3]. For example, the tedious task of ensuring agreement among DHCP servers on the locations of routing gateways is managed by the closure, and the human administrator need not participate.

The Policy File

In the IP address closure, there are two levels of policy. The first is a high-level policy that defines the overall behavior of the closure and reflects the IP scheme of the whole organization. This is determined by the system administrator. The second is a low-level policy that describes the behavior of the running system and how actual configuration files are generated. This is determined by the closure itself. For example, the number of hosts allowed in a certain subnet is part of a high-level policy, whereas which host serves as primary and which serves as secondary failover server is part of a low-level policy. The high-level policy file contains the DHCP pools of available public IP addresses and private IP addresses, physical subnets, lease period, and some strategies about how the IP address closure is formed. These attributes define the behavior of the closure. The high-level policy specifies the goals of a bootstrap, while the low-level policy represents a steady operating state in which the bootstrap has been accomplished.

The high-level policy file reflects the IP scheme of the whole organization. Some part of the high-level policy may not be realized by a particular closure. Before a new version is released, it can be validated by several rules, including checks for consistency, IP overlapping, and syntax errors. After validation, a new policy will be broadcast to all the servers in the closure. The following code shows an example high-level policy file.

<policy ts="1136488200">
   <!-- static mapping from MAC to IP address -->
   <include tag="mac-to-IP">fixed-ip.xml</include>
   <!-- static mapping from MAC to host name -->
   <include tag="mac-to-name">fixed-name.xml</include>
   <!-- will be maintained by a router closure -->
   <topology>
      <!-- defines subnets connected by DHCP relay agents -->
      <relayed-subnet id="department1">
         <subnet>192.168.1.0</subnet>
         <subnet>192.168.5.0</subnet>
      </relayed-subnet>
   </topology>
   <pools>
      <pool access="private">
         <from>192.168.0.0</from>
         <to>192.168.254.0</to>
         <cidr>24</cidr>
         <max-lease-time>51000</max-lease-time>
         <subpool>
            <from>192.168.3.10</from>
            <to>192.168.3.254</to>
            <cidr>24</cidr>
            <max-lease-time>510000</max-lease-time>
            <include tag="restriction">res.xml</include>
         </subpool>
      </pool>
   </pools>
   <!-- Special rules (exceptions) to the previous rules -->
   <!-- Some rejected hosts -->
   <include tag="rejected-hosts">blacklist.xml</include>
   <!-- Some VIP hosts -->
   <include tag="VIP">vip.xml</include>
</policy>
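
Validation of such a file can be largely mechanical. The following Python sketch checks three of the properties mentioned above: syntax errors, subpools that escape their parent pool, and overlapping pools. The element names follow the example, but the rule set is only illustrative.

import ipaddress
import xml.etree.ElementTree as ET

def addr_range(elem):
    """Read a pool's inclusive address span from its <from>/<to>."""
    return (ipaddress.ip_address(elem.findtext("from")),
            ipaddress.ip_address(elem.findtext("to")))

def validate_policy(path):
    """Return a list of problems found in a high-level policy file."""
    try:
        root = ET.parse(path).getroot()          # syntax check
    except ET.ParseError as err:
        return [f"syntax error: {err}"]

    problems = []
    pools = root.findall("./pools/pool")
    for pool in pools:
        lo, hi = addr_range(pool)
        for sub in pool.findall("subpool"):
            slo, shi = addr_range(sub)
            if slo < lo or shi > hi:
                problems.append(f"subpool {slo}-{shi} escapes pool {lo}-{hi}")
    for i, a in enumerate(pools):                # pairwise overlap check
        for b in pools[i + 1:]:
            alo, ahi = addr_range(a)
            blo, bhi = addr_range(b)
            if alo <= bhi and blo <= ahi:        # inclusive ranges intersect
                problems.append(f"pools {alo}-{ahi} and {blo}-{bhi} overlap")
    return problems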

The high-level policy file does not specify which server is currently serving which subnet, where the configuration files are located, and so on. This type of unnecessary information is part of the ``incidental complexity'' that closures are designed to avoid. By excluding nonessential and architecture-specific information, the high-level policy achieves a high level of reusability.

The low-level policy file contains nearly the same information as the DHCP/DNS configuration files, but it also contains the running state of the peer-peer system. For example, the following code is a part of the low-level policy. The `auth' attribute records the current ``boss'' in charge of this segment. The `failover' attribute shows that the failover is on. This protocol distinguishes between owners of information at a relatively fine grain.

<closure ts="1136491620">
   <dns>
      <ip>192.168.0.100/24</ip>
   </dns>
   ...
   <subnet-segments>
      <!-- The auth attribute holds the current owner of this subnet. -->
      <subnet physical="192.168.0.0"
              authMAC="00:02:3F:1F:9C:88"
              auth="192.168.0.21/24"
              failover="on">
         <id>192.168.0.0</id>
         <netmask>255.255.255.0</netmask>
         <max-lease-time>51000</max-lease-time>
         <pool>
         ...
</closure>
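
The prototype performs the translation from the low-level policy to actual configuration files with XSLT; the following Python equivalent shows what the mapping from a <subnet> element to an ISC dhcpd.conf declaration looks like. The generated directives are a minimal subset, and the contents of <pool> are assumed, since the example above is elided.

import xml.etree.ElementTree as ET

def subnet_to_dhcpd(subnet_elem):
    """Render one <subnet> element of the low-level policy as an ISC
    dhcpd.conf subnet declaration (minimal subset of directives)."""
    net = subnet_elem.findtext("id")
    mask = subnet_elem.findtext("netmask")
    lease = subnet_elem.findtext("max-lease-time").strip()
    lines = [f"subnet {net} netmask {mask} {{",
             f"    max-lease-time {lease};"]
    for pool in subnet_elem.findall("pool"):
        lo, hi = pool.findtext("from"), pool.findtext("to")
        lines += ["    pool {", f"        range {lo} {hi};", "    }"]
    lines.append("}")
    return "\n".join(lines)

subnet = ET.fromstring("""
<subnet physical="192.168.0.0" auth="192.168.0.21/24" failover="on">
   <id>192.168.0.0</id>
   <netmask>255.255.255.0</netmask>
   <max-lease-time>51000</max-lease-time>
   <pool><from>192.168.0.10</from><to>192.168.0.200</to></pool>
</subnet>""")
print(subnet_to_dhcpd(subnet))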

When changes are needed, such as changing the range of available IP addresses or IP renumbering [11], the IP address closure can ease the administrator's job. Currently, renumbering is very labor-intensive and requires a series of carefully orchestrated steps. Given a change in policy, the closure could in principle take over this orchestration and accomplish the renumbering with minimal outside help. This includes validating that the renumbering is possible, and actually performing the renumbering once it is proved valid, leading the human administrator through a series of foolproof steps.

Implementation Details

We implemented a prototype of the IP address closure using the ISC DHCP [8] and BIND [7] software. The gossip protocol is built on TCP, and policy content is encoded in XML. The information is managed using Berkeley DB XML from Sleepycat [21]. Our test environment consists of ten PCs running Linux, separated into four IP subnets connected by PCs configured as routers.

To implement self-bootstrapping, we extended the function of the DHCP client and implemented the logic shown in Figure 5. The PIP box is pre-installed with a modified version of the ISC DHCP v3.0.2 package. When a PIP is booted, the Ethernet interface is configured to obtain its IP address via DHCP. If an IP address is obtained from fellow PIPs, the booting PIP launches the gossiping process; otherwise, the self-bootstrapping process starts. The gossiping protocol is implemented as a pull-only peer-to-peer application. The transformation from a low-level policy to the actual configuration files utilizes XSLT [22].

In our current setting, the size of a PIP's contents is around 4 KB, and cloning the whole contents (from one PIP to a newly installed one) takes about one second. We set the interval between two pull operations to 20 seconds. Propagation delay is governed by both the pulling frequency and the number of neighbors each PIP has: a policy advances at most one hop through the gossip graph per pull interval, so the worst-case delay is roughly the pull interval times the diameter of that graph. Because every PIP in our small testing environment knows every peer, the propagation delay is bounded by a single 20-second interval. In our setting, it is safe for a PIP to have ten neighbors. It will be interesting to validate this lightweight protocol in a real large-scale enterprise environment, and to discover the optimal range for the number of neighbors each PIP should have.

The current capabilities of this prototype are bootstrapping, dissemination of high-level policies and proposals through a P2P network, high-level to low-level policy translation, and automatic DHCP server configuration update (ISC DHCP only). Self-managing features yet to be implemented include policy-environment conflict detection, IP address pool shortage warnings, and auto-allocation. The prototype achieved our main goals: validating the feasibility of (1) self-bootstrapping and (2) realizing a distributed configuration, driven by a high-level policy, that provides a robust IP infrastructure.

Autonomics and System Administration

The popular vision of ``autonomic computing'' (or ``self-managing systems'') is that there will be no system administrators and systems will manage themselves. This vision is inaccurate and naive. Before an autonomic system becomes functional, much initial setup work must be completed by administrators. After the system is successfully configured into a functioning state, it is monitored by both self-managing components and system administrators. If a problem occurs that is beyond the self-healing ability of the autonomic system, administrators must take over and restore the system to a functional state.

A rather obvious property of autonomic systems is also their most serious pitfall: current autonomic systems can only cope with predictable failure modes. If something unpredictable happens, a human is required to intervene and take appropriate action. The system can ``learn'' (postmortem) what it should have done, but cannot cope with the new problem without guidance and help.

Here there is a major and unexpected pitfall for the system administrator. The problems with which an autonomic system cannot cope are also problems that may stump the experienced system administrator. The autonomic system is best thought of as a ``junior system administrator'' armed with a set of ``best practice'' scripts that can solve most problems. When a problem does not fit any known description, then by nature, advanced intervention is needed. The system administrator who can cope with problems of this nature must be better trained than many current system administrators, with an emphasis on efficient and rational troubleshooting. But how (in the context of self-managing closures) does the system administrator achieve this high level of training, when the closure is trying to take control away, and isolate the system administrator from the behavior of the system? One cannot both train and isolate the system administrator. This is a major quandary in the design of autonomic systems: how will the administrator achieve the level of knowledge required to cope with contingencies?

Administering the IP Address Closure

We use the IP address closure as an example to discuss the impact of similar autonomic systems upon system administrators. System administrators delegate some low-level decisions to the closure, and can thus focus on the larger picture of IP address assignment schemes. The IP address closure also relieves system administrators of the job of backing up policies, because PIPs clone policies from one another and are in essence self-preserving.

However, in no way is the system administrator redundant in the IP address closure. The closure cannot control or define the physical connectivity between devices, or guarantee the architecture of physical or virtual subnets. The system administrator has a permanent role in matching the physical architecture of the network with policies, and in intervening when the closure discovers a mismatch between the physical network and desired operating characteristics.

Another unavoidable role is that of bootstrapping the system from a non-functional state to a self-managing state. In our closure, this is accomplished by physical moves of devices. This eliminates common human errors in copying configurations and makes the bootstrapping protocol more or less foolproof, but requires a basic understanding of how the PIPs self-replicate.

Lessons Learned

The PIPs show us something fundamental about the ongoing relationship between system administrators and autonomic elements. The administrator is far from obsolete, but also somewhat removed from the day-to-day management tasks that were formerly part of the job. The system administrator becomes a crafter of policies, rather than just a troubleshooter. Far from being less skilled, the system administrator of the closure system actually needs a higher level of sophistication to deal with unexpected problems.

The changing role of system administrators includes the deployment and bootstrapping of autonomic systems. Each autonomic system has a set of preconditions that must be met before it can function as designed. System administrators must maintain the appropriate environment for the autonomic system. Before the autonomic mechanisms are implemented from top to bottom, each autonomic system must rely on human administrators to cope with the bottom layers. Although these autonomic systems provide self-configuration features, deployment and bootstrapping are unavoidable and uniquely human tasks.

The role of system administrators also includes tuning and validation of new autonomic systems. Many autonomic systems contain heuristics that require the collection and analysis of real production data. Before the system is tuned, system administrators may have to manage it manually; after the system is tuned, it must be validated to make sure that it is configured as desired.

Administrators also must intervene when a problem cannot be handled by an autonomic system. This poses new challenges to autonomic systems and their users. Unlike current practice, in which administrators have absolute control, administrators of autonomic systems must turn off certain parts of the automated process and take over, much as a driver takes over from cruise control or a pilot from an autopilot. The division of responsibilities must be well defined, and the switching process between autonomic and manual management must be documented and practiced.

An autonomic system itself is more complex than a corresponding traditional system. For example, in configuring our PIPs, one must describe their operating environment and policies in great detail, giving information that would not be known by hosts in a non-autonomic DHCP/DNS infrastructure, such as the locations of routers on foreign subnets. This extra configuration, however, pays off when the resulting fabric of servers relieves one from routine tasks such as rebuilding servers or propagating changes to the network.

One primary obstacle to acceptance of autonomic solutions is trust [1, 10]. People often do not trust machines to handle tasks reliably when humans relinquish control. In truth, autonomic solutions are ``assistants'' rather than masters; the fabric of management still contains both machines and humans. This partnership is especially necessary at the lower levels, where human assistance is required and human administrators can use help in implementing complex processes. One goal for autonomic systems is to automate best practices in IT service and resource management [4]; automating these best practices is the surest way to gain the trust of management and administrators. Further, autonomic assistants can help humans track the state of a task. Our problem domain is so close to the hardware that a human element cannot be avoided; humans must serve as the ``hands and feet'' of the system. Accountability issues are also unavoidable: who is responsible if a system is poorly tuned, fails to meet the specific requirements of a site, and causes downtime?

Our brief discussion of the changing role of system administrator may seem daunting, but the job is in no danger of extinction. Current closures require extensive bootstrapping and handling of contingencies, and require monitoring by a highly skilled system administrator. In fact, management of autonomic systems seems to elevate the profession in several ways:

  • by requiring a high level of system administration expertise.
  • by redefining the role of system administrator as someone who directly interacts with policy.
  • by providing (through interaction with policy) upward mobility to management positions.
  • by providing a much needed human interface between autonomic elements and upper management.

Exception Handling

The effectiveness of an autonomic solution depends upon the efficiency with which humans can communicate with it. Our PIPs cannot solve all problems, so their ability to communicate problems effectively to humans is crucial to their success.

A key feature of any autonomic system is how it handles cases in which self-management does not resolve an issue. The goal of exception handling in autonomic systems is to report any violations of policy and resolve them. For example, in the IP address closure, suppose an interface is declared to have a fixed IP address. If the host is not physically located on the subnet the policy expects, then when an administrator activates that interface, it might obtain a DHCP address on a different subnet than desired. In this scenario, no obvious error is generated, yet this is a clear exception to the policy. Here the root cause is that the interface is not physically connected to the proper subnet. We may choose to correct the root cause by physically moving the interface, or perhaps by setting up a VLAN to simulate physical rewiring. Alternatively, we could choose to reject the policy on the grounds that it is not implementable. This constitutes a form of exception in the closure.
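
Detecting this particular exception is straightforward once a PIP can observe which subnet a host actually appears on. The following Python sketch uses illustrative names and data layouts; it is not part of the prototype.

import ipaddress

def check_fixed_mappings(fixed, observed_subnet_of):
    """Report fixed MAC-to-IP policy entries that cannot hold because
    the host was observed on a different subnet.

    fixed:              {mac: promised_ip} taken from the policy
    observed_subnet_of: {mac: subnet}, learned from DHCP/ARP traffic
    """
    exceptions = []
    for mac, ip in fixed.items():
        subnet = observed_subnet_of.get(mac)
        if subnet is None:
            continue                  # host not seen yet: nothing to report
        if ipaddress.ip_address(ip) not in ipaddress.ip_network(subnet):
            exceptions.append(
                f"policy violation: {mac} promised {ip}, "
                f"but host is physically on {subnet}")
    return exceptions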

The concept of exception has been widely used in the computer science field. Exceptions have been used mostly to express a predictable abnormal scenario that can be categorized in advance. Researchers have explored exception handling mechanisms in both intra-component and inter-component situations. For example, Romanovsky divided exception handling into local error detection in one component and coordinated action level handling among components [19]. Here we focus on handling of unexpected exceptions, or exceptions with unknown causes.

Because a closure defines observable behaviors, it is natural to define the exceptions it raises in terms of observable behaviors as well, much like the symptoms of a patient. However, unlike the policy file, where extra detail is incidental complexity, the more detailed an exception report, the more helpful it is. A bare exception containing no extra information will almost certainly require a human to troubleshoot the problem.

The exceptions that a closure could raise can be divided into two categories:

  • Exceptions that cannot be handled by this closure. The cause of the exception is outside the closure's scope of control, i.e., self-healing cannot function properly in that situation. For example, in the Apache closure, if the file system is corrupted, the closure cannot possibly work properly, so an exception must be thrown. Sometimes the reason for the exception is unclear, in which case a generic exception might be raised.
  • Exceptions that may be handled by this closure but that the closure chooses not to handle. Rather, the closure wants the exception to be handled by other closures.

    For example, in the Apache closure, if the server is unstable, the closure may choose to restart the server. Or, if the closure decides that a better way to handle this is to reboot the whole host machine, it may raise an exception and let another entity handle it (such as a host closure).

    A special type of exception is related to human input. When the closure discovers that the intention of the administrator is unclear, or it encounters a condition where more human input is needed, it should raise an exception to request more information, instead of relying upon itself.

Exceptions can be handled by other closures or by human administrators. Since we cannot wait for closures to be built at every layer and switched on all at once, there must be a way for closures to request services from non-closure systems or from human administrators. In the exception-handling process, after the event causing the exception is resolved, the closure can be contacted manually or programmatically to continue its work. In a true autonomic system, most exceptions should be handled by a program rather than by a human administrator.

Conclusions and Future Work

We propose an ``IP address closure,'' a self-managing IP management infrastructure providing DHCP and DNS services. The IP address closure captures best practices that administrators have discovered in the field, and automates them through coordination among Peered IP management nodes (PIPs). The IP address closure is thus designed to earn the trust of system administrators and to assist with their work.

The task of making low-level systems self-managing still requires solving many open problems. The key problem for IP management is to maintain an effective interface between the fabric and its human counterparts. Human administrators are not obsolete; they remain critical because autonomic systems cannot escape exceptions arising from physical limits upon architecture. However, designing policies and resolving exceptions may require a new set of skills from existing administrators, since policy still depends upon architecture.

The most complex and challenging problem is that of planning for safety in very complex changes. When policies change, there are often safe and unsafe ways to transition between policies, where an unsafe transition is one that temporarily exposes a security risk.

Another related problem is how to make the lower layers (routing and switching) self-managed in a similar way. These layers suffer from the same ``bootstrap problem'' that we observe for IP address management; the management fabric has to use what it manages for its own sustenance, and cannot do that until it manages that fabric. The simple solution of managing routing via an out-of-band management network may not be cost-effective for many sites.

Clearly, there are many issues to explore. If there is a single most important contribution of this paper, it is that the closure idea is possible at the IP layer, and that - even with bootstrapping difficulties - self-managing fabrics can function near the physical layer of a network, provided that there is a carefully orchestrated relationship between the self-managing fabric and its human partners.

The role of administrators in the autonomic era has already changed. Far from making administrators obsolete, autonomic systems challenge them to obtain a higher level of expertise, including knowledge of policy design and architecture, tuning, and troubleshooting. At the same time, autonomic systems elevate the system administration profession and shorten the distance between management and system administration through the common language of policy-based interfaces. Some system administration jobs may be lost to autonomic systems, but those that remain may well enjoy better advancement opportunities, as well as increased respect and recognition for the profession.

Author Biographies

Ning Wu is pursuing his Ph.D. at Tufts University. His research interests are in system management, autonomic computing, system integration, and P2P systems. Before studying at Tufts, he worked as an engineer for Genuity and Level 3 Communications, Inc. He received an M.S. from the State University of New York at Albany, an M.E. from the East China Institute of Computer Technology, and a B.S. from Southeast University in China. Ning can be reached via email at .

Alva L. Couch was born in Winston-Salem, North Carolina where he attended the North Carolina School of the Arts as a high school major in bassoon and contrabassoon performance. He received an S.B. in Architecture from M.I.T. in 1978, after which he worked for four years as a systems analyst and administrator at Harvard Medical School. Returning to school, he received an M.S. in Mathematics from Tufts in 1987, and a Ph.D. in Mathematics from Tufts in 1988. He became a member of the faculty of Tufts Department of Computer Science in the fall of 1988, and is currently an Associate Professor of Computer Science at Tufts. Prof. Couch is the author of several software systems for visualization and system administration, including Seecube (1987), Seeplex (1990), Slink (1996), Distr (1997), and Babble (2000). He can be reached by surface mail at the Department of Computer Science, 161 College Avenue, Tufts University, Medford, MA 02155. He can be reached via electronic mail as .

Bibliography

[1] Chan, Hoi, Alla Segal, Bill Arnold, and Ian Whalley, ``How can we trust an autonomic system to make the best decision?'' 2nd International Conference on Autonomic Computing (ICAC 2005), pp. 351-352, 2005.
[2] Cisco Systems, Cisco CNS Network Registrar Users Guide, Software Release 6.1, 2004.
[3] Couch, Alva, John Hart, Elizabeth G. Idhaw, and Dominic Kallas, ``Seeking closure in an open world: A behavioral agent approach to configuration management,'' Proceedings of the 17th Conference on Systems Administration (LISA 2003), pp. 125-148, 2003.
[4] Ganek, A. G. and T. A. Corbi, ``The dawning of the autonomic computing era,'' IBM Systems Journal, Vol. 42, Num. 1, pp. 5-18, 2003.
[5] IBM, An architectural blueprint for autonomic computing, IBM white paper, April, 2003.
[6] International Network Services, IPControl, https://www.ins.com/software/ipcontrol.asp.
[7] Internet Systems Consortium, Inc., ISC BIND, https://www.isc.org/index.pl?/sw/bind/.
[8] Internet Systems Consortium, Inc., ISC Dynamic Host Configuration Protocol (DHCP), https://www.isc.org/index.pl?/sw/dhcp/.
[9] Keller, A., J. Hellerstein, J. L. Wolf, K. Wu, and V. Krishnan, ``The CHAMPS system: Change management with planning and scheduling,'' Proceedings of the IEEE/IFIP Network Operations and Management Symposium (NOMS 2004), Kluwer Academic Publishers, April, 2004.
[10] Kephart, Jeffrey O. and David M. Chess, ``The vision of autonomic computing,'' IEEE Computer magazine, January, 2003.
[11] Limoncelli, Tom, Tom Reingold, Ravi Narayan, and Ralph Loura, ``Creating a network for Lucent Bell Labs Research South,'' Proceedings of the 11th Conference on Systems Administration (LISA 1997), pp. 123-140, 1997.
[12] Logan, Mark, Matthias Felleisen, and David Blank-Edelman, ``Environmental acquisition in network management,'' Proceedings of the 16th Conference on Systems Administration (LISA 2002), pp. 175-184, 2002.
[13] Lucent, Lucent network management software for enterprises.
[14] Melcher, Brian and Bradley Mitchell, ``Towards an autonomic framework: Self-configuring network services and developing autonomic applications,'' Intel Technology Journal, Vol. 8, Num. 4, Nov., 2004.
[15] Mockapetris, P., ``Domain names - concepts and facilities,'' RFC 1034, 1987.
[16] Mockapetris, P., ``Domain names - implementation and specification,'' RFC 1035, 1987.
[17] Network Working Group, DHCP failover protocol, 2003, https://www3.ietf.org/proceedings/04mar/I-D/draft-ietf-dhc-failover-12.txt.
[18] Rhea, Sean, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu, ``OpenDHT: A public DHT service and its uses,'' Proceedings of ACM SIGCOMM 2005, 2005.
[19] Romanovsky, A., ``Exception handling in component-based system development,'' The 15th Int. Computer Software and Application Conference, COMPSAC 2001, 2001.
[20] Schwartzberg, Steven and Alva Couch, ``Experience in implementing a web service closure,'' Proceedings of the 18th Conference on Systems Administration (LISA 2004), 2004.
[21] Sleepycat Software, Berkeley DB XML, https://www.sleepycat.com/products/bdbxml.html.
[22] W3C, XSL Transformations (XSLT) Version 1.0, https://www.w3.org/TR/xslt.
[23] W3C, Web services architecture, 2004, https://www.w3.org/TR/ws-arch/.