The claim has been made that DHTs are important building blocks because a service built on top of a DHT will automatically inherit the DHT's self-configuration, self-healing, and scalability. We found this claim to be largely true. The DHT's neighbor heartbeat mechanism and node join bootstrap protocol automatically repartition the keyspace, and hence the mapping of measurement reports to servers, when DHT nodes join or leave (voluntarily or due to failure or recovery), without the need for operator involvement or application-level heartbeats within SWORD. We benefit from the DHT's logarithmic routing scalability for sending updates, but the number of nodes touched by a range search query, once it reaches the first node in the range, scales linearly with the number of nodes in the DHT (assuming the queried range remains fixed). A somewhat more complex range query scheme that follows routing table pointers rather than successor set pointers provides logarithmic scaling [11], but it is not currently used in our SWORD PlanetLab deployment.
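To make the scaling argument concrete, the following sketch (an idealized simulation, not Bamboo's actual routing code; the node identifiers and keyspace layout are invented for illustration) forwards a range query along successor pointers starting at the first node in the range. For a fixed fraction of the keyspace, the number of nodes visited grows linearly with the size of the DHT.
\begin{verbatim}
# Idealized sketch, not Bamboo's API: a range query is routed to the
# first node whose identifier falls in the range, then forwarded along
# successor pointers until it leaves the range.
import bisect

def route_range_query(node_ids, range_start, range_end):
    """node_ids: sorted DHT node identifiers (wraparound ignored for
    simplicity). Returns the nodes the query visits, in order."""
    visited = []
    i = bisect.bisect_left(node_ids, range_start)  # first node in the range
    while i < len(node_ids) and node_ids[i] <= range_end:
        visited.append(node_ids[i])  # this node answers for its slice
        i += 1                       # forward along the successor pointer
    return visited

if __name__ == "__main__":
    keyspace = 2 ** 16
    for n in (128, 256, 512):        # doubling the DHT size...
        ids = sorted((k * keyspace) // n for k in range(n))
        # ...roughly doubles the nodes visited for a fixed quarter of the keyspace
        print(n, len(route_range_query(ids, 0, keyspace // 4)))
\end{verbatim}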
Although SWORD benefits as described above from its tight integration with a DHT, this integration does cause at least one difficulty. Because PlanetLab is a shared infrastructure, the CPU load on a node can become quite high. This fact interacts badly with Bamboo's heartbeats, which declare a node unreachable if it does not respond within a timeout period. The DHT does not distinguish a heartbeat timeout caused by node or link failure from one caused by high load on the peer node. This is arguably a reasonable choice for a DHT, because nodes with extremely high loads will degrade the performance of the DHT and may therefore be best left out of the system. But it is problematic for SWORD, because we do want SWORD to run on very highly loaded nodes (for example, a developer might use SWORD specifically to find heavily loaded, resource-constrained nodes). Because SWORD's measurement reporting facility is integrated with the DHT, a highly loaded node's removal from the DHT prevents it from reporting measurement updates. In our design of SWORD we aimed to treat the DHT as a ``black box,'' not second-guessing the parameters chosen for it. A possible solution that does not require modifying or tuning the DHT is to separate the measurement reporting functionality from the DHT and the SWORD query processor. We could build a standalone reporting ``stub'' that runs on heavily loaded nodes that have been excluded from the DHT and sends its measurements to a DHT node acting as a proxy (much as DHT nodes already serve as proxies for queries originating outside of SWORD). That gateway node would insert measurement reports into SWORD on behalf of non-DHT nodes, and would likewise proxy their queries and responses. This would allow SWORD to accept updates and queries from all nodes without requiring the DHT and the SWORD query processor to operate on heavily loaded nodes. A generalization of this principle is that breaking the fate sharing between the DHT logic and logic that does not strictly require the DHT can reduce the impact that DHT design decisions and parameters have on an application that uses the DHT.
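A minimal sketch of the proposed reporting stub follows; the gateway address, port, report format, and reporting interval are all hypothetical rather than SWORD's actual protocol. The stub periodically pushes its local measurements to a lightly loaded DHT node, which would insert them into SWORD on the stub's behalf.
\begin{verbatim}
# Hypothetical reporting stub for a node excluded from the DHT; the wire
# format, gateway endpoint, and interval are illustrative assumptions.
import json, socket, time

GATEWAY = ("gateway.example.org", 6001)  # a DHT node acting as proxy
REPORT_INTERVAL = 120                    # seconds between reports

def collect_measurements():
    # A real stub would gather load, free memory, etc. from the local node.
    return {"node": socket.gethostname(), "load1": 9.7, "free_mem_mb": 48}

def send_report(report):
    # One report per connection keeps the stub simple; the gateway inserts
    # the report into the DHT on this node's behalf.
    with socket.create_connection(GATEWAY, timeout=10) as s:
        s.sendall(json.dumps(report).encode() + b"\n")

if __name__ == "__main__":
    while True:
        try:
            send_report(collect_measurements())
        except OSError:
            pass                         # gateway unreachable; retry next interval
        time.sleep(REPORT_INTERVAL)
\end{verbatim}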
We used the PlanetLab Application Manager [8] to automatically restart crashed SWORD instances. It was important to disable this feature during debugging, since in that setting a crashed application instance generally indicates a bug that needs to be fixed. Automatic restart was a mixed blessing once we had deployed the service in ``production.'' While it allowed SWORD to recover quickly from node reboots and allowed us to continue to provide the service in the face of bugs, it hid transient bugs. Because periodically collecting logfiles from hundreds of machines to look for restarts is time-consuming and resource-intensive, a more sensible approach is to automatically email the service operator the most recent logfile each time the application is restarted on a node. Restart allows a service to handle failure gracefully, but at times perhaps too gracefully.
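One way to realize this, sketched below, is a thin restart wrapper around the service; the logfile path, operator address, launch script, and use of a local mail server are assumptions, and the PlanetLab Application Manager's own restart hooks are not shown.
\begin{verbatim}
# Illustrative restart wrapper (not part of the PlanetLab Application
# Manager): mail the previous run's logfile to the operator on each restart.
import smtplib, subprocess
from email.message import EmailMessage
from pathlib import Path

LOGFILE = Path("sword.log")              # hypothetical log location
OPERATOR = "operator@example.org"        # hypothetical operator address

def mail_logfile():
    if not LOGFILE.exists():
        return
    msg = EmailMessage()
    msg["Subject"] = "SWORD restarted; recent log attached"
    msg["From"] = "sword@localhost"
    msg["To"] = OPERATOR
    msg.set_content(LOGFILE.read_text(errors="replace")[-10000:])  # log tail
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    while True:
        subprocess.run(["./run-sword.sh"])  # blocks until the instance exits
        mail_logfile()                      # report the crash, then restart
\end{verbatim}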
PlanetLab is commonly used to test and deploy scalable wide-area services. For some services, such as monitoring, the platform is small enough that centralized solutions may offer adequate performance. It is therefore tempting to build such services with an interface to external users only at a central data aggregation point. An example of such a service is Trumpet, which collects and aggregates per-node event data. Trumpet data can be retrieved from the Trumpet server, but it is not available through a local interface on each node where the data is collected. To work around this, each SWORD instance contacts the central server every 15 minutes to retrieve information about itself, which it then publishes along with its locally collected ganglia, CoTop, and network coordinate measurements. The operators of the Trumpet service have indicated that they will soon be deploying a decentralized version of the service, which will make this workaround unnecessary. But the technique generalizes to any centralized data source, at the cost of some inefficiency in retrieving data into a decentralized system from a central server rather than from local sources.
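The polling loop below sketches this workaround; Trumpet's actual interface is not reproduced here, so the URL, response format, and the way the data is merged into the local report are assumptions.
\begin{verbatim}
# Illustrative central-server polling loop; the endpoint and response
# format are hypothetical, not Trumpet's real interface.
import json, socket, time, urllib.request

CENTRAL_SERVER = "http://trumpet.example.org/events?node="  # hypothetical
POLL_INTERVAL = 15 * 60                                     # every 15 minutes

def fetch_own_events():
    url = CENTRAL_SERVER + socket.gethostname()
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def publish_locally(events):
    # In SWORD this data would be folded into the node's next measurement
    # report alongside ganglia, CoTop, and network coordinate values.
    print("publishing", len(events), "events with the local report")

if __name__ == "__main__":
    while True:
        try:
            publish_locally(fetch_own_events())
        except OSError:
            pass                          # central server unreachable; retry later
        time.sleep(POLL_INTERVAL)
\end{verbatim}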
Finally, we found that a simple user interface with semantics close to those SWORD uses internally was helpful in quickly turning our research prototype into a service usable by others. Although our long-term vision for SWORD includes sophisticated user interfaces that allow service deployers to graphically depict desired deployment configurations and penalty functions, we first wrote a simple C client that sends an XML file from the user's disk over a network socket to SWORD. Two external users have already begun developing tools that make use of this programmatic interface. A graphical interface, a more sophisticated query language, or a SOAP interface can be layered on top of the current minimal interface.
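The sketch below shows the shape of such a client, written in Python rather than C for brevity; the SWORD host, port, and the framing of the reply are assumptions, and the XML query format itself is not reproduced.
\begin{verbatim}
# Sketch of the minimal query interface described above; host, port, and
# reply framing are assumptions, and the query XML is supplied by the user.
import socket, sys

SWORD_NODE = ("sword-node.example.org", 5002)  # hypothetical SWORD instance

def send_query(xml_path):
    with open(xml_path, "rb") as f:
        query = f.read()                       # XML query from the user's disk
    with socket.create_connection(SWORD_NODE, timeout=30) as s:
        s.sendall(query)
        s.shutdown(socket.SHUT_WR)             # signal end of query
        return s.makefile("rb").read()         # matching nodes/groups returned

if __name__ == "__main__":
    sys.stdout.buffer.write(send_query(sys.argv[1]))
\end{verbatim}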