Check out the new USENIX Web site.


USENIX, The Advanced Computing Systems Association

LISA '06 Paper

Specification-Enhanced Policies for Automated Management of Changes in IT Systems

Chetan Shankar - University of Illinois at Urbana-Champaign
Vanish Talwar, Subu Iyer, Yuan Chen, and Dejan Milojicić - Hewlett-Packard Laboratories
Roy Campbell - University of Illinois at Urbana-Champaign

Pp. 103-118 of the Proceedings of LISA '06: 20th Large Installation System Administration Conference
(Washington, DC: USENIX Association, December 3-8, 2006).

Abstract

Enterprise and grid computing systems are complex and subject to a broad range of changes such as configuration updates, failures, and performance degradations. These changes affect infrastructure elements such as computation and storage nodes, applications, and system management elements such as monitoring infrastructures. Today's best practices in use by system administrators to manage these changes are manual and ad-hoc. In large complex installations, this would lead to high operational costs, broken closed loop automation, and reduced agility. Providing tools and mechanisms to administrators that automate the reaction to these changes is highly desirable and is an active research area.

Policy-based management using Event-Condition-Action (ECA) rules is a well-known approach for such automated change management where management actions are executed when specified event-conditions are observed. In complex systems, the interdependence of components generates multiple events when a single change happens causing multiple rules to be triggered. The order of execution of rule actions determines the system behavior necessitating reasoning about execution order. ECA rules do not contain explicit action specifications needed for reasoning and are therefore unsuited for specifying management rules.

In this paper, we propose a specification-enhanced ECA model called Event-Condition-Precondition-Action-Postcondition (ECPAP) for designing adaptation rules. ECPAP rules contain action specifications in first order predicate logic enabling us to develop reasoning algorithms to determine enforcement order of multiple rules. The enforcement order is represented as a Boolean Interpreted Petri Net workflow. We introduce a new notion called enforcement semantics that provides guarantees about rule ordering. We have built an adaptation framework using ECPAP model and have demonstrated it for automated change management of Ganglia and HP OpenView monitoring systems. The evaluation of the framework illustrates the significance of the ECPAP model and demonstrates its applicability for managing complex IT environments.

Introduction

Modern IT systems, such as enterprise data centers and grids, are paradigms of distributed computing where computation and data are distributed across diverse computational and storage elements. These systems are characterized by growing complexity, scale, and heterogeneity of infrastructure and applications. Further, these systems are highly dynamic, and subject to frequent changes such as service plug-in/plug-out, workload variations, failures, configuration updates, and application migration. Such changes affect the runtime operation of the system, and the service contracts offered to customers. Therefore, in reaction to these changes, infrastructure elements, applications, as well as system management components need to be adapted. For example, compute and storage resources may have to be re-allocated, applications may need to be restarted, and monitoring infrastructures may require re-configuration.

Current approaches used by administrators to manage these changes are manual and/or a combination of ad-hoc tools and scripts. The process typically requires special expertise and detailed actions by administrators. While these approaches may work fine in small scale installations, they do not scale in larger scale installations typical of modern IT systems and utility systems of tomorrow. In such environments, there is a need to provide administrators with tools that can capture the expert domain knowledge in machine readable format and thereafter react to changes in an automated manner.

The use of Event-Condition-Action (ECA) rules [1] is a well-known approach for enabling system administrators to specify the desired actions to be invoked on changes in policy rules [5, 6, 7, 10, 20]. When a change event is received, the rules matching the event are determined. If the conditions in these rules are true, the corresponding actions are executed. An example of an ECA rule is ``When checkpoint store is full (event), if backup store is running (condition), assign backup store as new checkpoint store (action).'' When the checkpoint store becomes full, an event is sent that triggers the rule. The management system verifies if the backup store is running and if so assigns it as the new checkpoint store.

Policy-based management systems have been effectively used to manage network switches [5], content distribution networks [6], and general distributed systems [7]. The applicability of policy-based systems for reacting to changes in today's IT systems, such as data centers and utility infrastructures, presents numerous challenges due to highly interdependent components. Changes in these environments propagate rapidly causing several related changes.

Consider, for example, an infrastructure for monitoring performance of a data center as depicted in Figure 1. The infrastructure consists of a set of monitored nodes on which monitoring agents continuously collect performance data and send them to nodes called aggregators. These aggregators perform statistical functions on the collected data and send them to interested clients, such as archival stores and performance visualizers. An aggregator's output typically represents the performance data of a cluster and is useful for viewing consolidated information.



Figure 1: Infrastructure for monitoring performance of a data center.

If the aggregator node fails, monitored nodes cannot send data and may report failures. Similarly, clients using the performance data may report failures as they do not receive any data from the aggregator. Other clients such as load balancers and job schedulers using this data may unnecessarily rebalance the load or schedule jobs incorrectly thus affecting the overall utilization of the system.

Change detectors monitoring different aspects of the system sense several changes due to aggregator failure and generate various events. This causes several policy rules to get triggered in the adaptation system.

The order of enforcement of rules determines the final system state. As illustrated in Figure 2, the failure of aggregator node generates multiple events. These events trigger two rules - R1 and R2. R1 states: ``If aggregator node fails, assign failover node as aggregator'' and rule R2 states: ``If data-send from monitoring agent fails, reconnect to aggregator.'' When the aggregator node fails, both rules are triggered. If R1 is enforced before R2, the failover node is assigned as aggregator and the monitoring agent is reconnected to it. But if R2 is enforced before R1, the action of R2 fails since the monitoring agent tries to reconnect to the stopped aggregator.



Figure 2: Problem with ECA based system.

Therefore, the order of enforcement of rules determines the final system state when multiple rules are simultaneously triggered. An adaptation system should reason about the enforcement order of rules and provide guarantees for system behavior to be deterministic.

ECA rules do not contain explicit action specifications needed for reasoning and are therefore unsuited for specifying management rules in such environments. Therefore, we propose an extended model of ECA called Event-Condition-Precondition-Action-Postcondition (ECPAP) for designing management rules. These rules contain the axiomatic specification of rule actions in first-order predicate logic as pre-and post-conditions. The pre-condition specifies the partial system state before execution of rule action while post-condition specifies the partial system state once the action has successfully executed.

Note that the rule condition is different from pre-condition because the rule condition is specified by the policy designer while the pre-condition is specified by the action developer (programmer). We have used the ECPAP rule framework for conflict detection and resolution and monitoring of rule enforcement in [11] and analyzing policy cycles in [10]. In this paper, we show how the pre- and post-conditions can be used to determine the dependencies among triggered rule actions and reason about their enforcement order.

Given a set of triggered rules, the adaptation system dynamically generates a Petri net workflow representing dependencies among rule actions. An action depends on another action if the pre-condition of the former is satisfied by the post-condition of the latter. Since specifications are in first-order predicate logic, which are undecidable in the general case, dependencies can only be determined at runtime when the specifications are instantiated to form propositional expressions. Therefore, workflows of rule actions cannot be constructed statically and must be done at runtime.

Certain actions may be independent of other triggered actions and therefore may not be present in the workflow. In such cases, the system could take one of several possible decisions, for, e.g., it may abort the workflow, or it may execute the maximum possible actions and so on. We identify the need for policy-based systems to provide semantic guarantees for rule enforcement in such circumstances.

We introduce a notion called rule enforcement semantics for policy-based systems that provides certain guarantees when multiple rules are concurrently triggered. We have identified three enforcement semantics - random, all-or-none and maximum rule - that have been found useful in different circumstances. We discuss this in detail in section ``Ordering Rule Enforcement.''

We make the following contributions in this paper:

  • We propose a specification-enhanced rule framework called Event-Condition-Precondition-Action-Postcondition (ECPAP) for specifying adaptation rules.
  • We present algorithms to determine enforcement order of multiple rules using the ECPAP model. The enforcement order is represented as a Boolean Interpreted Petri Net (BIPN) workflow. We also introduce a new notion called enforcement semantics that provides guarantees about rule enforcement.
  • We describe an adaptation framework built using the ECPAP model and demonstrate its application for automated change management of Ganglia and HP OpenView monitoring systems.
  • We present evaluation results that illustrate the need for enforcement guarantees and the feasibility of the ECPAP model.

Our extension of the ECA framework with action specifications follows naturally from current research efforts in autonomic computing. There has been widespread interest lately on using planning techniques from AI for programming and managing distributed systems with encouraging results [10, 11, 12, 13, 24]. In [10, 11] we showed how extending actions with specifications enabled advanced conflict and termination analysis for policy-based management systems. Andrzejak, et al. [12] have used actions with pre- and post-conditions for planning complex workflows from simple actions for system management.

The ABLE project [13] uses axiomatic specifications of actions for goal-based autonomic computing. Anand, et al. [24] use specification-enhanced actions, expressed as pre-conditions and effects, for programming pervasive computing environments. These research works have shown that annotating actions with simple pre- and post-condition specifications provides numerous benefits such as raising the programming abstraction level and automating system management. Based on the success of these efforts we have extended management policies with action specifications and introduced the ECPAP rule framework.

The rest of the paper is organized as follows. In the next section, we present in detail the ECPAP rule framework. We then describe the workflow generation algorithms and enforcement semantics. Subsequently, we present the ECPAP-based adaptation framework and its application for managing monitoring systems. We then discuss evaluation results and some lessons learned. Finally, we present related work and a conclusion.

Specification-Enhanced Rule Framework

The ECA rule framework is used in different paradigms such as active databases, access control and system management to react to different situations. Active databases use the ECA framework for designing triggers that specify actions to be executed when certain database operations such as record insertion or deletion are made. Access control systems use ECA rules to authorize or deny access when an access request is made. Management systems use the ECA framework for designing obligation rules [7] to specify management actions to be executed when system changes are observed. Rule actions in active databases and access control are normally well-defined and hence their effects on the system are implicitly known.

For example, active database trigger rules normally use insert, delete and update actions [21] while access control actions are normally authorize, deny and delegate. This enables complex reasoning such as confluence [21], rights-amplification and conflict analyses to be performed over these rules. But rule actions in system management are not well-defined and can range from simple atomic actions to complex scripts and so their effects on the system are not implicitly known. Therefore, explicitly specifying the action effects using pre-and post-conditions enables complex reasoning to be performed over management rules. This motivated us to design the ECPAP rule framework.

The ECPAP framework extends the ECA framework by using the Hoare triple [23]. A Hoare triple represented as {P}P{Q} describes how an action C changes the state of computation from a state where P is true to a state where Q is true. P and Q, expressed as first-order predicate logic expressions, are pre-and post-conditions of C, respectively and are called axiomatic specifications. The pre-condition specifies the system state that should exist before C can be executed.

Our adaptation policies are formulated as sets of ECPAP rules of the form on event if condition do action

A policy rule is read as: ``When event occurs in a situation where condition is true, then execute action.'' The action is a call to a method in a library of actions where each action is annotated with a pre-condition and a post-condition by the action developer (programmer). Note that pre- and post-conditions are not specified as part of the rules since an action may be invoked by multiple rules in the policy and this format avoids listing the specifications at multiple places.

We represent an ECPAP rule as (e, c, p) (a, s) where e denotes the rule event, c denotes the condition of the rule, p is the pre-condition of the action, a is the action to be executed and s is the action post-condition. Our policy rule framework extends that of Policy Description Language (PDL) [1] by adding axiomatic specifications as ``extension''s to the rule.



Policy 1: Adaptation Policy for Performance Monitoring Scenario.

Policy Syntax and Semantics

There are three basic classes of symbols: primitive event symbols, action symbols and constant symbols. Primitive event symbols represent basic events that can be subscribed to in the system. For example, MonNodeFail and ServiceFail are primitive event symbols that are generated when a monitored node or service fails. An event is a primitive event symbol or a term of the form e(T1 t1, ..., Tn tn) where e is a primitive event symbol of n arguments and each ti is a constant or a variable of type Ti. e(T1 t1, ..., Tn tn) represents a parameterized event where the parameters are bound to the data contained in the event.

The condition part of an ECPAP rule is a boolean expression containing constants and variables that appear in the event part of the rule.

Each action symbol denotes the name of a procedure that can be invoked in the system. An action is of the form proc(t1, ..., tn) where proc is an action symbol and {tis are parameters. For example, startService(S) is an action. Actions are defined in an action library that also contains pre- and post-conditions of actions.

Pre- and post-conditions of an action are first-order predicate logic formulas of the form

p1 ( | pk) k=2..m,
pi is a first-order predicate of the form Q1t1  X1, ..., Qn tn Xn pred(t1, ..., tn): Qi is a quantifier, Xi is a constant symbol and each ti is a constant or a variable.

A policy, P is a finite set of ECPAP rules. The adaptation system enforcing the policy expects as input an event e, and its occurrence is represented by occ(e). The semantics of each rule, (e, c, p) (a, s) in the policy is specified by the implication,

occ(e) c p exec(a)
exec(a) s
where exec(a) represents the initiation of the execution of action a. The evaluation of the rule and execution of the action is treated as an atomic operation, i.e., if the system state changes after the rule evaluation and before the action execution, the change is ignored.

Pre- and Post-condition Expressions

The pre- and post-conditions use pre-defined keywords for specifying first-order expressions. For example, MonitoredNodes and ClientNodes are keywords that represent sets of monitored nodes and client nodes in our system. In our example scenario these sets are: MonitoredNodes = {m1, m2} and ClientNodes ={store, visual}. A quantifier over these sets enumerates all the elements of the set. First-order expressions are undecidable in the general case and therefore we convert them to propositional logic expressions, which are decidable, during evaluation.

Consider, for example, x MonitoredNodes, statusNode(x, running) is a first-order expression that is converted to statusNode(m1, running) statusNode (m2, running) before analysis. These expressions are evaluated by invoking corresponding methods in the system and verifying the return values. For example, statusNode(m1, running) is evaluated by comparing the return value of the call statusNode(m1) with the string ``running.''

Example Adaptation Policy

The adaptation policy for the scenario described in the first section is shown in Policy 1. The pre- and post-conditions of actions are shown italicized in braces for convenience and are not specified as part of the rules.

Rule R1 gets triggered when the aggregator node fails and if the failed node is not the failover node, it assigns the failover node as the new aggregator. Rule R2 is triggered when any monitored node fails to send data to the aggregator. The rule tries to reconnect the monitored node to the aggregator. Rule R3 is triggered when a node fails to receive data from the aggregator node. The rule reconnects the node to the aggregator node. Rule R4 is triggered when the aggregation agent stops. The aggregation agent needs to be started everytime new clients or monitored nodes are connected to the aggregator node. This rule restarts the aggregation agent.

When the aggregator node fails, the monitored nodes are unable to send data and the archival store and visualizer nodes do not receive any data. Therefore, one AggregatorFail event, two DataSendFail events (one each from two monitored nodes), two DataReceptionFail events (one each from the archival store and visualizer nodes) and one AggregationAgentStopped events are generated. These events trigger multiple instances of the above rules.

If both instances of rules R2 and R3 are enforced before R1, the nodes try to connect to the failed aggregator and therefore do not succeed. If R4 is enforced before the other rules, the aggregation agent restart fails and so the nodes can neither send nor receive data. But if R1 is enforced before R2 and R3 and R4 is enforced in the end, the nodes get connected to the failover aggregator and the monitoring activity is restored. Therefore, the order of enforcement of rules determines system behavior. While ``correct'' order of rule enforcement is hard to define and requires experimental justification, other simpler guarantees about ordering can be provided as will be discussed shortly.

Ordering Rule Enforcement

An adaptation policy is subject to numerous changes such as addition and deletion of rules, rule modifications and policy composition. Each rule is generally evaluated and enforced independent of other rules in the policy. When multiple rules are triggered the order of enforcement of rules determines the system behavior, as demonstrated in the previous section. Therefore, we define a new notion called enforcement semantics that provides certain guarantees about rule enforcement. Enforcement semantics of a policy-based adaptation system dictates the way rules are to be enforced when multiple rules are simultaneously triggered.

When a set of rules is triggered, the execution order of the rule actions is determined by constructing a workflow that expresses dependencies between different actions. The pre- and post-conditions of actions determine which action enables which other actions. An action is said to enable another action if the post-condition of the former satisfies the pre- condition of the latter. In our example scenario, the post-condition in rule R1 satisfies the pre-condition in R2. Therefore, action of R1 is said to enable that of R2.

The workflow of rule actions is represented as a Boolean Interpreted Petri net (BIPN) [4], which is useful to model and reason about concurrent action execution. A Boolean Interpreted Petri net is a Petri net [3] whose transitions are assigned Boolean functions. A transition can fire only when all of its input places are marked and its Boolean function is true. We assign a place to each action and each transition leading to the place is assigned the pre-condition of the action as the Boolean function. We formally define our workflow BIPN in Appendix I. We will describe the algorithms that construct the workflow in the rest of this section. The various notations used in the algorithms in this section are catalogued in Figure 3.


V:set of trivially-enabled actions
A:set of actions of triggered rules
Enable(a):set of actions enabled by action a
P = {Start}: set of Petri net Places - initialized to Place called `Start'
T = {}:set of Petri net Transitions
place(a):Place for action a
adj(x): adjacency list of x represented as a set, where x P T
trans(p, f): Transition with Boolean function f connected by edges from places in set p

Figure 3: Notations used in the workflow generation algorithms.

Workflow Construction

The workflow is constructed by analyzing each pair of actions to determine if one enables the other. The current system state can be represented as a set of propositions and pre-conditions of certain actions may be satisfied by it. These actions are independent of other triggered rules and can be executed as the first set of actions in the workflow. These actions are called trivially-enabled actions.
Definition 1: An action a is said to be trivially-enabled if the current state of the system, I, satisfies its pre-condition. Formally, it is represented as I pre(a), where is the satisfies symbol.

In our example scenario, the pre-condition in R1 : statusNode(``FailOver'', running) is satisfied by the current system state since the failover system is running when the aggregator node fails. Therefore, the action of R1 can be executed independent of actions in other triggered rules. Algorithm 1 to determine trivially-enabled actions is shown in Figure 4a.

The algorithm initializes the Petri net by assigning a place to each action and creating a transition with the Boolean function true. This transition is connected to the Start place. The algorithm evaluates the pre- condition of each action to determine if it is true and marks the action as trivially-enabled if so. These trivially-enabled actions are connected by edges from the true transition. In our adaptation scenario, only action A1 is trivially-enabled (action of rule Ri is represented as Ai). The Petri net workflow that results from the algorithm for our adaptation scenario is shown in Figure 4b.



Figure 4a: Algorithm 1: Workflow Initialization and Trivially-enabled action analysis.


Figure 4b: Petri net workflow after trivially-enabled action analysis.

Once trivially-enabled actions have been identified, we check to see which action enables which other actions through enablement analysis.
Definition 2: An action a1 is said to enable action a2 if post(a1) pre(a2) where post(a1) represents the post-condition of action a1 and a2 is not trivially-enabled.

This implies that execution of a1 would satisfy the pre-condition of a2 and so a2 can be executed after a1. Since any proposition satisfies the true proposition, we do not check if post-condition of an action satisfies pre-condition of a trivially-enabled action.

Algorithm 2 for enablement analysis is shown in Figure 5a. This algorithm verifies for each triggered action if its post-condition satisfies the pre-condition of a non-trivially-enabled action. It does a pair-wise satisfiability check of actions to determine enablement. The set Enable(a) contains all actions that are enabled by action a. The algorithm iterates through each action a and if a enables other actions, it connects them to a through transitions labeled with their pre-conditions.

In our example scenario, both instances of action A2 (corresponding to two monitored nodes) and both instances of action A3 (corresponding to archival store and visualizer) are enabled by A1. The Petri net resulting from Algorithm 2 is shown in Figure 5b. Aki represents the ith instance of action Ak. Since two instances of rules R2 and R3 are triggered, the Petri net contains two instances of their actions represented as A2i and A3i, (i=1, 2).


Enable(a) = {},  a  A//enablement analysis
for each action a  A
    for each action b  A-V
        if post(a)  pre(b)
             Enable(a) = Enable(a)  {b}
for each action a  A//adding transitions to workflow
    for each action b  Enable(a)
        t = trans({place(a)}, pre(b))
        if t  T
            T = T  {t}
            adj(place(a)) = adj(place(a))  {t}
        end if
        adj(t) = adj(t)  {place(b)}
    end for
end for

Figure 5a: Algorithm 2: Enablement Analysis.


Figure 5b: Petri net workflow after enablement analysis.

Post-conditions of some actions may satisfy part of the pre-condition of another action. For example, post-pcondition of A1: statusAggregator(running) satisfies a part of the pre-condition of A4. Similarly, post-conditions of A21, A22, A31 and A32 satisfy the other parts of the pre-condition of A4. Therefore, A1, A2i and A3i must be executed to enable A4. We say that each action A1, A2i and A3i partially-enables A4. Note that the variables x and n in predicates connectionStatusToAggregator(x, connected) and connectionStatusToAggregator(n, connected) are bound to values of the nodes during evaluation.
Definition 3: An action a1 is said to partially-enable action a2 if post(a1) vanishsym partial-pre(a2), where partial-pre(a2) is a conjunction of some proper subset of conjuncts of pre(a2). A set of partially-enabling actions of an action a that together enable a is called a partial-set of a. An action may have multiple partial-sets and therefore, the set of all partial-sets of a is denoted by partial-sets(a) . In the above example,
partial-sets(A3) = { {A1, A21, A22, A31, A32 } }.

Algorithm 3 in Figure 6a determines for every action a that is not trivially-enabled, which set of actions collectively enable a. If the set contains only one action, then it implies that a single action enables a and therefore is already determined by Algorithm 2. Therefore, Algorithm 3 only considers sets having more than one element. In addition, the algorithm does not test an action with itself for partial-enablement as this might lead to a deadlock.


Partial-sets(a) = {}
S   :set that temporarily contains partially-enabling
     actions of an action
for each action a  A-V//partial sets determination
    S = {}
    for each action b  A-{a}
        if b partially-enables a
            S = S  {b}
    for each subset s of S
        if (cardinality(s) > 1)
            p = true
            for each action a  s
                p = p  post(a)
            if p satisfies pre(a)
                Partial-sets(a) = Partial-sets(a)  {s}
        endif
    end for
end for
for each action a  A-V//adding transitions to workflow
    for each set s  Partial-sets(a)
        t = trans(s, pre(a))
        T = T  {t}
        adj(t) = adj(t)  {place(a)}
        for each action b  s
            adj(place(b)) = adj(place(b))  {t}
    end for
end for

Figure 6a: Algorithm 3: Partial-sets Determination.

Though the algorithm for partial-enablement analysis can replace enablement analysis of Algorithm 2, we separate the two algorithms since partial-enablement analysis has a much higher complexity. We will discuss this in more detail below when we evaluate the algorithmic complexities.

Once we determine the partial-sets, we complete the workflow construction by adding transitions. The Petri net generated from Algorithm 3 is shown in Figure 6b.



Figure 6b: Final Petri net workflow.

Enforcement Semantics

Once dependencies among triggered rule actions have been determined, the enforcement semantics of the adaptation system specifies the execution order of actions. We have identified three different enforcement semantics for policy-based adaptation systems.

Random

This semantics executes rule actions in a random order. The pure ECA policy system without the specification enhancements follows this semantics, implicitly, since it does not provide guarantees about enforcement of multiple triggered rules. This is the weakest of all three semantics and does not require the action workflow to be constructed. This semantics can be used when dependency among rule actions is low and very few rules are triggered by a single change.

All-or-None

The all-or-none semantics specifies that the rule actions in the workflow must be executed only if all actions can eventually execute. This implies that even if one action in the workflow cannot be enabled then the entire workflow should be discarded. In order to enforce this semantics, the BIPN workflow is analyzed to see if all places can be reached using a reachability algorithm [3]. The all-or-none semantics provides the strongest guarantee and is useful in policies that have high dependency among rule actions.

Maximum Rule

The maximum rule semantics guarantees that the management system enforces rules in an order that ensures as many rules are successfully enforced as possible, provided no other errors cause rule enforcement to fail. The difference between all-or-none and maximum rule enforcement semantics is that in the latter if any place in the workflow can be reached from the Start place it will be executed. If a place cannot be reached, the workflow is not discarded as in the all-or-none semantics. Our adaptation framework discussed in the ``Framework'' section uses the maximum rule enforcement semantics.

We prove formally in Appendix I that the workflow algorithms described above guarantee the above semantics by showing that ordering actions according to the workflow enables maximum number of rules to be successfully enforced.

Action Execution

The order of execution of rule action depends on the enforcement semantics used in the system. If random enforcement is used, the workflow construction is skipped and actions are executed in an arbitrary order. The all-or-none and maximum rule enforcement semantics use Petri net based traversal algorithms to traverse the workflow and execute actions. If the system guarantees all-or-none semantics a reachability analysis [3] is performed to determine if all places are reachable from the Start place prior to execution.

A workflow execution engine analyzes the Petri net for any deadlocks using the deadlock detection algorithm described in [3]. If a deadlock is found the execution engine does not execute any action in the workflow. Currently, we do not resolve deadlocks and abandon the workflow. If the Petri net is deadlock-free, the engine uses a simple Petri net traversal algorithm based on Breadth-First Search (BFS) to traverse the net and execute actions.

The transition states of the Petri net act as synchronization points in the workflow. When multiple places lead to a single transition, the engine waits for the completion of all actions in the places before executing actions of places leading out of the transition. At each transition, the engine verifies the Boolean function for satisfaction before executing the following action. For our adaptation scenario, action A1 is executed, followed by concurrent execution of actions A2i and A3i and then action A4 is executed.

ECPAP-based Adaptation Framework

We have designed a framework based on ECPAP rules for adapting Ganglia and HP OpenView monitoring systems to various changes. The framework is external to the monitoring systems and does not modify either system. It uses the reconfiguration support provided by the systems for adaptation. In this section, we discuss the details of the framework and its applications.

System Architecture

Adaptation policy, containing ECPAP rules, is compiled and loaded into the adaptation system by the system administrator. The adaptation system subscribes to events specified in the policy and initiates corresponding actions when those events are fired. Figure 7a shows the adaptation framework and Figure 7b illustrates the steps involved in loading a policy, planning adaptation and initiating actuations.



Figure 7a: Adaptation Framework.


Figure 7b: Adaptation flowchart.

A policy is compiled into a policy object file. The policy object file contains rules in a format suitable for loading into the enforcement system. The enforcement system subscribes to policy events and waits for the occurrence of events

Since a single change to the monitoring system may trigger more than one event occurrence, we define time intervals called epochs and consider all events received within an epoch to correspond to a single change. The epoch model for event reception was proposed in [2] and was found suitable for defining policy rules with composed events.

Since a composed event normally contains events that have occurred ``simultaneously'' and event reception system receives events sequentially, the epoch model provides a good approximation to simultaneity. At the end of each epoch, the adaptor evaluates the policy and determines the set of rules that are triggered. The triggered rules are checked for conflicts and resolved using a priority-based resolution technique [10]. The adaptor reasons about the enforcement order of rules using pre- and post-conditions of actions and generates the Petri net workflow. The workflow is executed by a workflow execution engine and the adaptor waits for further events.

The policy-based adaptor supports interfaces to load policies and conflict resolution rules, query the policy store for the loaded policies and retrieve the set of actions in the action store. In addition, the system also supports user interfaces to list available events and actions. These interfaces are useful for designing policies.

Event Reception Model

Our monitoring framework views a change as a set of correlated events and evaluates the policy based on the events in the set. Event correlation is a well-researched problem and numerous models have been proposed to group events corresponding to a change [25, 26, 27]. Since the focus of our work is on policy evaluation and enforcement, we use a simple event correlation model based on epochs proposed by Chomicki, et al. [2] for policy evaluation. Figure 8 illustrates the epoch model. The input to our adaptation framework is a set of events and therefore, the epoch model can be replaced by more appropriate correlation models without affecting policy evaluation and enforcement.



Figure 8: Epoch model.

Policy Tools and Adaptation Implementation

The adaptation framework consists of policy tools and the adaptation system. The policy compiler generates Java class files from policy rules. We used this approach to leverage the language features of Java. An ECPAP rule is compiled into a method that has the same signature as the event. The condition part of the rule is translated into an `if' block in the method. The conditional expression of the `if' statement is the same as that in the rule. The rule action is translated into a set of statements that create a Java object containing the action object along with its pre- and post-conditions. If multiple rules have the same event signatures, the compiler consolidates the rules into a single method with multiple `if' blocks. A typical Java class, for a rule, generated by the policy compiler is shown in Figure 9.



Figure 9: Policy compilation.

These classes are compiled by a Java compiler into class files and loaded into the adaptation system by the policy loader. When an event is received by the adaptation system, an equivalent method signature is created with the event name as the method name and the event parameters as method parameters. This method is invoked on the rule class files and if the rule contains the event, the method invocation succeeds. The condition is checked and if it is satisfied, an object containing the action object along with its pre- and post-conditions is returned. This approach provides the adaptation system with the action specifications necessary for reasoning.

The adaptation system has been implemented in Java. Figure 10 shows the main components of the framework. It consists of a policy store that stores a set of Java objects instantiated from rule classes. An action library acts as an actuator store and contains a set of actions that can be invoked from policy rules. The action library in addition contains specification of actions in first order predicate logic.



Figure 10: Adaptation framework implementation.

Application to Monitoring Systems

Our data center and enterprise systems use HP OpenView and Ganglia systems for performance monitoring. Since monitoring systems run independent of the core system services, we tested the applicability of our adaptation framework to manage changes to these systems.

The configuration of our monitored environment consists of a set of nodes monitored by the Ganglia monitoring system [14]. Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It supports a hierarchical organization of monitoring agents called Ganglia Monitoring Daemons (gmond) that collect monitoring information from individual nodes.

Aggregation agents called Ganglia Meta Daemons (gmetad) collect data from gmonds and apply aggregation functions to provide consolidated information about cluster of machines. The focus of Ganglia is on monitoring nodes and provides simple replication-based approaches for tolerating node failures. It does not provide support for aggregation agent failures, application migration or other infrastructure changes.

We used the adaptation framework to enhance the resilience of Ganglia monitoring system to changes as proof-of-concept. We developed change detectors to detect aggregator failures and data-send and data-receive failures from the various nodes of the monitored environment. The change detection system generates parameterized events that contain relevant state information. The reasoning system uses XSB Prolog [19] to verify satisfiability of propositions.

We also used the adaptation framework for adapting OpenView monitoring system. The framework enables adaptation when services are plugged-in, plugged-out or migrated, alarm events are generated and so on. For example, the adaptive system dynamically configures the OpenView components to collect performance data from a service when it is plugged-in. Similarly, the framework reconfigures the components to collect additional metrics, through deep-diving (monitoring specific components), when alarms are generated due to high CPU utilization, low memory and so on. Our initial evaluation demonstrates that the ECPAP-based adaptation framework can dramatically reduce the administration cost of OpenView monitoring system.

Evaluation

In this section, we will discuss the algorithmic complexities of the various algorithms presented in the paper and use them to explain the system performance that we have empirically measured.

Algorithmic Complexity

Trivially-enabled action analysis (Algorithm 1) has a linear complexity of O(n) pre-condition checks for n actions. Enablement analysis (Algorithm 2) does a pair-wise satisfiability check of actions and therefore has a quadratic complexity of O(n2). Partial-enablement analysis (Algorithm 3) analyzes for each action if it is enabled by a set of actions.

Each action subset must be determined and this has an exponential complexity of O(2n). Since each subset is tested to see if it enables the action for all actions the final complexity is O(n22n). Currently, Algorithm 3 has a very high complexity but there are various optimizations that can be performed to reduce the value of n.

For example, the enablement analysis algorithm reduces the number of rules to be verified during partial-enablement analysis. Since enablement analysis has a quadratic complexity the overall performance overhead is greatly reduced. In addition, the number of rules that are normally triggered on a single event is quite less (fewer than five rules per event in our adaptation policy) and so the overhead is tolerable. We are currently looking at static analysis techniques to determine dependencies between different rules at policy compilation time. Finally, the overall complexity of the workflow generation is O(n22n), bounded by the complexity of the partial-enablement analysis.

Experimental Validation

The performance overhead of Petri net workflow generation is shown in Figure 11. The adaptation system was executed on a Pentium III 1GHz dual processor SMP machine with 2 GB memory. Figure 11a shows the overhead with varying number of triggered rules. Our test policy had multiple instances of the same rule since the focus was on testing the overhead of the system. As predicted from the algorithmic complexity described above the overhead is exponential with the number of triggered rules. For 15 triggered rules the overhead was found to be around three seconds. Normally, for a typical policy, the number of rules triggered on a single change can be expected to be much less than 15 and so the approach is feasible. Several optimization using configuration and state models can be performed to reduce the overhead. We do not discuss these optimizations since they are out of the scope of the paper.

The number of predicates in pre- and post-conditions of actions influences the Petri net generation overhead. Therefore, we measured the overhead with varying number of predicates in action specifications. Figure 11b illustrates the performance overhead of the system. The x-axis indicates the average number of predicates for each pre- and post-condition. The y-axis shows the overhead in seconds. The overhead is less than one second for about 144 predicates.



Figure 11a: Petri net Workflow Generation vs Number of Triggered Rules.


Figure 11b: Petri net Workflow Generation vs Average Number of Predicates.

Figure 12 shows the graphs of times required for policy compilation and rule evaluation. Policy compilation has a reasonable overhead of about one second for a policy containing 400 rules. Since policies are compiled only when they are modified, this is an acceptable overhead. Rule evaluation is an important component of the adaptation process and therefore its overhead contributes to the overall performance of the system. We measured the performance of the adaptation system with varying number of rules in a policy. We found that rule evaluation takes about 100 ms for a policy with 2000 rules which is a reasonable overhead for an adaptation system.



Figure 12a: Policy Compilation Times.


Figure 12b: Rule Evaluation Times.

Experience in a Real Scenario

We have used our adaptation system for managing changes to monitoring systems as was described earlier. Figure 13 shows the Ganglia visualizer output on aggregator failure for our adaptation scenario. The policy consisted of four rules shown in the ``Example Adaptation Policy'' section. On aggregator failure, six events - one AggregatorFail, two DataSendFail, two DataReceptionFail and one AggregationAgentStopped event - are generated. In the first experiment (Figure 13a), an ECPAP system with maximum rule enforcement enabled was used and the aggregator was manually stopped. The temporary disruption during adaptation is illustrated in the figure. The adaptation used the policy from the ``Specification-based Rule Framework'' section.

Once the rules are enforced, monitoring resumes using the failover aggregator. In the second experiment (Figure 13b), a pure ECA based system was used and the adaptation framework enforced rules as soon as they were triggered. As illustrated in the figure, random enforcement of rules failed to connect the nodes to the failover aggregator and so the visualizer node did not receive data.



Figure 13a: Disruption in monitored data during adaptation (as perceived by the visualizer node) - ECPAP based system with Maximum Rule Enforcement (with reasoning).


Figure 13b: Pure ECA based system (Random Rule Enforcement).

Before performing the second experiment, the original aggregator was restarted and the failover aggregator was disconnected. This caused no disruption in data reception since atleast one aggregator was active during the switch. Thus, we see that with an ECPAP based approach, even though there is overhead in workflow generation, we guarantee recovery, whereas with a pure ECA based system (random enforcement order), complete disruption could happen.

Lessons Learned

Policy Design: Designing adaptation policies is an onerous task and requires significant knowledge of the system configuration and functioning. Policy designers should foresee various changes that may affect IT systems and specify rules. This demands significant system knowledge and expertise from the administrator.

Defining Correctness: In this paper, we introduced the notion of enforcement semantics and identified three different semantics for rule enforcement. In order to prescribe a specific semantics we need to define correctness for concurrent rule enforcement. This requires empirical validation to determine if application of a specific semantics provides appropriate guarantees.

Optimization using Models: As discussed in the previous section, the workflow generation algorithms have high complexity. In a cluster of nodes, a single change may cause similar events from multiple nodes to be generated triggering multiple instances of the same rule. The workflow generation complexity can be greatly reduced if it can be inferred that the triggered rules are instances of a smaller set of rules by using configuration models. Similarly, using information models, such as Common Information Model (CIM) for representing system states enables faster evaluation of specifications. These models represent state information of entities and provide a central service that can be queried. Therefore, extending the adaptation framework with models enables optimization and we plan to explore that as extensions to our work.

Event Correlation Models: Finally, the epoch model that we have used approximates a system change in our prototype and in order to develop a fully functioning adaptation system an event correlator is necessary. The correlator would require system information such as configuration, event delivery latency and so on, which can be represented as system models.

Related Work

Automated change management has gained attention in the past few years as a necessary technology to address the adaptation needs of rapidly growing enterprise and grid computing markets. Several research projects are focusing on reducing the administrator efforts in managing different aspects of large distributed systems through policy and model-based approaches.

The CHAMPS project [18] aims to reduce the complexity of IT change and configuration management in distributed environments through planning and scheduling approaches. The project uses model-based approaches to build task graphs to adapt to system changes. Our adaptation framework uses policies and therefore differs significantly from the CHAMPS project. Policy-based management has been used for network switch management [5], managing content distribution networks [6] and distributed systems [7].

The focus of our work is on managing complex IT systems. As we have described, the complexity of these systems causes simultaneous activation of multiple policy rules, which have to be enforced in proper order. None of the projects on policy-based management seem to address this problem, to the best of our knowledge, as we have addressed.

There have been several research efforts in designing policy languages [1, 15], detecting and resolving policy conflicts [2, 9, 11, 22], and various other analyses [8, 20]. To the best of our knowledge, no research in this area has addressed the problem of ordering management rules and providing enforcement guarantees as we have addressed in this paper.

Dunlop, et al. [22] use temporal characteristics of policies to dynamically reason about policy consistency. Their approach detects a large class of conflicts that cannot be detected statically. The focus of their work is on conflict analysis and not on ordering rule enforcement as we have presented in this paper. Our previous work [11] proposes an ECA-P framework to detect and resolve dynamic conflicts that occur due to side-effects of actions. The focus of the work was on conflict analysis and not on enforcement order determination.

Sloman, et al. [7, 8, 9] have developed the Ponder policy specification language and defined techniques for conflict analysis and role-based management. To the best of our knowledge, their work does not address the problem of ordering concurrently triggered rules.

Several research projects in autonomic computing reason about action ordering [12, 13]. These projects are based on AI planning techniques where users specify high-level goals and the planning system determines the ordered set of actions to be executed to reach the desired goal state. The main difference between these projects and our work is that in goal-based approaches the final system state that needs to be reached is known and the system has to determine the actions to be executed to reach that state. In the problem that we have addressed, the final system state is unknown. When an event occurs, a set of rules get triggered and we need to reason about the execution order of the rule actions based on some enforcement semantics.

Finally, our application of policy-based techniques for adapting monitoring systems is a novel research effort. System monitoring is a well-researched field and several monitoring systems such as HP Openview [16], IRISLOG [25], Ganglia [14] and MonALISA [17] are currently being used. These systems focus mainly on data collection, delivery, scalability and fault tolerance. None of these systems support a generic framework that adapts to a spectrum of changes such as application migration, service plug-ins and so on, which is required for automated change management. Our framework uses the configurability features of these systems to adapt to changes based on administrator-specified policies.

Conclusion

IT systems are dynamic and subject to various changes. Management of such changes needs to be automated to reduce administration cost. Policy-based adaptation is a suitable approach where management actions are specified by an administrator, as Event-Condition-Action rules, for different changes in the system. The interdependence of components in modern IT systems causes several change events to be generated when a single change occurs triggering multiple rules. Since the order of enforcement of rules determines the system behavior, adaptation systems should reason about the enforcement order of the rules before initiating corrective actions. We found the ECA rule framework to be poorly suited for reasoning since it does not contain specifications of rule actions.

In this work, we introduced a new rule framework, called Event-Condition-Precondition-Action-Postcondition (ECPAP) that contains action specification for designing adaptation policies for IT systems. When multiple rules are simultaneously triggered on a change, the adaptation system uses the specifications to analyze dependencies between rule actions and generate a Petri net workflow that is executed by an execution engine. We introduce a new notion called enforcement semantics that provides guarantees about rule enforcement. We have used this framework for adapting monitoring systems to changes and presented its performance and advantages in this paper.

Acknowledgements

We would like to thank Martin Arlitt, Keith Farkas, and our shepherd John ``Rowan'' Littell, for their comments, which have helped improve the content and presentation of the paper. We also thank Rob Kolstad for his extra typesetting efforts that have helped to significantly improve the look and feel of the paper.

Author Biographies

Chetan Shankar is a doctoral candidate at the University of Illinois at Urbana-Champaign (UIUC). He is one of the main contributors to the Active Spaces pervasive computing project at UIUC and has published several papers on pervasive computing and policy-based management. He is broadly interested in services and systems management, programming and management frameworks for dynamic systems and pervasive computing.

Vanish Talwar is a researcher in the Enterprise Systems and Software Lab at Hewlett-Packard Laboratories. His technical interests include distributed systems, operating systems, and computer networks, with a focus on management technologies. He received his M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana Champaign (UIUC) in 2001 and 2006 respectively. He is the recipient of the David J. Kuck Best Masters Thesis award in the Dept. of Computer Science, UIUC, and is an elected member of Phi Kappa Phi and Sigma Xi.

Subu Iyer is a Systems Software Engineer and researcher at HP Labs, Palo Alto. He joined DEC Network Systems Lab in 1997 where he worked on collecting and analyzing performance data from a large cluster of machines on DEC's Palo Alto Research Gateway. Over the years, Subu has worked on projects in the areas of distributed computing, performance monitoring and telepresence. His current work is on scalable adaptive performance monitoring.

Yuan Chen is a post-doctoral researcher in Enterprise Systems and Software Laboratory at HP Labs. He received a B.S degree from University of Science and Technology of China in 1994, and M.S. and Ph.D. degrees from the Georgia Institute of Technology in 2001 and 2005, respectively, all in Computer Science. His current research focuses on performance and systems management in complex and large-scale enterprise computing systems.

Dejan Milojicic is a senior researcher and a project manager at HP Labs. He has worked in the area of operating systems and distributed systems for more than 20 years. He has been the program chair of the IEEE Agent Systems and Applications Symposium (ASA/MA'99) and of the first USENIX Workshop on Industrial Experiences with System Software (WIESS'2000). Dr. Milojicic published in many journals and at various events. He is currently on the editorial board of IEEE Distributed Systems Online. He has been engaged in various standardization bodies, such as OMG and Global Grid Forum. He is a member of the ACM, IEEE, and USENIX. He received his B.Sc. and M.Sc. from University of Belgrade and his Ph.D. from University of Kaiserslautern. Prior to HP Labs, Dejan worked at Institute ``Mihajlo Pupin,'' Belgrade and at OSF Research Institute, Cambridge, MA.

Roy Campbell is the Sohaib and Sara Abbasi Professor of Computer Science at the Siebel Center for Computer Science at the University of Illinois, Urbana-Champaign. He has supervised the completion of forty Ph.D. dissertations and the author of over two hundred and forty four research papers on security, programming languages, software engineering, operating systems, distributed systems, and networking. His research includes the Gaia project on Active Spaces, the security of the power grid, and mobile computer operating systems. Professor Campbell is Director of the University Of Illinois Center Of Academic Excellence in Information Assurance Education, a member of the Information Trust Institute and, with Guy Garnett, directs the Cultural Computing Program. He is a member of the ACM and an IEEE Fellow.

Bibliography

[1] Lobo, J., et al., ``A Policy Description Language,'' Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pp. 291-298, July, 1999.
[2] Chomicki, J., J. Lobo, and S. Naqvi, ``Conflict Resolution Using Logic Programming,'' IEEE Transactions on Knowledge and Data Engineering, Vol. 15, Num. 1, pp. 244-249, January/ February, 2003.
[3] Reisig, W., Petri Nets: An Introduction, Springer-Verlag, New York, 1985.
[4] Roussev, B. N., ``Self-checking Implementation of Boolean Interpreted Petri Nets,'' Proceedings of IEEE Symposium on Emerging Technologies and Factory Automation, 1994.
[5] Bhatia, R., et al., ``Policy Evaluation for Network Management,'' Proceedings of 19th Annual Joint Conference of the IEEE Computer and Communication Societies (INFOCOM 2000), pp. 1107-1116, March, 2000.
[6] Amiri, K., et al., ``Policy Based Management of Content Distribution Networks,'' IEEE Network Magazine, 2002.
[7] Sloman, M., ``Policy Driven Management For Distributed Systems,'' Plenum Press Journal of Network and Systems Management, Vol 2, Num. 4, Dec., 1994, pp. 333-360.
[8] Lupu, E. C., A Role-Based Framework for Distributed Systems Management, Ph.D. Thesis, Imperial College, London, 1998.
[9] Lupu, E. C., et al., ``Conflicts in Policy-Based Distributed Systems Management,'' IEEE Transactions on Software Engineering, Vol. 25, pp. 852-869, Nov., 1999.
[10] Shankar, C., et al., ``A Policy-based Management Framework for Pervasive Systems using Axiomatized Rule Actions,'' Proceedings of Fourth IEEE International Symposium on Network Computing and Applications (IEEE NCA05), MA, 2005.
[11] Shankar, C., et al., ``An ECA-P Policy-based Framework for Managing Ubiquitous Computing Environments,'' Proceedings of The Third Annual International Conference on Mobile and Ubiquitous Systems: Networks and Services (Mobiquitous 2005), San Diego, July, 2005.
[12] Andrzejak, A., et al., ``FeedbackFlow - An Adaptive Workflow Generator for System Management,'' Proceedings of The Second IEEE International Conference on Autonomic Computing (ICAC-05), June, 2005.
[13] Srivastava, B., et al., ``The Case for Automated Planning in Autonomic Computing,'' Proceedings of The 2nd IEEE International Conference on Autonomic Computing (ICAC-05), June, 2005.
[14] Massie, M., B. Chun, and D. Culler, ``The Ganglia Distributed Monitoring System: Design, Implementation, and Experience,'' Parallel Computing, Vol. 30, Issue 7, July 2004.
[15] Damianou, N., et al., ``The Ponder Specification Language,'' Proceedings of Workshop on Policies for Distributed Systems and Networks (Policy2001), HP Labs Bristol, pp. 29-31, Jan., 2001.
[16] HP OpenView Management Solutions, https://www.managementsoftware.hp.com/.
[17] Newman, H. B., et al., ``MonALISA: A Distributed Monitoring Service Architecture,'' Proceedings of 2003 Conference for Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, California, March, 2003.
[18] Brown, A., et al., ``A Model of Configuration Complexity and its Application to a Change Management System,'' Proceedings of the 9th International IFIP/IEEE Symposium on Integrated Management (IM 2005), May, 2005.
[19] XSB Logic Programming and Deductive Database system for UNIX and Windows, https://xsb.sourceforge.net/.
[20] Verma, D., ``Simplifying Network Administration using Policy based Management,'' IEEE Network Magazine, 2002.
[21] Baralis, E. and J. Widom, ``Better Static Rule Analysis for Active Database Systems,'' ACM Transactions on Database Systems, Vol. 25, Num. 3, pp. 269-332, September, 2000.
[22] Dunlop, N., et al., ``Dynamic Conflict Detection in Policy-Based Management Systems,'' Proceedings of Enterprise Distributed Object Computing Conference (EDOC '02), September, 2002.
[23] Hoare, C. A. R., ``An axiomatic Basis for Computer Programming,'' Communications of the ACM, Vol. 12, Num. 10, 1969.
[24] Ranganathan, A., et al., ``Pervasive Autonomic Computing Based on Planning,'' Proceedings of IEEE International Conference on Autonomic Computing (ICAC-04), May, 2004.
[25] Nath, S., et al., ``Tolerating Correlated Failures in Wide-Area Monitoring Services,'' Intel Research TR, May, 2004.
[26] Kliger, S., et al., ``A Coding Approach to Event Correlation,'' Proceedings of the 4th International IFIP/IEEE Symposium on Integrated Management (IM 1997), 1997.
[27] OpenView Event Correlation Service, https://www.managementsoftware.hp.com/products/ecs/.

Appendix I


Last changed: 15 Nov. 2006 ch