Second USENIX Conference on Object-Oriented Technologies (COOTS), 1996

Pp. 25–34 of the Proceedings

Building Independent Black Box Components in C++

Mark Addesso
Software AG of North America

Abstract

When systems are built with object-oriented techniques, the relationships between objects create dependencies which inhibit reuse, unit testing and reliability. By using black box components, these relationships can be factored out of the objects to create truly independent components. Techniques to design and build systems in this manner are presented. A complex system built with black box components is also described.

Introduction

The basic idea with object technology was to create packaged units of data and behavior. You could put your hands around an object, design and code it separately, and test it as a unit. Once completed, this package of functionality could be reused and would never have to be coded again.

It sounds good in theory, but in practice it is very difficult to create such independent packages. The difficulty is that any complex behavior requires a group of collaborating objects. The boundaries of these packages get blurred as one object requires the services of another, or needs to query another object, or starts to control another object. When the boundaries get blurred, all the advantages of the packaged behavior and data blur as well. Objects which depend on others cannot be built and tested independently, and further, cannot be reused without including all the dependent objects.

These dependencies are inherent in current object design techniques. Most popular techniques let you model interactions between objects. Wirfs-Brock uses Collaboration Graphs [1], Rumbaugh uses Event Trace Diagrams [2], Jacobson uses Interaction Diagrams [3], etc. These interactions may be described as stimuli sent to another object, as a contract between two objects, etc. Any direct interaction between objects creates a dependency. The object sending the message must contain a reference to the receiving object, and must know the exact protocol the receiver expects for that message.
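To make the dependency concrete, here is a minimal C++ sketch. The class names Sender and Receiver and the Display method are hypothetical, chosen only for illustration. The sender stores a reference to one concrete receiver class and calls one of its methods by name. As a result, it can be neither compiled nor tested without that class.

```cpp
#include <string>

// Hypothetical receiver with a concrete protocol: Display(const std::string&).
class Receiver {
public:
    void Display(const std::string& text) { last_ = text; }
    const std::string& Last() const { return last_; }
private:
    std::string last_;
};

// Hypothetical sender: it holds a reference to a concrete Receiver and calls
// Display by name, so it cannot be compiled, tested or reused without Receiver.
class Sender {
public:
    explicit Sender(Receiver& receiver) : receiver_(receiver) {}
    void Notify(const std::string& text) { receiver_.Display(text); }
private:
    Receiver& receiver_;
};
```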

To reuse an object with dependencies in another application, the receiver must also be included, and the application must correctly establish the relationship between the two. Even with two objects, reuse becomes complicated. Since an object may send messages to many other objects, it may depend on all of them. Even worse, those objects can depend on others, and so on. Thus, the entire web of connected objects would have to be included and correctly initialized for the object to be reused in another application. This is why currently only primitive classes (lists, widgets, etc.) or large subsystems (OLE components) are good candidates for reuse.

Removing Dependencies

In order to find a solution to the dependency problem, I imagined the ideal way to build and use objects. I went back to an early concept in object technology called Software ICs, first introduced by Brad Cox [4]. A software IC is like a hardware IC: it has inputs and outputs which are connected to other ICs. If we could build systems by connecting the outputs of black box objects to the inputs of others, we would have a very loosely coupled system whose pieces could be built, tested and reused independently.

The first decision was what the black boxes should be: objects or methods? If a black box was an object, then an input would be a method and an output would be a message send to another object's method. If the black box was a method, then the inputs would be the method parameters and the outputs would be the method's output parameters. Using methods as black boxes is similar to a data flow approach. Morrison [5] has done some interesting work with "Flow Based" programming which uses this approach. However, I was more interested in building and reusing objects, not methods, so I opted for the first approach: using objects as black boxes.

Objects as independent black box components have the following properties:

1) It maintains its own state. A component cannot rely on any other component for state information, as that would create a dependency.

2) A component cannot query another component. When an output of a component is connected to the input of another, the data flow is strictly in one direction.

3) A component can contain other components, but can only have one parent. This is consistent with the clear boundaries of a black box.

With respect to other component models (COM, CORBA, etc.), the component protocol described here is not intended as a competing protocol. Rather, the components described are building blocks used to design and implement software systems. They are more detailed, internal building blocks which could be used to implement an OLE or CORBA component, or to implement stand-alone applications.

The API of a component for our purposes has three sections: inputs, outputs, and controls. Component inputs set the state of the component or pass data to the object to process. Outputs of a component are generated when the state of the component changes or as the result of an operation of the component. Controls are used by the parent object to set, query and control the sub component. Since the parent's behavior is implemented by its sub components, parents can have intimate knowledge of their sub components' behavior.
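The sketch below shows the three API sections in C++. It makes one assumption: ports are built from a small modern OutputPort template based on std::function, rather than from the function-pointer macros this paper uses. The Counter component and its port names are hypothetical. Inputs and controls are both ordinary member functions. The difference is who calls them: an input is called through a connected output port, a control only by the parent. The output port returns nothing, so a connected receiver cannot use it to query the sender.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Stand-in output port (not the paper's mechanism): a list of connected input
// callbacks. Send returns void, so a receiver accepts data but answers nothing.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const {
        for (const auto& input : inputs_) input(args...);
    }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

// Hypothetical component showing the three API sections.
class Counter {
public:
    // Output: generated whenever the component's state changes.
    OutputPort<int> ValueChanged;

    // Input: passes data to the component to process.
    void Increment(int by) { value_ += by; ValueChanged.Send(value_); }

    // Controls: used only by the parent to set, query and control the component.
    void Reset() { value_ = 0; ValueChanged.Send(value_); }
    int Value() const { return value_; }

private:
    int value_ = 0;  // the component maintains its own state
};
```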

Diagrams are used to document a parent's components, and how these components are connected. In the diagram, an object class is represented by a gray box. The input and output ports are displayed inside the class box, and are connected with directed lines. Each line is labeled with the parameters (if any) sent from the output port to the input port. A second list is used in the class box to show the control ports available to the parent.

Building Systems with Components

Given this definition of components, systems are built by parent components that connect the outputs and inputs of their sub components.

A parent can connect sub components for one of three purposes: to implement a sequence of messages, to broadcast a state change, or to pump the output stream of one component to the input of another.

o Sequence of Messages

This usage is similar to a data flow style of programming. Each component transforms its inputs to outputs and sends them to the next component in the chain.

This example is a parser which must tokenize its input and generate a parse tree. Here, a string is sent to the Tokenizer component's InputString port. The Tokenizer creates a token for the string and sends it out its NextToken output port, which is connected to the ParseTree component's AddToken input port.
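The Tokenizer/ParseTree chain might be sketched as follows. Ports use a small std::function-based OutputPort template as a stand-in for the paper's port mechanism. The sketch assumes two simplifications, since the real parser is not described here: the Tokenizer splits its input on whitespace, and ParseTree merely records tokens in arrival order. The parent Parser owns both sub components and makes the only connection between them.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in output port: a list of connected input callbacks.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const { for (const auto& input : inputs_) input(args...); }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

// Hypothetical tokenizer: assumes tokens are whitespace-separated words.
class Tokenizer {
public:
    OutputPort<const std::string&> NextToken;          // output port
    void InputString(const std::string& text) {        // input port
        std::istringstream in(text);
        std::string token;
        while (in >> token) NextToken.Send(token);
    }
};

// Hypothetical parse tree: this sketch only records tokens in arrival order.
class ParseTree {
public:
    void AddToken(const std::string& token) { tokens_.push_back(token); }  // input port
    const std::vector<std::string>& Tokens() const { return tokens_; }     // control
private:
    std::vector<std::string> tokens_;
};

// Parent: the only place that knows both sub components and connects them.
class Parser {
public:
    Parser() {
        tokenizer_.NextToken.Connect(
            [this](const std::string& token) { tree_.AddToken(token); });
    }
    Parser(const Parser&) = delete;             // the connection captures this
    Parser& operator=(const Parser&) = delete;
    void InputString(const std::string& text) { tokenizer_.InputString(text); }  // input
    const ParseTree& Tree() const { return tree_; }                             // control
private:
    Tokenizer tokenizer_;
    ParseTree tree_;
};
```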

o Broadcast of a State Change

This usage is similar to the MVC (Model-View-Controller) architecture developed in Smalltalk. A change in a model (or business) object is broadcast to all its views which are then updated.

This example is from an application which lets the user look at and change customer records. Here, when the name is changed in the CustomerRecord component, the NameChanged output is triggered which sends the new name to the CustomerForm, which updates the screen. Note that the CustomerRecord would have a number of output ports for various state changes. Also note that this use is different from a typical MVC setup, because here the "View" object does not have to query the model for the data, the data is always passed. Since the view object doesn't query the model object, it is not dependent on the model object and its protocol. See [6] for more information on MVC.
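The broadcast can be sketched the same way. Ports again use a std::function-based OutputPort template rather than the paper's macros. CustomerEditor, ShowName and the second "summary" view are hypothetical. CustomerForm holds no pointer to CustomerRecord; the new name arrives as the argument of its input port.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in output port: a list of connected input callbacks.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const { for (const auto& input : inputs_) input(args...); }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

// Model: broadcasts each state change together with the new value.
class CustomerRecord {
public:
    OutputPort<const std::string&> NameChanged;        // output port
    void SetName(const std::string& name) {            // input port
        if (name == name_) return;                     // not a state change
        name_ = name;
        NameChanged.Send(name_);
    }
private:
    std::string name_;
};

// View: holds no pointer to the model; the data arrives with the message.
class CustomerForm {
public:
    void ShowName(const std::string& name) { displayed_ = name; ++redraws_; }  // input port
    const std::string& Displayed() const { return displayed_; }
    int Redraws() const { return redraws_; }
private:
    std::string displayed_;
    int redraws_ = 0;
};

// Hypothetical parent: broadcasts one output port to two views.
class CustomerEditor {
public:
    CustomerEditor() {
        record_.NameChanged.Connect([this](const std::string& n) { form_.ShowName(n); });
        record_.NameChanged.Connect([this](const std::string& n) { summary_.ShowName(n); });
    }
    CustomerEditor(const CustomerEditor&) = delete;
    CustomerEditor& operator=(const CustomerEditor&) = delete;
    void SetName(const std::string& name) { record_.SetName(name); }  // input
    const CustomerForm& Form() const { return form_; }                // controls
    const CustomerForm& Summary() const { return summary_; }
private:
    CustomerRecord record_;
    CustomerForm form_, summary_;
};
```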

o Pumping an Output Stream to Another Component

This usage is similar to piping under Unix and covers cases where a set of data is processed by a component and the results are sent to another component.

This example is from an application which produces reports given a list of records. Here, the RecordSet component contains a list of records which will be formatted by the ReportFormatter component. The SortedRecords output port pumps all the records in sorted order to the ProcessRecord input port of the ReportFormatter component.
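Pumping could be sketched as follows, again with a std::function-based OutputPort standing in for the paper's ports. The Record fields, the sort key (name), the report format and the ReportWriter parent are hypothetical. The PumpSorted control sends every record, one message per record, out the SortedRecords port.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in output port: a list of connected input callbacks.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const { for (const auto& input : inputs_) input(args...); }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

struct Record { std::string name; int amount; };   // hypothetical record layout

class RecordSet {
public:
    OutputPort<const Record&> SortedRecords;                     // output port
    void AddRecord(const Record& r) { records_.push_back(r); }   // input port
    // Control: pump every record, sorted by name, out the SortedRecords port.
    void PumpSorted() const {
        std::vector<Record> sorted = records_;
        std::sort(sorted.begin(), sorted.end(),
                  [](const Record& a, const Record& b) { return a.name < b.name; });
        for (const Record& r : sorted) SortedRecords.Send(r);
    }
private:
    std::vector<Record> records_;
};

class ReportFormatter {
public:
    void ProcessRecord(const Record& r) {                        // input port
        report_ += r.name + ": " + std::to_string(r.amount) + "\n";
    }
    const std::string& Report() const { return report_; }        // control
private:
    std::string report_;
};

// Hypothetical parent that wires the pump.
class ReportWriter {
public:
    ReportWriter() {
        records_.SortedRecords.Connect(
            [this](const Record& r) { formatter_.ProcessRecord(r); });
    }
    ReportWriter(const ReportWriter&) = delete;
    ReportWriter& operator=(const ReportWriter&) = delete;
    void AddRecord(const Record& r) { records_.AddRecord(r); }
    std::string Produce() { records_.PumpSorted(); return formatter_.Report(); }
private:
    RecordSet records_;
    ReportFormatter formatter_;
};
```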

Given these three types of interactions, and the fact that a parent component can control its sub components, the question is: Can we build complex systems with just these constructs? The case study presented in the next section describes a complex system which was successfully implemented with these techniques.

Case Study - an ER Diagram Generator and Editor

The development of Software AG's query and reporting tool "Esperant" was used as a case study for these techniques. Esperant has two components: an administration tool to design the data view, and an end user query tool.

A DataView is a view of the database which is closer to the user's view of the data. Tables and columns can be renamed to more user-friendly terms. Tables can be joined to create "denormalized" views of the data, which again is more user-friendly.

To define the joins, an Entity Relationship (ER) diagram is used. An ER diagram contains table nodes with primary and foreign keys, table columns, and join lines which connect primary keys to foreign keys. Figure 5 above shows an ER diagram for our sample database.

The join lines are "locked" to the tables. If a table is moved or resized, the join lines must be stretched, rerouted and possibly clipped to conform to the new position.

Since our Esperant users had existing DataViews, and since joins can be inferred from many database schemas, another requirement was to automatically generate the ER diagram from a list of tables and joins.

Thus the project was to build an ER diagram generator and editor, with standard graphic editing capabilities: selection handles, direct manipulation to move and resize tables and to reroute joins, and undo.

In designing the system using components, it was first decided to separate the automatic generation from the editing. The generation involves computing an optimal layout of the entities and then routing the joins between them.

Thus, the diagram subsystem consists of three major components: a LayoutGenerator, a Router, and a DiagramEditor.

The inputs to the subsystem accept new entities and joins. These inputs are connected to the inputs of all the sub components to update their respective lists (see figure 6).

The LayoutGenerator and Router components produce position information which is sent to the DiagramEditor to position the entity boxes and relationship lines. Note also that the LayoutGenerator sends node position information to the Router, as the router needs to know the absolute entity positions before routing can begin.

In the system, these components work as follows. The "back end" of the system retrieves a list of entities and relationships, either from an existing DataView or by reading the database catalog. The back end is connected to the Diagram SubSystem's AddEntity and AddRelationship input ports. The back end pumps all the new entities and relationships through these ports.

A control message is then sent by the Diagram SubSystem's parent to generate the diagram. The Diagram SubSystem triggers the generate_layout control in the LayoutGenerator. The generator computes an optimal table layout and sends the table position information to both the Router and the DiagramEditor. The Diagram SubSystem then triggers the route_connections control in the router, which computes the line routing for all joins and sends the route info to the DiagramEditor (see figure 7).
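The generate sequence could be sketched as follows. The layout and routing algorithms are trivial placeholders: entities are placed in a row and joins are drawn as straight lines. Port and type names other than generate_layout and route_connections are assumptions. For brevity, entities are fed only to the LayoutGenerator and joins only to the Router. The parent enforces the ordering by triggering generate_layout before route_connections, so the Router has every node position before it routes.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in output port: a list of connected input callbacks.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const { for (const auto& input : inputs_) input(args...); }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

struct Point { int x, y; };

// Placeholder layout: entities in one row, 100 units apart.
class LayoutGenerator {
public:
    OutputPort<const std::string&, Point> NodePosition;
    void AddEntity(const std::string& name) { entities_.push_back(name); }
    void generate_layout() const {
        for (std::size_t i = 0; i < entities_.size(); ++i)
            NodePosition.Send(entities_[i], Point{static_cast<int>(i) * 100, 0});
    }
private:
    std::vector<std::string> entities_;
};

// Placeholder router: one straight segment per join, between entity positions.
class Router {
public:
    OutputPort<const std::string&, const std::string&, Point, Point> Route;
    void AddRelationship(const std::string& from, const std::string& to) {
        joins_.emplace_back(from, to);
    }
    void NodePosition(const std::string& name, Point p) { positions_[name] = p; }
    void route_connections() const {
        for (const auto& join : joins_)
            Route.Send(join.first, join.second,
                       positions_.at(join.first), positions_.at(join.second));
    }
private:
    std::vector<std::pair<std::string, std::string>> joins_;
    std::map<std::string, Point> positions_;
};

// The editor only receives positions; it never asks anyone for them.
class DiagramEditor {
public:
    struct RouteInfo { std::string from, to; Point start, end; };
    void PlaceEntity(const std::string& name, Point p) { boxes[name] = p; }
    void PlaceRoute(const std::string& from, const std::string& to, Point a, Point b) {
        routes.push_back(RouteInfo{from, to, a, b});
    }
    std::map<std::string, Point> boxes;
    std::vector<RouteInfo> routes;
};

class DiagramSubSystem {
public:
    DiagramSubSystem() {
        layout_.NodePosition.Connect(
            [this](const std::string& n, Point p) { router_.NodePosition(n, p); });
        layout_.NodePosition.Connect(
            [this](const std::string& n, Point p) { editor_.PlaceEntity(n, p); });
        router_.Route.Connect(
            [this](const std::string& f, const std::string& t, Point a, Point b) {
                editor_.PlaceRoute(f, t, a, b);
            });
    }
    DiagramSubSystem(const DiagramSubSystem&) = delete;
    DiagramSubSystem& operator=(const DiagramSubSystem&) = delete;
    void AddEntity(const std::string& name) { layout_.AddEntity(name); }        // input
    void AddRelationship(const std::string& from, const std::string& to) {       // input
        router_.AddRelationship(from, to);
    }
    void Generate() { layout_.generate_layout(); router_.route_connections(); }  // control
    const DiagramEditor& Editor() const { return editor_; }
private:
    LayoutGenerator layout_;
    Router router_;
    DiagramEditor editor_;
};
```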

The generator and router components use specialized algorithms and data structures to compute optimal placements and to generate routes with a minimum number of intersections. These algorithms were not well suited to components and were implemented using traditional objects and methods.

The DiagramEditor lent itself well to sub components. An editor is composed of many objects whose states are closely interrelated. For example, the relationship lines depend on the entity positions, and the selection handles for an entity or line are mutually dependent on the entity or line they are manipulating.

DiagramEditor Component

In the ER diagram, whenever an entity is moved or resized all connected lines must be stretched and clipped to the new position.

To model this behavior between the entities and the lines, each line maintains a start_box and an end_box: the boxes of the entities it connects. The boxes are needed for the line to update itself intelligently and to clip to the boxes whenever it is edited. The line has two input ports to update the positions of the start and end boxes.

Each entity has an output port to indicate that its box has changed. This is connected to the start_box or end_box input ports of all connected lines. The figure below shows the components and connections used for one such relationship.

Thus when an entity is moved or resized, it sends its new position out the BoxChanged port. All connected lines receive the message, update their start or end box, and reroute and possibly clip themselves to align with the new box position.
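A sketch of the entity-to-line connection follows. It uses a std::function-based OutputPort and hypothetical Box, Pt, Entity, Line and JoinDiagram types. Rerouting is simplified to a straight segment between box centers; the clipping to box edges described above is omitted.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Stand-in output port: a list of connected input callbacks.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const { for (const auto& input : inputs_) input(args...); }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

struct Box { int left, top, right, bottom; };
struct Pt { int x, y; };

// Hypothetical entity: owns its box and announces every change.
class Entity {
public:
    OutputPort<Box> BoxChanged;
    explicit Entity(Box box) : box_(box) {}
    void MoveBy(int dx, int dy) {                    // input port
        box_ = Box{box_.left + dx, box_.top + dy, box_.right + dx, box_.bottom + dy};
        BoxChanged.Send(box_);
    }
private:
    Box box_;
};

// Hypothetical join line: keeps its own copies of start_box and end_box, so it
// never queries an entity. Routing is a straight segment between box centers.
class Line {
public:
    Line(Box start, Box end) : start_box_(start), end_box_(end) { Reroute(); }
    void SetStartBox(Box box) { start_box_ = box; Reroute(); }  // input port
    void SetEndBox(Box box) { end_box_ = box; Reroute(); }      // input port
    Pt From() const { return from_; }
    Pt To() const { return to_; }
private:
    static Pt Center(Box b) { return Pt{(b.left + b.right) / 2, (b.top + b.bottom) / 2}; }
    void Reroute() { from_ = Center(start_box_); to_ = Center(end_box_); }
    Box start_box_, end_box_;
    Pt from_{}, to_{};
};

// Hypothetical parent: connects each entity's BoxChanged to the matching line input.
class JoinDiagram {
public:
    JoinDiagram(Box a, Box b) : customer(a), order(b), join(a, b) {
        customer.BoxChanged.Connect([this](Box box) { join.SetStartBox(box); });
        order.BoxChanged.Connect([this](Box box) { join.SetEndBox(box); });
    }
    JoinDiagram(const JoinDiagram&) = delete;
    JoinDiagram& operator=(const JoinDiagram&) = delete;
    Entity customer, order;   // public only to keep the sketch short
    Line join;
};
```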

Entity and Line Selection

When an entity or line is selected, handles are displayed for the user to move or resize the entity, or to reroute the line. The handles are implemented as separate components. When edited, they produce outputs which are connected to inputs of the entities or lines they represent (see figure 9).

Given the component interactions of the DiagramEditor, whenever an object is changed (moved, resized, rerouted), many objects can be affected (handles update entities which update lines, etc.).

One sticky issue was how to keep track of an invalidation rectangle to redraw when the editing operation was complete. Since output ports only pass data forward and components cannot query one another, a single invalidation rectangle could not be built up during the message broadcast. The best solution was to create a separate component which maintains the invalidation rectangle for each transaction.

The DiagramEditor contains one InvalidationRectangle component which all entities, lines and handles are connected to (see figure 10). During a transaction, when an object changes its location or size, it generates an InvalRect output message. This is connected to the merge input port of the InvalidationRectangle which collects the rectangles for the transaction. When the transaction completes, the parent DiagramEditor signals the InvalidationRectangle to refresh the window.
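Here is a minimal sketch of the InvalidationRectangle component. The Merge input grows a pending rectangle to cover every rectangle received during the transaction, and the Invalidate control flushes it. The window-system call is platform specific, so this sketch assumes a hypothetical Repaint output port in its place. Ports use a std::function-based OutputPort stand-in.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Stand-in output port: a list of connected input callbacks.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const { for (const auto& input : inputs_) input(args...); }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

struct Rect { int left, top, right, bottom; };

class InvalidationRectangle {
public:
    // Hypothetical stand-in for the real window-system invalidation call.
    OutputPort<Rect> Repaint;

    // Input port: grow the pending area to cover r.
    void Merge(Rect r) {
        if (!pending_) { area_ = r; pending_ = true; return; }
        area_.left = std::min(area_.left, r.left);
        area_.top = std::min(area_.top, r.top);
        area_.right = std::max(area_.right, r.right);
        area_.bottom = std::max(area_.bottom, r.bottom);
    }

    // Control: the parent calls this once the transaction is complete.
    void Invalidate() {
        if (pending_) Repaint.Send(area_);
        pending_ = false;  // start empty for the next transaction
    }

private:
    Rect area_{};
    bool pending_ = false;
};
```

In the editor, the InvalRect output of every entity, line and handle would be connected to Merge, and only the parent would call Invalidate.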

A complete edit operation works as follows. The user interacts with a selection handle to drag or resize an object. During the interaction, the mouse messages are routed through the DiagramEditor to the affected handle or handles.

When the user completes the operation by releasing the mouse button, all the affected handles trigger their MovedBy or Resized output ports. These send either a delta or a new box to the entity connected to the handle.

The entity updates its position and sends an invalidation rectangle (old position + new position) out its InvalRect port, which goes to the InvalidationRectangle component. It also sends its new box out its BoxChanged output port. This goes to all connected lines which update themselves and send their invalidation rectangles (old bounding box + new bounding box) out their InvalRect port which goes to the InvalidationRectangle component which merges the rectangle with all previous ones.

The DiagramEditor then triggers the InvalidationRectangle's invalidate control, which does the actual window system invalidation.
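The whole edit transaction might be sketched like this, using a std::function-based OutputPort and hypothetical types. For brevity, the editor holds one handle, one entity and one line. The line's bounding box is approximated by the union of its two end boxes rather than computed from its actual route.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Stand-in output port: a list of connected input callbacks.
template <typename... Args>
class OutputPort {
public:
    void Connect(std::function<void(Args...)> input) { inputs_.push_back(std::move(input)); }
    void Send(Args... args) const { for (const auto& input : inputs_) input(args...); }
private:
    std::vector<std::function<void(Args...)>> inputs_;
};

struct Rect { int left, top, right, bottom; };

// Smallest rectangle covering both a and b.
inline Rect Union(Rect a, Rect b) {
    return Rect{std::min(a.left, b.left), std::min(a.top, b.top),
                std::max(a.right, b.right), std::max(a.bottom, b.bottom)};
}

// Selection handle: on mouse-up it reports the drag delta.
class Handle {
public:
    OutputPort<int, int> MovedBy;
    void MouseUp(int dx, int dy) { MovedBy.Send(dx, dy); }
};

class Entity {
public:
    OutputPort<Rect> InvalRect, BoxChanged;
    explicit Entity(Rect box) : box_(box) {}
    void MoveBy(int dx, int dy) {                       // input port
        Rect old = box_;
        box_ = Rect{box_.left + dx, box_.top + dy, box_.right + dx, box_.bottom + dy};
        InvalRect.Send(Union(old, box_));               // old + new position
        BoxChanged.Send(box_);
    }
private:
    Rect box_;
};

// Join line; its bounds are approximated by the union of its end boxes.
class Line {
public:
    OutputPort<Rect> InvalRect;
    Line(Rect start, Rect end) : start_(start), end_(end) {}
    void SetEndBox(Rect box) {                          // input port
        Rect old = Bounds();
        end_ = box;
        InvalRect.Send(Union(old, Bounds()));           // old + new bounding box
    }
private:
    Rect Bounds() const { return Union(start_, end_); }
    Rect start_, end_;
};

class InvalidationRectangle {
public:
    void Merge(Rect r) { area_ = pending_ ? Union(area_, r) : r; pending_ = true; }
    // Control: hands back the merged area (a real version calls the window system).
    bool Invalidate(Rect* dirty) {
        if (!pending_) return false;
        *dirty = area_;
        pending_ = false;
        return true;
    }
private:
    Rect area_{};
    bool pending_ = false;
};

// Parent: owns the sub components and makes every connection.
class DiagramEditor {
public:
    DiagramEditor()
        : order_(Rect{100, 0, 120, 10}), join_(Rect{0, 0, 20, 10}, Rect{100, 0, 120, 10}) {
        handle_.MovedBy.Connect([this](int dx, int dy) { order_.MoveBy(dx, dy); });
        order_.InvalRect.Connect([this](Rect r) { inval_.Merge(r); });
        order_.BoxChanged.Connect([this](Rect b) { join_.SetEndBox(b); });
        join_.InvalRect.Connect([this](Rect r) { inval_.Merge(r); });
    }
    DiagramEditor(const DiagramEditor&) = delete;
    DiagramEditor& operator=(const DiagramEditor&) = delete;
    // One edit transaction: release the handle, then invalidate once.
    bool DragHandle(int dx, int dy, Rect* dirty) {
        handle_.MouseUp(dx, dy);
        return inval_.Invalidate(dirty);
    }
private:
    Handle handle_;
    Entity order_;
    Line join_;
    InvalidationRectangle inval_;
};
```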

To get a more complete picture of the design of the diagram editor, the above diagrams are combined and shown in figure 11. This represents the primary behavior of the sub components of the editor. In actuality, there are more ports and connections to handle more detailed behavior such as reordering join keys, right mouse menus, drag and drop joining, etc.

These are not shown here for brevity, but the picture gives a very accurate overview of the design of the system.

Implementation of Components in C++

The API of a component has three sections: input ports, output ports and controls. Controls are simple. They are just implemented as methods on the component class.

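To connect output ports to input ports, function pointers are used. An input port is a static member function on the component class. It must be static because an ordinary function pointer cannot point to a non-static member function. C++ does provide pointers to member functions, but their type names the class and calling one requires an object of that class, so they cannot be kept in one list and called uniformly across different component classes. The static member function calls the appropriate real member function of the component instance.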

For example, the input port "MoveBy" on the DrawingNode component is implemented as follows:

class DrawingNode
{
public:
    static void MoveBy(void *self, int dx, int dy)
        { ((DrawingNode*)self)->_MoveBy(dx, dy); }
private:
    void _MoveBy(int dx, int dy);
};
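The static wrapper does not have to be written by hand for every port. One possible alternative, not the technique used in this paper, is a function template parameterized on a pointer to member function. It generates a static function with the same (void *self, ...) signature. InputPort is a hypothetical name, and in this sketch MoveBy is an ordinary member function.

```cpp
// Signature shared by every input port that takes (dx, dy); the first
// argument is the receiving component, passed as void*.
typedef void (*MovedFunc)(void* receiver, int dx, int dy);

// Hypothetical helper: generates the static "input port" function from a
// pointer to member, so each component need not hand-write the wrapper.
template <class Component, void (Component::*Method)(int, int)>
void InputPort(void* self, int dx, int dy) {
    (static_cast<Component*>(self)->*Method)(dx, dy);
}

class DrawingNode {
public:
    void MoveBy(int dx, int dy) { x_ += dx; y_ += dy; }
    int X() const { return x_; }
    int Y() const { return y_; }
private:
    int x_ = 0, y_ = 0;
};
```

A connection would then store &InputPort<DrawingNode, &DrawingNode::MoveBy>, which has exactly the MovedFunc type.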

The first argument of an input function is the component which receives the message. The cast is necessary because the receiver is stored as an untyped void* pointer rather than called through a typed member function pointer. The cast is guaranteed to be correct by the scheme used to connect inputs to outputs (see the following section about type correctness).

Each output port is implemented with a pointer array. We use Visual C++ with MFC and so use a CPtrArray object. The pointer array contains a list of object/method pairs for all input ports connected to this output. For example, the output port "Moved" is defined on the Handle component as follows:

class Handle
{
public:
    CPtrArray Moved;
};

Macros are used to connect, disconnect and send data through a port. A connection is established with the macro:

Connect(sendingComponent, outputPort, receivingComponent, inputPort)

For example, to connect the "Moved" output port of a Handle to the "MoveBy" input port of a DrawingNode, the command would be:

Connect(aHandle, Moved, aDrawingNode, MoveBy)

The macro derives the name of the port's function type, MovedFunc, from the output port name. MovedFunc is a typedef which declares the output port parameters:

typedef void (*MovedFunc)(void *receiver, int dx, int dy);

The Connect macro adds the receiver and the input function to the Moved output port list. Here, a pointer to the DrawingNode and the input port function "MoveBy" are added to the Moved list of the Handle object.

Data is sent from an output to an input via a Send macro. There are different macros depending on the number of arguments in the message. For example, when the Handle is moved, it sends a message to all its connected components with the macro:

Send2(Moved, dx, dy);

This macro iterates through all components connected to the Moved output port and sends the two arguments. The MovedFunc typedef is used to ensure the correct number and type of arguments are used for the port.

Type Correctness

To avoid errors and take advantage of the strong typing of C++, we must ensure that, when data is sent from an output port to an input port, the receiver is of the correct type and the arguments are correct. Since the output port is connected to the input port in the parent component, while the sending is done in the sub component, the Connect and Send macros must work together to ensure correct types.

This is achieved by using function prototypes for all output ports. By convention, each output port has a corresponding function prototype which declares all parameters of the port. The prototype name is the name of the output port concatenated with the string "Func".

For example, in the API of the Handle component, there is a function prototype called MovedFunc declared as follows:

typedef void (*MovedFunc)(void *receiver, int dx, int dy);

This function prototype is used by both the Connect and the Send macros to ensure that a) the output port is connected to an input port with compatible arguments, and b) when data is sent out the output port, the data sent is the right type for the port. This is illustrated in the definition of the Connect and Send macros which are defined as follows:

#define Connect(sender, outputPort, receiver, inputPort) \
    { sender->outputPort.Add(receiver); \
      outputPort##Func func = receiver->inputPort; \
      sender->outputPort.Add((void*)func); }

Note the use of the ## concatenation operator to create the function prototype name from the output port name.

So the example:

Connect(aHandle, Moved, aDrawingNode, MoveBy)

expands to:

{ aHandle->Moved.Add(aDrawingNode);
  MovedFunc func = aDrawingNode->MoveBy;
  aHandle->Moved.Add((void*)func); }

The assignment to the func variable causes the compiler to check the parameters of the DrawingNode::MoveBy static member function against the MovedFunc prototype.

The Send macro uses a similar technique for typing and is implemented as follows:

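#define Send2(outputPort, arg1, arg2) \
    { for (int I = 0; I < outputPort.GetSize(); I += 2) \
        { outputPort##Func func = \
              (outputPort##Func)outputPort[I+1]; \
          (*func)(outputPort[I], arg1, arg2); } \
    }

Each connection occupies two consecutive entries in the list (the receiver, then its input function), so the loop advances two entries at a time.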

So, the example:

Send2(Moved, dx, dy)

would expand to:

{ for (int I = 0; I < Moved.GetSize(); I += 2)
    { MovedFunc func = (MovedFunc)Moved[I+1];
      (*func)(Moved[I], dx, dy); }
}

Here, on the call through the function pointer (func), the compiler verifies that the arguments specified in the Send macro match those in the function prototype.

Thus we have verified that the input port parameters match the function prototype, and that the send parameters match the function prototype. Therefore we are guaranteed that the input port parameters match the send parameters.

The other issue to address is the recasting of the void* pointer to the component class in the static member function.

In the Connect macro, the static member function pointer is retrieved with "receiver->inputPort". The compiler ensures that the static function of the receiver's class is used.

By saving the receiver in the output port list and sending the receiver later to the static member function we are in effect doing the equivalent of:

receiver->inputPort(receiver);

Thus, we are guaranteed that the object passed to the static member function will always be an instance of the receiver's class.
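CPtrArray requires MFC, so the following sketch substitutes std::vector<void*> for it (PortList is a hypothetical name). Otherwise it keeps the paper's scheme: one void* list per output port, a <port>Func typedef per port, and Connect/Send2 macros. The macros are wrapped in do { } while (0) so they behave like single statements, and Send2 advances two entries per connection. Converting a function pointer to void* and back is only conditionally supported in C++. Mainstream compilers accept it, and the scheme depends on that. The private method is renamed DoMoveBy here, as the code comment explains.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for MFC's CPtrArray (PortList is a hypothetical name).
typedef std::vector<void*> PortList;

// Connect: appends (receiver, input function) as two consecutive entries.
// Assigning to func makes the compiler check the input port's signature
// against the outputPort##Func prototype.
#define Connect(sender, outputPort, receiver, inputPort)          \
    do {                                                          \
        outputPort##Func func = (receiver)->inputPort;            \
        (sender)->outputPort.push_back((void*)(receiver));        \
        (sender)->outputPort.push_back((void*)func);              \
    } while (0)

// Send2: walks the list two entries per connection; the call through func
// makes the compiler check arg1 and arg2 against the same prototype.
#define Send2(outputPort, arg1, arg2)                                      \
    do {                                                                   \
        for (std::size_t i = 0; i + 1 < (outputPort).size(); i += 2) {     \
            outputPort##Func func = (outputPort##Func)(outputPort)[i + 1]; \
            (*func)((outputPort)[i], (arg1), (arg2));                      \
        }                                                                  \
    } while (0)

// Output port prototype, named <port>Func by convention.
typedef void (*MovedFunc)(void* receiver, int dx, int dy);

class DrawingNode {
public:
    // Input port: static function that restores the receiver's type.
    static void MoveBy(void* self, int dx, int dy) {
        static_cast<DrawingNode*>(self)->DoMoveBy(dx, dy);
    }
    int x = 0, y = 0;
private:
    // Named DoMoveBy here: names starting with "_" plus an uppercase
    // letter are reserved for the implementation in C++.
    void DoMoveBy(int dx, int dy) { x += dx; y += dy; }
};

class Handle {
public:
    PortList Moved;                                    // output port
    void Drag(int dx, int dy) { Send2(Moved, dx, dy); }
};
```

A parent would write Connect(&aHandle, Moved, &aDrawingNode, MoveBy); and every Drag then moves each connected node.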

Conclusions

The goal of this work was to find a "divide and conquer" strategy for designing object oriented systems, such that pieces could be built and tested independently. By building independent components, we could reduce the exponential growth of complexity as the system grows, and build complex systems by combining smaller, simpler sub systems.

The technique presented was successful for the project described and achieved these goals. All components had very clear boundaries and could be built and tested independently. This is not to say that each component was trivial to implement. The DiagramEditor is a very complex component which manages complex behavior between many subcomponents. By using the connection strategy described, we created the basic architecture for the system. But these connections cannot model all the behavior, therefore much of it is implemented by the parent DiagramEditor. Though the DiagramEditor code can get complex, the good news is that no matter how complex a component is from the inside, this is totally hidden from the outside where it behaves in a well defined way.

Another benefit of the approach was that supporting Undo in the diagram editor was much easier. Since each component is autonomous, each could maintain its own undo stack. The undo feature was then distributed among all the classes and was very straightforward to implement and to debug. Another section of the product did not use components and implemented undo with a Memento pattern (see [7] for the definition of this pattern). The non-component approach was more difficult and required more redesign and reimplementation than the distributed component undo technique.

The component undo strategy, where each component is responsible for its own state, is akin to an object serialization strategy where each object reads and writes itself to a persistent store. It is very clean and produces a straightforward implementation which holds up better to change, as it is more localized.
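The paper does not say how the per-component undo stacks are coordinated. The sketch below assumes one possibility. Each component pushes its previous state before changing. A hypothetical UndoManager in the parent records which components were touched in each transaction, and on undo it asks them, newest first, to restore their own state. The Undoable interface and all names are illustrative.

```cpp
#include <vector>

struct Rect { int left, top, right, bottom; };

// Hypothetical interface: anything that keeps its own undo stack.
class Undoable {
public:
    virtual ~Undoable() = default;
    virtual void Undo() = 0;
};

class Entity : public Undoable {
public:
    explicit Entity(Rect box) : box_(box) {}
    void SetBox(Rect box) { history_.push_back(box_); box_ = box; }
    void Undo() override {
        if (history_.empty()) return;
        box_ = history_.back();
        history_.pop_back();
    }
    Rect Box() const { return box_; }
private:
    Rect box_;
    std::vector<Rect> history_;  // this component's private undo stack
};

// Hypothetical parent-side bookkeeping: which components changed per
// transaction. Begin must be called before Touched.
class UndoManager {
public:
    void Begin() { transactions_.emplace_back(); }
    void Touched(Undoable* component) { transactions_.back().push_back(component); }
    bool UndoLast() {
        if (transactions_.empty()) return false;
        std::vector<Undoable*>& last = transactions_.back();
        for (auto it = last.rbegin(); it != last.rend(); ++it) (*it)->Undo();
        transactions_.pop_back();
        return true;
    }
private:
    std::vector<std::vector<Undoable*>> transactions_;
};
```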

Some of the components used in this project are being reused in other projects. These techniques have made it easier to identify what objects are required to reuse a component (all supertypes of the component and all its sub components). Also, the loose coupling via the input/output ports makes it easy to work a component into other architectures.

Can this approach be applied to all design problems? There are some difficult cases to which we are trying to apply it. For example, we have a view in our product which is a scrolling list of records. The records are stored in a RecordSet object and are displayed by one or more view objects. How can the view scroll without querying the RecordSet, when querying other objects is not allowed?

We haven't resolved this, but the approach does not have to be used everywhere and can work in conjunction with other techniques. As mentioned in our Diagram Subsystem example, some components were built using standard objects and methods, and had many dependencies within the component. This was shielded from the rest of the system by wrapping these dependent objects inside a component with a clean API.

The real advantage to this approach is that it adds more structure to object design. Classes, methods and inheritance provide structure which has helped in building, understanding and maintaining systems. The additional structure of limiting relationships and defining Inputs, Outputs and Controls for each component does add more restrictions, but these rules provide clear and standard ways to design and document object behavior. To understand a component's external API one has to look at its inputs, outputs and controls. To understand an object's internals, one has to look only at that object and all its sub components. This is a tremendous advantage over other approaches which allow object dependencies, and in our experience has produced a more reliable system of reusable parts which is easier to understand, enhance and maintain.

References

[1] Wirfs-Brock, R., Wilkerson, B., and Wiener, L., 1990, Designing Object-Oriented Software, Prentice Hall

[2] Rumbaugh, J., et al., 1991, Object-Oriented Modeling and Design, Prentice Hall

[3] Jacobson, I., Christerson, M., Jonsson, P., and Overgaard, G., 1992, Object-Oriented Software Engineering, Addison-Wesley

[4] Cox, B., 1986, Object Oriented Programming: An Evolutionary Approach, Addison-Wesley

[5] Morrison, J. P., 1994, Flow-Based Programming, Van Nostrand Reinhold

[6] Lewis, S., 1995, The Art and Science of Smalltalk, Prentice Hall

[7] Gamma, E., et al., 1995, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley

This paper was originally published in the Proceedings of the Second USENIX Conference on Object-Oriented Technologies (COOTS), June 16-20, 1997, Portland, Oregon, USA