Second USENIX Conference on Object-Oriented Technologies (COOTS), 1996
   
Pp. 25-34 of the Proceedings
Building Independent Black Box Components in C++
Mark Addesso
Software AG of North America
Abstract
When using Object Oriented techniques to build systems, the relationships
between objects create dependencies which inhibit reuse, unit testing and
reliability. By using black box components, these relationships can be factored
out of the object to create truly independent components. Techniques to design
and build systems in this manner are presented. A complex system built with
black box components is also described.
Introduction
The basic idea with object technology was to create packaged units of
data and behavior. You could put your hands around an object, design and code
it separately, test it as a unit. Once completed, this package of functionality
could be reused and would never have to be coded again.
It sounds good in theory, but in practice it is very difficult to create such
independent packages. The difficulty is that any complex behavior requires a
group of collaborating objects. The boundaries of these packages get blurred as
one object requires the services of another, or needs to query another object,
or starts to control another object. When the boundaries get blurred, all the
advantages of the packaged behavior and data blur as well. Objects which depend
on others cannot be built and tested independently, and further, cannot be
reused without including all the dependent objects.
These dependencies are inherent in current object design techniques. Most
popular techniques let you model interactions between objects. Wirfs-Brock uses
Collaboration Graphs [1], Rumbaugh uses Event Trace Diagrams [2], Jacobson uses
Interaction Diagrams [3], etc. These interactions may be described as stimuli to
another object, a contract between two objects, etc. Any direct interaction
between objects creates a dependency. The object sending the message must
contain a reference to the receiving object, and must know the exact protocol
of the receiving message.
To reuse an object with dependencies in another application, the receiver must
also be included, and the application must correctly establish the relationship
between the two. Even with two objects, reuse becomes complicated. Counting all
the messages an object sends, it may depend on many objects. Even worse, those
objects can in turn depend on others, and so on. Thus, the entire
web of connected objects would have to be included and correctly initialized to
be reused in another application. This is why currently only primitive classes
(lists, widgets, etc.) or large subsystems (OLE components) are good candidates
for reuse.
Removing Dependencies
In order to find a solution to the dependency problem I imagined the ideal way
to build and use objects. I went back to an early concept in object technology
called Software ICs, first introduced by Brad Cox [4]. A software IC
is like a hardware IC: it has inputs and outputs which are connected to other
ICs. If we could build systems by connecting black box object outputs to
inputs, we would have a very loosely coupled system which should have the
properties I was looking for.
The first decision was what the black boxes should be: objects or methods?
If a black box was an object, then an input would be a method and an output
would be a message send to another object's method. If the black box was a
method, then the inputs would be the method parameters and the output would be
the method's output parameters. Treating methods as black boxes is similar to a
data flow approach. Morrison [5] has done some interesting work with "Flow Based"
programming which uses this approach. However, I was more interested in
building and reusing objects, not methods, so I opted for the first approach:
using objects as black boxes.
Objects as independent black box components have the following properties:
1) They maintain their own state. A component cannot rely on any other
component for state information, as that would create a dependency.
2) A component cannot query another component. When an output of a component is
connected to the input of another, the data flow is strictly in one
direction.
3) A component can contain other components, but can only have one parent. This
is consistent with the clear boundaries of a black box.
With respect to other component models (COM, CORBA, etc.), the component protocol
described here is not intended to be a competing protocol model. Rather, the
components described are building blocks used to design and implement software
systems. They are more detailed, internal building blocks which could be used
to implement an OLE or CORBA component, or to implement stand alone
applications.
The API of a component for our purposes has three sections: inputs, outputs,
and controls. Component inputs set the state of the component or pass data to
the object to process. Outputs of a component are generated when the state of a
component changes or as the result of an operation of the component. Controls
are used by the parent object to set, query and control the sub component.
Since the parent's behavior is implemented by its sub components, parents can
have intimate knowledge of the sub components' behavior.
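The three API sections can be sketched in C++ as follows. This is only an illustration under assumptions: the Counter, Display and Panel classes are hypothetical, and std::function stands in for the function-pointer ports described in the implementation section of this paper.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sub component showing the three API sections.
class Counter {
public:
    // Input port: passes data to the component to process.
    void Add(int amount) {
        total_ += amount;
        // Output: generated because the state changed. Data flows out only.
        for (const auto& sink : totalChanged_) sink(total_);
    }

    // Output port: the parent connects it to some other component's input.
    void ConnectTotalChanged(std::function<void(int)> sink) {
        assert(sink && "an output must be connected to a callable input");
        totalChanged_.push_back(std::move(sink));
    }

    // Controls: used only by the parent to set and query the sub component.
    void reset() { total_ = 0; }
    int total() const { return total_; }

private:
    int total_ = 0;
    std::vector<std::function<void(int)>> totalChanged_;
};

// Hypothetical sub component that only receives data.
class Display {
public:
    void Show(int value) { shown_ = value; }   // input port
    int shown() const { return shown_; }       // control
private:
    int shown_ = 0;
};

// The parent owns both sub components and makes the only connection.
class Panel {
public:
    Panel() {
        counter_.ConnectTotalChanged([this](int v) { display_.Show(v); });
    }
    Panel(const Panel&) = delete;             // the connection captures `this`
    Panel& operator=(const Panel&) = delete;

    void Add(int amount) { counter_.Add(amount); }   // parent input, forwarded
    int shown() const { return display_.shown(); }   // parent control

private:
    Counter counter_;
    Display display_;
};
```

Neither Counter nor Display mentions the other; only Panel knows that the two are connected.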
Diagrams are used to document a parent's components, and how these components
are connected. In the diagram, an object class is represented by a gray box.
The input and output ports are displayed inside the class box, and are
connected with directed lines. Each line is labeled with the parameters (if
any), sent from the output port to the input port. A second list is used in the
class box to show the control ports available to the parent.
Building Systems with Components
Given this definition of components, systems are built by parent components
connecting the outputs and inputs of their sub components.
A parent can connect sub components for one of three purposes: to implement a
sequence of messages, to broadcast a state change, or to pump the output stream
of one component to the input of another.
o Sequence of Messages
This usage is similar to a data flow style of programming. Each component
transforms its inputs to outputs and sends them to the next component in the
chain.
This example is a parser which must tokenize its input and generate a parse
tree. Here, a string is sent to the Tokenizer component's InputString port. It
creates a token for the string and sends it out its NextToken output port which
gets sent to the ParseTree component's AddToken port.
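A minimal sketch of this chain, assuming std::function for the port and whitespace splitting as a placeholder for real tokenizing. The Parser parent class and the flat token list inside ParseTree are hypothetical simplifications.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

class Tokenizer {
public:
    std::function<void(const std::string&)> NextToken;  // output port

    // Input port. Splitting on whitespace is a placeholder for real lexing.
    void InputString(const std::string& text) {
        std::istringstream in(text);
        std::string token;
        while (in >> token) {
            if (NextToken) NextToken(token);
        }
    }
};

class ParseTree {
public:
    void AddToken(const std::string& token) {   // input port
        assert(!token.empty());
        tokens_.push_back(token);
    }
    const std::vector<std::string>& tokens() const { return tokens_; }  // control
private:
    std::vector<std::string> tokens_;  // a flat list standing in for a real tree
};

// The parent makes the connection; neither sub component names the other.
class Parser {
public:
    Parser() {
        tokenizer_.NextToken = [this](const std::string& t) { tree_.AddToken(t); };
    }
    Parser(const Parser&) = delete;
    Parser& operator=(const Parser&) = delete;

    void Parse(const std::string& text) { tokenizer_.InputString(text); }
    const std::vector<std::string>& tokens() const { return tree_.tokens(); }

private:
    Tokenizer tokenizer_;
    ParseTree tree_;
};
```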
o Broadcast of a State Change
This usage is similar to the MVC (Model-View-Controller) architecture developed
in Smalltalk. A change in a model (or business) object is broadcast to all its
views which are then updated.
This example is from an application which lets the user look at and change
customer records. Here, when the name is changed in the CustomerRecord
component, the NameChanged output is triggered which sends the new name to the
CustomerForm, which updates the screen. Note that the CustomerRecord would have
a number of output ports for various state changes. Also note that this use is
different from a typical MVC setup, because here the "View" object does not
have to query the model for the data, the data is always passed. Since the view
object doesn't query the model object, it is not dependent on the model object
and its protocol. See [6] for more information on MVC.
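The broadcast can be sketched as below, again assuming std::function ports. The CustomerWindow parent and its two forms are hypothetical; the point is that the new name travels with the message, so CustomerForm never refers to CustomerRecord.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class CustomerRecord {
public:
    // Output port: any number of inputs may be connected to it.
    void ConnectNameChanged(std::function<void(const std::string&)> sink) {
        assert(sink);
        nameChanged_.push_back(std::move(sink));
    }
    // Input port: changes the state and broadcasts the new value.
    void SetName(const std::string& name) {
        if (name == name_) return;            // no state change, no output
        name_ = name;
        for (const auto& sink : nameChanged_) sink(name_);
    }
private:
    std::string name_;
    std::vector<std::function<void(const std::string&)>> nameChanged_;
};

class CustomerForm {
public:
    // Input port: the data arrives with the message, so the form never
    // holds a reference to a CustomerRecord.
    void ShowName(const std::string& name) { nameField_ = name; }
    const std::string& nameField() const { return nameField_; }   // control
private:
    std::string nameField_;   // stands in for the on-screen field
};

// Hypothetical parent: one record broadcast to two forms.
class CustomerWindow {
public:
    CustomerWindow() {
        record_.ConnectNameChanged([this](const std::string& n) { summary_.ShowName(n); });
        record_.ConnectNameChanged([this](const std::string& n) { detail_.ShowName(n); });
    }
    CustomerWindow(const CustomerWindow&) = delete;
    CustomerWindow& operator=(const CustomerWindow&) = delete;

    void Rename(const std::string& name) { record_.SetName(name); }
    const std::string& summaryText() const { return summary_.nameField(); }
    const std::string& detailText() const { return detail_.nameField(); }
private:
    CustomerRecord record_;
    CustomerForm summary_;
    CustomerForm detail_;
};
```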
o Pumping an Output Stream to Another Component
This usage is similar to piping under Unix and covers cases where a set of data
is processed by a component and the results are sent to another component.
This example is from an application which produces reports given a list of
records. Here, the RecordSet component contains a list of records which will be
formatted by the ReportFormatter component. The SortedRecords output port pumps
all the records in sorted order to the ProcessRecord input port of the
ReportFormatter component.
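A sketch of the pump, assuming records are plain strings and that the parent starts the pump through a control. The pump_sorted control, the ReportWriter parent and the numbered report format are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

class RecordSet {
public:
    std::function<void(const std::string&)> SortedRecords;  // output port

    void AddRecord(const std::string& record) { records_.push_back(record); }  // input port

    // Control: the parent triggers the pump; every record is pushed out
    // the output port in sorted order.
    void pump_sorted() {
        assert(SortedRecords && "the output port must be connected first");
        std::vector<std::string> sorted(records_);
        std::sort(sorted.begin(), sorted.end());
        for (const auto& record : sorted) SortedRecords(record);
    }
private:
    std::vector<std::string> records_;
};

class ReportFormatter {
public:
    // Input port: formats one record per message.
    void ProcessRecord(const std::string& record) {
        report_ += std::to_string(++line_) + ". " + record + "\n";
    }
    const std::string& report() const { return report_; }   // control
private:
    int line_ = 0;
    std::string report_;
};

// Hypothetical parent that connects the pump and runs it.
class ReportWriter {
public:
    ReportWriter() {
        records_.SortedRecords = [this](const std::string& r) { formatter_.ProcessRecord(r); };
    }
    ReportWriter(const ReportWriter&) = delete;
    ReportWriter& operator=(const ReportWriter&) = delete;

    void AddRecord(const std::string& record) { records_.AddRecord(record); }
    std::string Run() { records_.pump_sorted(); return formatter_.report(); }
private:
    RecordSet records_;
    ReportFormatter formatter_;
};
```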
Given these three types of interactions, and the fact that a parent component
can control its sub components, the question is: Can we build complex systems
with just these constructs? The case study presented in the next section
describes a complex system which was successfully implemented with these
techniques.
Case Study - an ER Diagram Generator and Editor
The development of Software AG's query and reporting tool "Esperant" was used
as a case study for these techniques. Esperant has two components: an
administration tool to design the data view, and an end user query tool.
A DataView is a view of the database which is closer to the user's view of the
data. Tables and columns can be renamed to more user friendly terms. Tables can
be joined to create "denormalized" views of the data, which again are more user
friendly.
To define the joins, an Entity Relationship diagram is used. An ER diagram
contains table nodes with primary and foreign keys, table columns, and join
lines which connect the primary to foreign keys. Figure 5 above shows an ER
diagram for our sample database.
The join lines are "locked" to the tables. If a table is moved or resized, the
join lines must be stretched, rerouted and possibly clipped to conform to the
new position.
Since our Esperant users had existing DataViews, and since joins can be
inferred from many database schemas, another requirement was to automatically
generate the ER diagram from a list of tables and joins.
Thus the project was to build an ER diagram generator and editor, with standard
graphic editing capabilities: selection handles, direct manipulation to move
and resize tables and to reroute joins, and undo.
In designing the system using components, it was first decided to separate the
automatic generation from the editing. The generation involves computing an
optimal layout of the entities and then routing the joins between them.
Thus, the diagram subsystem consists of three major components: a
LayoutGenerator, a Router, and a DiagramEditor.
The inputs to the subsystem accept new entities and joins. These inputs are
connected to the inputs of all the sub components to update their respective
lists (see figure 6).
The LayoutGenerator and Router components produce position information which is
sent to the DiagramEditor to position the entity boxes and relationship lines.
Note also that the LayoutGenerator sends node position information to the
Router, as the router needs to know the absolute entity positions before
routing can begin.
In the system, these components work as follows. The "back end" of the system
retrieves a list of entities and relationships, either from an existing
DataView or by reading the database catalog. The back end is connected to the
Diagram SubSystem's AddEntity and AddRelationship input ports. The back end
pumps all the new entities and relationships through these ports.
A control message is then sent by the Diagram SubSystem's parent to generate
the diagram. The Diagram SubSystem triggers the generate_layout control in the
LayoutGenerator. The generator computes an optimal table layout and sends the
table position information to both the Router and the DiagramEditor. The
Diagram SubSystem then triggers the route_connections control in the router,
which computes the line routing for all joins and sends the route info to the
DiagramEditor (see figure 7).
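The sequence above can be sketched as follows. This is a simplified illustration, not the Esperant code: the layout and routing bodies are placeholders (one row of nodes, one straight segment per join), the Point and Join types and the PlaceEntity, PlaceLine and generate names are hypothetical, and each sub system input here feeds only the one sub component that needs the list.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Point { int x; int y; };
typedef std::pair<std::string, std::string> Join;   // (from table, to table)

class LayoutGenerator {
public:
    std::function<void(const std::string&, Point)> NodePosition;   // output port
    void AddEntity(const std::string& name) { entities_.push_back(name); }  // input port
    void generate_layout() {   // control; placeholder layout: one row
        int x = 0;
        for (const auto& name : entities_) {
            if (NodePosition) NodePosition(name, Point{x, 0});
            x += 100;
        }
    }
private:
    std::vector<std::string> entities_;
};

class Router {
public:
    std::function<void(const Join&, Point, Point)> Route;          // output port
    void AddRelationship(const Join& join) { joins_.push_back(join); }      // input port
    void NodePosition(const std::string& name, Point p) { at_[name] = p; }  // input port
    void route_connections() {   // control; placeholder route: one segment
        for (const auto& join : joins_) {
            assert(at_.count(join.first) && at_.count(join.second));
            if (Route) Route(join, at_[join.first], at_[join.second]);
        }
    }
private:
    std::vector<Join> joins_;
    std::map<std::string, Point> at_;
};

class DiagramEditor {
public:
    void PlaceEntity(const std::string& name, Point p) { boxes_[name] = p; }   // input port
    void PlaceLine(const Join&, Point from, Point to) {                        // input port
        lines_.push_back(std::make_pair(from, to));
    }
    std::size_t entityCount() const { return boxes_.size(); }                  // controls
    const std::vector<std::pair<Point, Point>>& lines() const { return lines_; }
private:
    std::map<std::string, Point> boxes_;
    std::vector<std::pair<Point, Point>> lines_;
};

class DiagramSubSystem {
public:
    DiagramSubSystem() {
        layout_.NodePosition = [this](const std::string& n, Point p) {
            router_.NodePosition(n, p);    // the router needs positions first
            editor_.PlaceEntity(n, p);
        };
        router_.Route = [this](const Join& j, Point a, Point b) { editor_.PlaceLine(j, a, b); };
    }
    DiagramSubSystem(const DiagramSubSystem&) = delete;
    DiagramSubSystem& operator=(const DiagramSubSystem&) = delete;

    void AddEntity(const std::string& name) { layout_.AddEntity(name); }       // input ports
    void AddRelationship(const Join& join) { router_.AddRelationship(join); }
    void generate() {                                                          // control
        layout_.generate_layout();      // positions flow to router and editor
        router_.route_connections();    // routes flow to the editor
    }
    const DiagramEditor& editor() const { return editor_; }
private:
    LayoutGenerator layout_;
    Router router_;
    DiagramEditor editor_;
};
```

The ordering matters only inside generate(): the parent triggers the layout control first so that the router has every node position before it is asked to route.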
The generator and router components use specialized algorithms and data
structures to compute optimal placements and to generate routes with a minimum
number of intersections. These algorithms were not well suited for components
and were implemented using traditional objects and methods.
The DiagramEditor was very conducive to sub components. An editor is composed
of many objects whose states are closely related. For example, the relationship
lines are dependent on the entity positions; the selection handles for an
entity or line depend on, and in turn change, the entity or line they are
manipulating.
DiagramEditor Component
In the ER diagram, whenever an entity is moved or resized all connected lines
must be stretched and clipped to the new position.
To model this behavior between the entities and the lines, each line maintains
a start_box and an end_box of what it is connected to. The boxes are necessary
for the line to intelligently update itself and to clip to the box whenever it
is edited. The line has two input ports to update the positions of the start
and end boxes.
Each entity has an output port to indicate that its box has changed. This is
connected to the start_box or end_box input ports of all connected lines. The
figure below shows the components and connections used for one such
relationship.
Thus when an entity is moved or resized, it sends its new position out the
BoxChanged port. All connected lines receive the message and update their start
or end box, and reroute and possibly clip themselves to align to the new box
position.
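The entity-to-line connection might look like the sketch below. It assumes std::function ports and uses a placeholder route (right edge of the start box to left edge of the end box, at mid height) in place of the real rerouting and clipping. The Box and Point structs, the StartBox and EndBox input names, and the ConnectJoin helper are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Box { int left; int top; int right; int bottom; };
struct Point { int x; int y; };

class Entity {
public:
    explicit Entity(const Box& box) : box_(box) {}

    // Output port: connected to the start_box or end_box input of each line.
    void ConnectBoxChanged(std::function<void(const Box&)> sink) {
        assert(sink);
        boxChanged_.push_back(std::move(sink));
    }
    // Input port: move the entity, then broadcast the new box.
    void MovedBy(int dx, int dy) {
        box_.left += dx;  box_.right += dx;
        box_.top += dy;   box_.bottom += dy;
        for (const auto& sink : boxChanged_) sink(box_);
    }
    const Box& box() const { return box_; }   // control
private:
    Box box_;
    std::vector<std::function<void(const Box&)>> boxChanged_;
};

class Line {
public:
    Line(const Box& start, const Box& end) : start_box_(start), end_box_(end) { reroute(); }

    void StartBox(const Box& box) { start_box_ = box; reroute(); }   // input port
    void EndBox(const Box& box) { end_box_ = box; reroute(); }       // input port

    Point from() const { return from_; }   // controls
    Point to() const { return to_; }
private:
    // Placeholder routing; the real router and clipping are not shown.
    void reroute() {
        from_ = Point{start_box_.right, (start_box_.top + start_box_.bottom) / 2};
        to_ = Point{end_box_.left, (end_box_.top + end_box_.bottom) / 2};
    }
    Box start_box_;
    Box end_box_;
    Point from_ = {0, 0};
    Point to_ = {0, 0};
};

// Parent-side wiring for one relationship. The line must outlive the entities'
// connections, which the owning parent is assumed to guarantee.
inline void ConnectJoin(Entity& start, Entity& end, Line& line) {
    start.ConnectBoxChanged([&line](const Box& b) { line.StartBox(b); });
    end.ConnectBoxChanged([&line](const Box& b) { line.EndBox(b); });
}
```

The line keeps its own copy of both boxes, so it can update itself from a single message without ever querying an entity.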
Entity and Line Selection
When an entity or line is selected, handles are displayed for the user to move,
resize the entity, or reroute the line. The handles are implemented as separate
components. They produce outputs when they are edited which are connected to
inputs of the entities or lines they represent (see figure 9).
Given the component interactions of the DiagramEditor, whenever an object is
changed (moved, resized, rerouted), many objects can be affected (handles
update entities which update lines, etc.).
One sticky issue was how to keep track of an invalidation rectangle to redraw
when the editing operation was complete. Since an output port cannot query
data, one invalidation rectangle could not be built during the message
broadcasting. The best solution was to create a separate component which
maintains the invalidation rectangle for each transaction.
The DiagramEditor contains one InvalidationRectangle component which all
entities, lines and handles are connected to (see figure 10). During a
transaction, when an object changes its location or size, it generates an
InvalRect output message. This is connected to the merge input port of the
InvalidationRectangle which collects the rectangles for the transaction. When
the transaction completes, the parent DiagramEditor signals the
InvalidationRectangle to refresh the window.
A complete edit operation works as follows. The user interacts with a selection
handle to drag or resize an object. During the interaction, the mouse messages
are routed through the DiagramEditor to the handle or handles affected.
When the user completes the operation by releasing the mouse button, all the
affected handles trigger their MovedBy or Resized output ports. These send
either a delta or new box to the entity connected to the handle.
The entity updates its position and sends an invalidation rectangle (old
position + new position) out its InvalRect port, which goes to the
InvalidationRectangle component. It also sends its new box out its BoxChanged
output port. This goes to all connected lines which update themselves and send
their invalidation rectangles (old bounding box + new bounding box) out their
InvalRect port which goes to the InvalidationRectangle component which merges
the rectangle with all previous ones.
The DiagramEditor then triggers the InvalidationRectangle's invalidate control
which does the actual window system invalidation.
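A sketch of such a collector component follows. The Rect struct and the refresh callback, which stands in for the actual window system call, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>

struct Rect { int left; int top; int right; int bottom; };

class InvalidationRectangle {
public:
    // `refresh` stands in for the window system call (an assumption here).
    explicit InvalidationRectangle(std::function<void(const Rect&)> refresh)
        : refresh_(std::move(refresh)) {
        assert(refresh_);
    }

    // Input port: every InvalRect output in the editor is connected here.
    void Merge(const Rect& r) {
        if (!dirty_) { merged_ = r; dirty_ = true; return; }
        merged_.left = std::min(merged_.left, r.left);
        merged_.top = std::min(merged_.top, r.top);
        merged_.right = std::max(merged_.right, r.right);
        merged_.bottom = std::max(merged_.bottom, r.bottom);
    }

    // Control: triggered by the parent when the transaction completes.
    void invalidate() {
        if (!dirty_) return;          // nothing changed in this transaction
        refresh_(merged_);
        dirty_ = false;               // start fresh for the next transaction
    }
private:
    std::function<void(const Rect&)> refresh_;
    Rect merged_ = {0, 0, 0, 0};
    bool dirty_ = false;
};
```

Because the rectangle is accumulated inside one component, the entities, lines and handles only ever send data to it; none of them needs to ask what the current rectangle is.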
To get a more complete picture of the design of the diagram editor, the above
diagrams are combined and shown in figure 11. This represents the primary
behavior of the sub components of the editor. In actuality, there are more
ports and connections to handle more detailed behavior such as reordering join
keys, right mouse menus, drag and drop joining, etc.
These are not shown here for brevity, but the picture gives a very accurate
overview of the design of the system.
Implementation of Components in C++
The API of a component has three sections: input ports, output ports and
controls. Controls are simple. They are just implemented as methods on the
component class.
To connect output ports to input ports, function pointers are used. An input
port is a static member function on the component class. It must be static
because a C++ pointer to a non-static member function is typed by its class and
cannot be stored or called as an ordinary function pointer, so a sender would
have to know the class of every receiver. This static member function calls the
appropriate real member function of the component instance.
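For example, the input port "MovedBy" on the DrawingNode component is
implemented as follows:
    class DrawingNode
    {
    public:
        // Input port: static, so it can be stored as a plain function pointer.
        static void MovedBy(void *self, int dx, int dy)
            { ((DrawingNode*)self)->_MovedBy(dx, dy); }
    private:
        // The real member function which does the work.
        void _MovedBy(int dx, int dy);
    };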
The first argument of an input function is the component which receives the
message. The recasting is necessary since we cannot use direct member function
pointers. The recast is guaranteed to be correct based on the scheme of
connecting inputs to outputs (see the following section about type
correctness).
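The forwarding can be tried in isolation with the sketch below. The SendMoved function is a hypothetical stand-in for a sender, and the helper is named DoMovedBy here only to avoid an identifier that starts with an underscore and a capital letter, which C++ reserves for the implementation.

```cpp
#include <cassert>

class DrawingNode {
public:
    // Input port: a static member, so an ordinary function pointer can hold it.
    static void MovedBy(void* self, int dx, int dy) {
        assert(self != nullptr);
        static_cast<DrawingNode*>(self)->DoMovedBy(dx, dy);
    }
    int x() const { return x_; }   // controls
    int y() const { return y_; }
private:
    void DoMovedBy(int dx, int dy) { x_ += dx; y_ += dy; }   // the real work
    int x_ = 0;
    int y_ = 0;
};

// The port type as seen by a sender that knows nothing about DrawingNode.
typedef void (*MovedFunc)(void* receiver, int dx, int dy);

// Hypothetical sender: holds only an untyped receiver and a plain function
// pointer, which is all that an output port stores.
inline void SendMoved(void* receiver, MovedFunc input, int dx, int dy) {
    input(receiver, dx, dy);
}
```

A call such as SendMoved(&node, &DrawingNode::MovedBy, 3, -2) moves the node without the sender including or naming the DrawingNode class in its own code.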
Each output port is implemented with a pointer array. We use Visual C++ with
MFC and so use a CPtrArray object. The pointer array will contain a list of
object/method pairs of all input ports connected to this output. For example,
the output port "Moved" is defined on the Handle component as follows:
    class Handle
    {
    public:
        CPtrArray Moved;   // alternating receiver / input function entries
    };
Macros are used to connect, disconnect and send data through a port. A
connection is established with the macro:
    Connect(sendingComponent, outputPort, receivingComponent, inputPort)
For example, to connect the "Moved" output port of a Handle to the "MovedBy"
input port of a DrawingNode the command would be:
    Connect(aHandle, Moved, aDrawingNode, MovedBy)
The macro derives the function type of the port by appending "Func" to the
output port name. Here that is MovedFunc, a typedef which declares the output
port parameters and is defined as:
    typedef void (*MovedFunc)(void *receiver, int dx, int dy);
The connect macro adds the receiver and input function to the Moved output port
list. Here, a pointer to the DrawingNode and the input port function "MovedBy"
are added to the Moved list of the Handle object.
Data is sent from an output to an input via a Send macro. There are different
macros depending on the number of arguments in the message. For example, when
the Handle is moved, it will send a message to all its connected components
with the macro:
    Send2(Moved, dx, dy);
This macro iterates through all components connected to the Moved output port
and sends the two arguments. The MovedFunc typedef is used to ensure the
correct number and type of arguments are used for the port.
Type Correctness
To avoid errors and take advantage of the strong typing of C++, we must ensure
when data is sent from an output port to an input port, that the receiver is
correct and the arguments are correct. Since connecting the output port to the
input port is done in the parent component, and the sending is done in the sub
component, the Connect and Send macros must work together to ensure correct
types.
This is achieved by using function prototypes for all output ports. By
convention, each output port has a corresponding function prototype which
declares all parameters of the port. The prototype name is the name of the
output port concatenated with the string "Func".
For example, in the API of the Handle component, there is a function prototype
called MovedFunc declared as follows:
    typedef void (*MovedFunc)(void *receiver, int dx, int dy);
This function prototype is used by both the Connect and the Send macros to
ensure that a) the output port is connected to an input port with compatible
arguments, and b) when data is sent out the output port, the data sent is the
right type for the port. This is illustrated in the definition of the Connect
and Send macros which are defined as follows:
    #define Connect(sender, outputPort, receiver, inputPort)    \
        { sender->outputPort.Add(receiver);                     \
          outputPort##Func func = receiver->inputPort;          \
          sender->outputPort.Add((void*)func); }
Note the use of the `##' concatenate operator to create the function prototype
name from the output port.
So the example:
    Connect(aHandle, Moved, aDrawingNode, MovedBy)
expands to:
    aHandle->Moved.Add(aDrawingNode);
    MovedFunc func = aDrawingNode->MovedBy;
    aHandle->Moved.Add((void*)func);
The assignment to the func variable will cause the compiler to correctly match
the parameters of the DrawingNode::MovedBy static member function with the
MovedFunc prototype.
The Send macro uses a similar technique for typing. Since the port list holds
receiver/function pairs, the loop advances two entries at a time. It is
implemented as follows:
    #define Send2(outputPort, arg1, arg2)                           \
        { for (int I = 0; I + 1 < outputPort.GetSize(); I += 2)     \
            { outputPort##Func func =                               \
                  (outputPort##Func)outputPort[I+1];                \
              (*func)(outputPort[I], arg1, arg2); }                 \
        }
So, the example:
    Send2(Moved, dx, dy)
would expand to:
    for (int I = 0; I + 1 < Moved.GetSize(); I += 2)
        { MovedFunc func = (MovedFunc)Moved[I+1];
          (*func)(Moved[I], dx, dy); }
Here, on the call through the function pointer (func), the compiler will verify
that the arguments specified in the Send macro match those in the function
prototype.
Thus we have verified that the input port parameters match the function
prototype, and that the send parameters match the function prototype. Therefore
we are guaranteed that the input port parameters match the send parameters.
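The whole scheme can be exercised without MFC using the sketch below. It is a variant under stated assumptions rather than the code from the project: std::vector<void*> stands in for CPtrArray, the macros are wrapped in do/while(0) so that they behave as single statements, and the Drag trigger and DoMovedBy helper are hypothetical names. Converting a function pointer to void* is only conditionally supported by the C++ standard, although common desktop compilers accept it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<void*> PortList;   // stand-in for MFC's CPtrArray

// Function type of the Moved output port (port name plus "Func").
typedef void (*MovedFunc)(void* receiver, int dx, int dy);

// Stores a (receiver, input function) pair in the sender's output port.
// The assignment to `func` is the compile-time check of the input port.
#define Connect(sender, outputPort, receiver, inputPort)                   \
    do {                                                                   \
        outputPort##Func func = (receiver)->inputPort;                     \
        (sender)->outputPort.push_back(receiver);                          \
        (sender)->outputPort.push_back(reinterpret_cast<void*>(func));     \
    } while (0)

// Calls every connected input; entries are consumed two at a time.
#define Send2(outputPort, arg1, arg2)                                      \
    do {                                                                   \
        for (std::size_t i = 0; i + 1 < outputPort.size(); i += 2) {       \
            outputPort##Func func =                                        \
                reinterpret_cast<outputPort##Func>(outputPort[i + 1]);     \
            (*func)(outputPort[i], arg1, arg2);                            \
        }                                                                  \
    } while (0)

class DrawingNode {
public:
    static void MovedBy(void* self, int dx, int dy) {   // input port
        assert(self != nullptr);
        static_cast<DrawingNode*>(self)->DoMovedBy(dx, dy);
    }
    int x() const { return x_; }   // controls
    int y() const { return y_; }
private:
    void DoMovedBy(int dx, int dy) { x_ += dx; y_ += dy; }
    int x_ = 0;
    int y_ = 0;
};

class Handle {
public:
    PortList Moved;                                      // output port
    void Drag(int dx, int dy) { Send2(Moved, dx, dy); }  // hypothetical trigger
};
```

A parent would connect with Connect(&handle, Moved, &node, MovedBy); and a later handle.Drag(4, 7) then reaches every connected node. Connecting an input whose parameters differ from MovedFunc fails to compile at the assignment to func.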
The other issue to address is the recasting of the void* pointer to the
component class in the static member function.
In the Connect macro, the static member function pointer is retrieved by using
"receiver->inputPort". The compiler will ensure that the static function for
the receiver class is used.
By saving the receiver in the output port list and sending the receiver later
to the static member function we are in effect doing the equivalent of:
receiver->inputPort(receiver);
Thus, we are guaranteed that the object instance sent to the class method will
always be a "receiver" class object.
Conclusions
The goal of this work was to find a "divide and conquer" strategy for designing
object oriented systems, such that pieces could be built and tested
independently. By building independent components, we could reduce the
exponential growth of complexity as the system grows, and build complex systems
by combining smaller, simpler sub systems.
The technique presented was successful for the project described and achieved
these goals. All components had very clear boundaries and could be built and
tested independently. This is not to say that each component was trivial to
implement. The DiagramEditor is a very complex component which manages complex
behavior between many subcomponents. By using the connection strategy
described, we created the basic architecture for the system. But these
connections cannot model all the behavior, therefore much of it is implemented
by the parent DiagramEditor. Though the DiagramEditor code can get complex, the
good news is that no matter how complex a component is from the inside, this is
totally hidden from the outside where it behaves in a well defined way.
Another benefit of the approach was that supporting Undo in the diagram editor
was much easier. Since each component is autonomous, each could maintain its
own undo stack. The undo feature was then distributed among all the classes and
was very straightforward to implement and to debug. Another section of the
product did not use components and implemented undo with a Memento pattern (see
[7] for the definition of this pattern). The non component approach was more
difficult and required more redesign and reimplementation than the
distributed component undo technique.
The component undo strategy, where each component is responsible for its own
state, is akin to an object serialization strategy where each object reads and
writes itself to a persistent store. It is very clean and produces a
straightforward implementation which holds up better to change, as it is more
localized.
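The per-component part of such an undo scheme might look like the sketch below. It is an assumption-laden illustration: how the parent decides which components to undo for a given transaction is not described here and is not shown, and the undo control, Position struct and UndoExample function are hypothetical.

```cpp
#include <cassert>
#include <vector>

class DrawingNode {
public:
    // Input port: record the old state before changing it.
    void MovedBy(int dx, int dy) {
        undo_.push_back(Position{x_, y_});
        x_ += dx;
        y_ += dy;
    }
    // Control: restore this component's previous state, if there is one.
    bool undo() {
        if (undo_.empty()) return false;
        x_ = undo_.back().x;
        y_ = undo_.back().y;
        undo_.pop_back();
        return true;
    }
    int x() const { return x_; }   // controls
    int y() const { return y_; }
private:
    struct Position { int x; int y; };
    int x_ = 0;
    int y_ = 0;
    std::vector<Position> undo_;   // private undo stack, invisible to others
};

// Usage: two moves followed by one undo leave the node at the first position.
inline void UndoExample() {
    DrawingNode node;
    node.MovedBy(10, 0);
    node.MovedBy(0, 5);
    const bool undone = node.undo();
    assert(undone);
    (void)undone;                  // keeps builds without asserts warning-free
    assert(node.x() == 10 && node.y() == 0);
}
```

Since the saved state never leaves the component, a change to what a node stores affects only that node's undo code.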
Some of the components used in this project are being reused in other ones.
These techniques have made it easier to identify what objects are required to
reuse a component (all supertypes of the component and all its sub
components). Also, the loose coupling via the input/output ports makes it easy
to work a component into other architectures.
Can this approach be applied to all design problems? There are some cases we
are trying to apply it to which are difficult. For example, we have a view in
our product which is a scrolling list of records. The records are stored in a
RecordSet object and are displayed by one or more view objects. How can the
view scroll without querying the RecordSet, given that querying other objects
is not allowed?
We haven't resolved this, but the approach does not have to be used everywhere
and can work in conjunction with other techniques. As mentioned in our Diagram
Subsystem example, some components were built using standard objects and
methods, and had many dependencies within the component. This was shielded from
the rest of the system by wrapping these dependent objects inside a component
with a clean API.
The real advantage to this approach is that it adds more structure to object
design. Classes, methods and inheritance provide structure which has helped in
building, understanding and maintaining systems. The additional structure of
limiting relationships and defining Inputs, Outputs and Controls for each
component does add more restrictions, but these rules provide clear and
standard ways to design and document object behavior. To understand a
component's external API one has to look at its inputs, outputs and controls.
To understand an object's internals, one has to look only at that object
and all its sub components. This is a tremendous advantage over other
approaches which allow object dependencies, and in our experience has produced
a more reliable system of reusable parts which is easier to understand, enhance
and maintain.
References
[1] Wirfs-Brock, Wilkerson, and Wiener, 1990, Designing Object-Oriented
Software, Prentice Hall
[2] Rumbaugh et al., 1991, Object-Oriented Modeling and Design, Prentice Hall
[3] Jacobson, I., Christerson, M., Jonsson, P. and Overgaard, G., 1992,
Object-Oriented Software Engineering, Addison-Wesley
[4] Cox, B., 1986, Object-Oriented Programming: An Evolutionary Approach,
Addison-Wesley
[5] Morrison, J., 1994, Flow-Based Programming, Van Nostrand Reinhold
[6] Lewis, S., 1995, The Art and Science of Smalltalk, Prentice Hall
[7] Gamma et al., 1995, Design Patterns: Elements of Reusable
Object-Oriented Software, Addison-Wesley