Notes on data flow engine design

This article is about data flow engine design issues. A dataflow engine is the software that handles scheduling the transmission of messages (information packets) between autons (components), maintenance of the data structures holding messages, and the API between the component software and the data flow engine. I take the view that dataflow engine design naturally separates into two distinct components, inter-thread and intra-thread communication, each with its own implementation issues. The focus in this article is on scheduling issues for engines that run within a single thread of control.

From the perspective of the scheduler, autons (components) are a set of inports (input ports) and outports (output ports). Outports emit packages. A package has a wrapping and a packet. The wrapping contains the package address, an (inport, auton) pair, and sundry metadata. The packet contains metadata and either a document (FBP – an information packet) or a data value.
Packet contents are passed to the input response code associated with
the inport. The input response code processes the packet. In
turn it can emit new packages to be delivered later by
the scheduler. Once packet processing is complete, control
returns to the scheduler. In turn the scheduler selects the next
package to be delivered.
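As a concrete illustration, here is a minimal sketch, in Python, of the delivery loop just described. All of the names (Scheduler, emit, register, and so on) are mine and hypothetical; a real engine would carry considerably more metadata in the wrapping.

    from collections import deque, namedtuple

    # Hypothetical names; a real engine carries more metadata.
    Wrapping = namedtuple("Wrapping", ["auton", "inport"])
    Package = namedtuple("Package", ["wrapping", "packet"])

    class Scheduler:
        def __init__(self):
            self.queue = deque()   # packages awaiting delivery
            self.autons = {}       # auton name -> {inport name: response code}

        def register(self, name, responses):
            self.autons[name] = responses

        def emit(self, auton, inport, packet):
            # An outport sends a package by wrapping a packet with its address.
            self.queue.append(Package(Wrapping(auton, inport), packet))

        def run(self):
            # Deliver packages until none remain; each delivery may emit more.
            while self.queue:
                pkg = self.queue.popleft()
                respond = self.autons[pkg.wrapping.auton][pkg.wrapping.inport]
                respond(self, pkg.packet)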
Scheduling constraints

“Flow Based Programming” mentions three general constraints. To these I have added two more. They are:

(FBP) The flow constraint: Basically this says a package cannot be delivered until it has been sent. This may seem obvious, but it implies that data flows follow a flow diagram.

(FBP) The one place at a time constraint: In FBP an information packet can only be in one place at a time – either in a queue, within another IP, or within an auton. I distinguish between documents and data; documents are objects that can only be in one place at a time. Data are not objects; they are only values.

(FBP) The consistent ordering constraint: The general statement is that objects must be processed in the same order that they are sent. The ordering constraint implies that packages must be stored in queues before they are delivered. There are three variants of this constraint: inport order, auton order, and nest order. Inport order: the inport queue (or its image in a larger queue) preserves the order in which the packages were sent. Auton order: the packages sent to an auton are delivered in the order in which they were sent, regardless of which of its inports they arrive on. Nest order: all packages in the nest are delivered in the order in which they were sent.

(RH) The delivery must be attempted constraint: An attempt must be made to deliver every package; either it must ultimately be delivered or rejected as undeliverable.

(RH) The epoch constraint: Execution is divided into epochs. Packages emitted within one epoch are not delivered until the next epoch. This is the default; it can be overridden.
When a delivery is made there are three major possible processing styles. The first is that a single package is processed. The second is that all packages in the inport’s queue are processed. The third is that all packages in all of the auton’s inports are processed. Which of these styles is used depends on the auton code structure.
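As a rough sketch of the three styles (Python, hypothetical names; the inport queues are assumed to be ordinary FIFO deques of packages):

    def deliver_single(scheduler, package, respond):
        # Style 1: the response code handles exactly one package per activation.
        respond(scheduler, package.packet)

    def deliver_inport(scheduler, inport_queue, respond):
        # Style 2: drain the inport's whole queue in one activation.
        while inport_queue:
            respond(scheduler, inport_queue.popleft().packet)

    def deliver_auton(scheduler, inport_queues, responses):
        # Style 3: drain every inport belonging to the auton.
        for inport, queue in inport_queues.items():
            while queue:
                responses[inport](scheduler, queue.popleft().packet)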
Relationship of styles and constraints
We cannot arbitrarily mix style type and order constraint type.
The nest-order constraint requires that single-package processing
be used. The auton-order constraint requires that either
single-package processing or whole-auton processing be used.
Finally, the inport-order constraint is consistent with all three
processing styles. In other words, the inport-order constraint
is the weakest constraint, auton-order is intermediate, and
nest-order is the strongest constraint.
In simulations it is useful to establish a “clock”. The simulation has a sequence of state and input pairs, <S,I>, that represent the system state and its inputs at the beginning of a clock tick. In the interval between one tick and the next the system generates a set of outputs M in response to its inputs. In turn the outputs become the inputs for the next tick. Symbolically:

<S0,I> => M => <S1,M>

Regardless of the constraint choice or the processing style we can always maintain a transition boundary by having a separate output queue. All packages generated during an interval are attached to the output queue. When the transition boundary is reached the packages from the output queue become the new input queue. An output queue is necessary if inport or auton queue processing is used. It is not necessary if nest-order is used; however there is no advantage in not having an output queue.
Having a transition boundary is a constraint. It says that
packages emitted during one clock tick are not delivered until
the next clock tick.
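A minimal sketch of a transition boundary, under the same hypothetical conventions as above: packages emitted during a tick are attached to the output queue, and the queues are swapped at the boundary.

    from collections import deque

    class TickScheduler:
        # Sketch only: deliver() stands in for the real response-code dispatch.
        def __init__(self, deliver):
            self.input_queue = deque()
            self.output_queue = deque()
            self.deliver = deliver

        def emit(self, package):
            # Packages emitted during this tick wait in the output queue.
            self.output_queue.append(package)

        def tick(self):
            # Deliver everything that was pending at the start of the tick.
            while self.input_queue:
                self.deliver(self, self.input_queue.popleft())
            # Transition boundary: outputs become the next tick's inputs.
            self.input_queue, self.output_queue = self.output_queue, deque()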
Order constraints dictate using queues; however inport-order and auton-order constraints do not fully specify the order of execution. Here are two alternatives.
Using a master queue, advantages and disadvantages

Many implementations maintain a separate queue for each auton/inport. Using a master queue is arguably both more efficient in its use of storage and better for system integrity. The reason is quite simple. When separate queues are used, space must be allocated for the maximum number of packages expected to be in each queue. When a master queue is used we need only allocate space for the maximum number of packages in the system. Obviously using a master queue will require less storage, though whether the savings are significant depends both on the implementation and the application. Configuring applications is simpler when just the master queue must be sized than when each separate queue has to be sized.
Using a single master queue simplifies design because the need
for “suspend on send” blocking is obviated. The root cause for
flooded queues is processes that produce packets at a rate
greater than the rate that the rest of the system can handle. The
cure is to bound the output rate of said processes.
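One possible way to bound a producer, sketched with hypothetical names: cap the number of packages an auton may emit per epoch and make further sends fail until the next epoch.

    class RateBoundedOutport:
        # Sketch of one possible cure: an outport that refuses to emit more
        # than max_pending packages per epoch.
        def __init__(self, scheduler, auton, inport, max_pending):
            self.scheduler = scheduler
            self.auton = auton
            self.inport = inport
            self.max_pending = max_pending
            self.pending = 0

        def send(self, packet):
            if self.pending >= self.max_pending:
                return False      # caller must retry in a later epoch
            self.scheduler.emit(self.auton, self.inport, packet)
            self.pending += 1
            return True

        def new_epoch(self):
            self.pending = 0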
In stack based scheduling the scheduler pushes emitted packages
onto a stack. When it selects the next package to be delivered
it pops one off the stack. Stack based scheduling is efficient
in its use of space; however it can violate order constraints.
For example, suppose auton X sends packages to Y and Z in that
order. The scheduler delivers a package to Z which in turn sends
packages to W and Y. The stack now contains a package for Y, one
for W, and a second, later one for Y. The scheduler will now
deliver the later one before the earlier one.
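The ordering violation is easy to reproduce. Here is a small sketch of the X, Y, Z, W example (Python; the network is hypothetical), comparing stack (LIFO) and queue (FIFO) selection:

    from collections import deque

    # Hypothetical network: X has already sent packages to Y and Z (in that
    # order); when Z's package is delivered, Z sends packages to W and Y.
    emits = {"Z": [("W", "from Z"), ("Y", "from Z")]}

    def run(policy):
        pending = deque([("Y", "from X"), ("Z", "from X")])
        delivered = []
        while pending:
            pkg = pending.pop() if policy == "stack" else pending.popleft()
            delivered.append(pkg)
            pending.extend(emits.get(pkg[0], []))
        return delivered

    print(run("stack"))  # Y receives Z's later package before X's earlier one
    print(run("queue"))  # FIFO delivery preserves the sending order

With the stack policy Y receives Z’s package before X’s earlier one; with the queue policy the sending order is preserved.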
Another problem with stack based scheduling is that the
application can get trapped in an infinite loop cycling through
part of the network.
Nonetheless there are situations in which restricted stack
based scheduling can be useful.
One such is “follow the packet scheduling”. This can be used
when all packets are documents (FBP: Information packets). The
idea is that when an IP enters an auton it only has one path out;
therefore it can be followed until it is discarded or otherwise
exits the system. This strategy requires keeping a list of all
documents in the system.
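A sketch of the idea, assuming (hypothetically) that each response returns at most one onward stop for the document it received; the bookkeeping for the list of all documents in the system is omitted:

    def follow_the_packet(autons, document, first_stop):
        # autons: {name: response code}.  Each response code takes a document
        # and returns either the name of the next auton it sends it to, or
        # None when the document is discarded or leaves the system.
        stop = first_stop
        while stop is not None:
            stop = autons[stop](document)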
A second situation is recursion. Stack based scheduling is
appropriate when recursion is effected by spawning subsidiary
autons.
In simulations we may want to defer delivery of a
package until some later time, either in simulation pseudo-time
or in real time. A common way to handle deferred delivery is to
build a deferred delivery heap into the scheduler.
An alternative is to use an auton that receives a package that
contains a message and a send time. The auton responds by doing
a time check on each clock tick. When the send time is reached
it packages the message and sends it on.
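A minimal sketch of the first approach, a deferred-delivery heap built into the scheduler (names hypothetical):

    import heapq

    class DeferredDelivery:
        # Packages are held in a heap keyed by their due time (simulation
        # pseudo-time here) and released into the scheduler when due.
        def __init__(self):
            self.heap = []
            self.counter = 0   # tie-breaker so the heap never compares packages

        def defer(self, due_time, package):
            heapq.heappush(self.heap, (due_time, self.counter, package))
            self.counter += 1

        def release_due(self, now, emit):
            # Hand every package whose time has come back to the scheduler.
            while self.heap and self.heap[0][0] <= now:
                _, _, package = heapq.heappop(self.heap)
                emit(package)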
When each auton or each inport has its own separate queue it is
fairly easy to create blockages by filling queues. Packages
cannot be sent to the filled queues. If the sending auton does
not take action to handle the failure to send it becomes blocked
with a “suspend on send”. The auton remains suspended until the
blocked queue is opened up. This condition can propagate
upstream. There is a fair amount of discussion in “Flow Based
Programming” about dealing with this condition.
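A rough illustration, with hypothetical names, of how a bounded per-inport queue produces a “suspend on send”: the send fails and the sending auton is parked until the queue drains.

    from collections import deque

    class BoundedInport:
        def __init__(self, capacity):
            self.capacity = capacity
            self.queue = deque()
            self.waiting_senders = []   # autons suspended on send

        def send(self, sender, package):
            if len(self.queue) >= self.capacity:
                self.waiting_senders.append(sender)   # suspend on send
                return False
            self.queue.append(package)
            return True

        def receive(self):
            package = self.queue.popleft()
            resumed, self.waiting_senders = self.waiting_senders, []
            return package, resumed   # scheduler may now resume these autons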
There also is “suspend on receive”. There are two cases to
consider. One is normal; it occurs when the input queue is empty.
The other is algorithmic; the code within the auton places a
block on an inport that is not lifted until some condition is
met.
The situation is quite different when there is a single master
queue. There are no “suspend on send” blockages. It is possible
for processes to emit excessive packages and thereby flood the
master queue. This is easily detected and it is quite
straightforward to fix.
Normally the only blockages that are seen in the single master
queue scenario are blocks placed on an inport by the auton for
algorithmic reasons. For example, suppose an auton is
interleaving inputs from ports A and B. When it reads data from
port A, port A is blocked so that the next input must come from
port B. When an inport is blocked it is convenient for its queue
to be separate from the master queue.
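A sketch of the interleaving example; the block flag and the side queues are my own hypothetical additions:

    from collections import deque

    class InterleavingAuton:
        # Alternates between ports "A" and "B".  The port just read is blocked;
        # packages arriving on a blocked port wait in a side queue, separate
        # from the master queue, until the block is lifted.
        def __init__(self, process):
            self.process = process        # the auton's real packet handler
            self.expected = "A"
            self.side = {"A": deque(), "B": deque()}

        def receive(self, port, packet):
            if port != self.expected:
                self.side[port].append(packet)   # port is blocked; hold the packet
                return
            while packet is not None:
                self.process(self.expected, packet)
                # Block the port just read; the other port is now expected.
                self.expected = "B" if self.expected == "A" else "A"
                held = self.side[self.expected]
                packet = held.popleft() if held else None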
This page was last updated April 1, 2011.