Richard Harter’s World
Site map
Mathcomp
San Language
April 2011
Email

Notes on data flow engine design

This article is about data flow engine design issues. A dataflow engine is the software that schedules the transmission of messages (information packets) between autons (components), maintains the data structures holding messages, and provides the API between the component software and the engine.

I take the view that dataflow engine design naturally separates into two distinct components, inter-thread and intra-thread communication, each with its own implementation issues. The focus in this article is on scheduling issues for engines that run within a single thread of control.

From the perspective of the scheduler, autons (components) are a set of inports (input ports) and outports (output ports). Outports emit packages. A package has a wrapping and a packet. The wrapping contains the package address, an <auton, inport> pair, and sundry metadata. The packet contains metadata and either a document (FBP: an information packet) or a data value.

Packet contents are passed to the input response code associated with the inport. The input response code processes the packet; it may in turn emit new packages to be delivered later by the scheduler. Once packet processing is complete, control returns to the scheduler, which selects the next package to be delivered.

Definitions

Terminology is a problem in discussing dataflow/flow based programming. Here is a short section on the terminology that I use.
Auton: Short for autonomous computing element.
Inport: An input port associated with an auton.
Outport: An output port associated with an auton.
Pipe: A connector between an outport and an inport. Data flows from an outport thru the pipe and into an inport.
Network: A collection of autons connected by pipes.
Nest: Auton network managed as a single entity by a scheduler.
Packet: Data element that an auton can receive. There are two types of packets – documents and data.
Document: A data object that persists as a singleton object throughout its duration in the network.
Package: An <address,packet> pair. Autons emit packets; the outport wraps the packet in a package; pipes transport packages; the packet is extracted at the inport.
Address: An <auton id, inport id> pair specifying the address of a destination inport.
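The definitions above can be sketched as data structures. This is a minimal Python sketch; the class and field names are illustrative, not taken from the original:

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Address:
    """Destination of a package: an <auton id, inport id> pair."""
    auton_id: str
    inport_id: str

@dataclass
class Packet:
    """Data element an auton can receive: a document or a data value."""
    kind: str          # "document" or "data"
    payload: Any

@dataclass
class Package:
    """An <address, packet> pair; the wrapping is built by the outport."""
    address: Address
    packet: Packet
```

A document packet would carry a singleton object as its payload; a data packet carries only a value.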

Scheduling constraints

“Flow Based Programming” mentions three general constraints. To these I have added two more. They are:

(FBP) The flow constraint: Basically this says a package cannot be delivered until it has been sent. This may seem obvious but it implies that data flows follow a flow diagram.

(FBP) The one place at a time constraint: In FBP an information packet can only be in one place at a time – either in a queue, within another IP, or within an auton. I distinguish between documents and data; documents are objects that can only be in one place at a time. Data are not objects; they are only values.

(FBP) The consistent ordering constraint: The general statement is that objects must be processed in the same order that they are sent. The ordering constraint implies that packages must be stored in queues before they are delivered. There are three variants of this constraint, inport order, auton order, and nest order.

Inport order: The inport queue (or its image in a larger queue) preserves the order in which the packages were sent.

Auton order: Packages sent to an auton are processed in the order they were sent. Auton order is usually associated with an auton queue.

Nest order: A master queue holds all undelivered packages in the order that they were sent.

(RH) The delivery must be attempted constraint: An attempt must be made to deliver every package; either it must ultimately be delivered or rejected as undeliverable.

(RH) The epoch constraint: Execution is divided into epochs. Packages emitted within one epoch are not delivered until the next epoch. This is the default; it can be overridden.

Input processing styles

When a delivery is made there are three major possible responses. The first is that a single package is delivered and processed. The second is that all packages in the inport’s queue are processed. The third is that all packages in all of the auton’s inports are processed. Which of these strategies is used depends on the auton code structure.

  1. Single package processing: Morrison calls this type of processing no-loopers. The auton internal code is divided into inport response functions, one for each inport. The inport response function receives a packet as an argument. The internal response code has an equivalent of this form (pick your own keywords)
        From <inport id> Receive <packet>
            <body of response>
            End
    
  2. Inport queue processing: In this style the inport has an associated queue. As above each inport has its own response function. However in this style the response function has a master loop which extracts and processes packets from the queue. Morrison calls this type of processing loopers. The inport response code has an equivalent of this form:
        While (More <inport id>)
            Receive <packet>
            <process packet>
            End while
    
  3. Process all queued input: In this style the auton has a single inport. There is a single input queue that holds all packets received by the auton. When the auton is activated the auton response function has a master loop that extracts packets from the input queue. Within the loop body there is a switch on the packet type. Functional languages such as Erlang tend to use this style. The auton response code has an equivalent of this form:
        While (More input)
            Receive <packet>
            Switch <packet type>
                Case <type 1>
                    <code to process type 1>
                ...
                End switch
            End while
    
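The three styles can be sketched as response functions. This is an illustrative Python sketch, not an implementation from the original; the function names are my own:

```python
from collections import deque

# Style 1: single-package processing ("no-loopers"): one response
# function per inport, invoked once per delivered packet.
def respond_single(packet):
    return ("handled", packet)

# Style 2: inport queue processing ("loopers"): the inport response
# function has a master loop that drains its own inport queue.
def respond_inport_queue(queue):
    handled = []
    while queue:
        handled.append(("handled", queue.popleft()))
    return handled

# Style 3: process all queued input: one queue for the whole auton,
# with a switch on packet type inside the master loop.
def respond_auton_queue(queue, handlers):
    handled = []
    while queue:
        kind, payload = queue.popleft()
        handled.append(handlers[kind](payload))   # the "switch"
    return handled
```

For example, respond_auton_queue(deque([("text", "a")]), {"text": str.upper}) dispatches on the packet type and returns ["A"].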
The processing style does not have to be implemented in the input response code. Instead the scheduler can implement a chosen style by controlling the order of packages being delivered.

Relationship of styles and constraints

We cannot arbitrarily mix processing style and order constraint. The nest-order constraint requires that single-package processing be used. The auton-order constraint requires that either single-package processing or whole-auton queue processing (style 3) be used. Finally, the inport-order constraint is consistent with all three processing styles. In other words, the inport-order constraint is the weakest, auton order is intermediate, and nest order is the strongest.

Establishing a transition boundary

In simulations it is useful to establish a “clock”. The simulation has a sequence of state and input pairs, <S,I> that represent the system state and its inputs at the beginning of a clock tick. In the interval between one tick and the next the system generates a set of outputs in response to its inputs. In turn the outputs become the inputs for the next tick. Symbolically

    <S0,I> => O => <S1,O>
Regardless of the constraint choice or the processing style we can always maintain a transition boundary by having a separate output queue. All packages generated during an interval are attached to the output queue. When the transition boundary is reached the packages from the output queue become the new input queue.

An output queue is necessary if inport or auton queue processing is used. It is not necessary if nest-order is used; however there is no advantage in not having an output queue.

Having a transition boundary is a constraint. It says that packages emitted during one clock tick are not delivered until the next clock tick.
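The output-queue mechanism can be sketched as a small epoch driver. This is an illustrative Python sketch under the assumption that a deliver() callback hands one package to its destination and returns whatever packages were emitted in response:

```python
from collections import deque

def run_epochs(initial, deliver, max_epochs=100):
    """Drive a nest across transition boundaries.  Packages emitted
    during one epoch go to a separate output queue and are not
    delivered until the next epoch."""
    inputs = deque(initial)
    epoch = 0
    while inputs and epoch < max_epochs:
        outputs = deque()
        while inputs:
            # deliver() returns the packages emitted in response
            outputs.extend(deliver(inputs.popleft()))
        inputs = outputs      # at the boundary, outputs become inputs
        epoch += 1
    return epoch
```

Swapping the whole output queue in at the boundary is what enforces the constraint: no package can be delivered in the same tick in which it was emitted.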

Specifying the order of execution

Order constraints dictate using queues; however inport-order and auton-order constraints do not fully specify the order of execution. Here are two alternatives.

  1. Random selection scheduling: The idea here is to maintain a list of all autons/inports that have queues with content. Pick one at random and process the queue. This potentially violates the “delivery must be attempted” constraint; however there are simple ways to ensure that every entry in the list is eventually selected.
  2. Original order scheduling: The idea here is to use a master list of packages ordered by when they were sent. Processed packages are deleted from the list.
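The two alternatives can be sketched in a few lines of Python; this is illustrative only, and a production scheduler would add aging to the random variant so that every entry is eventually selected:

```python
import random
from collections import deque

# Alternative 1: random selection.  'ready' maps an inport id to its
# queue; pick a nonempty queue at random and deliver one package.
def random_selection(ready, rng):
    nonempty = sorted(k for k, q in ready.items() if q)
    if not nonempty:
        return None
    return ready[rng.choice(nonempty)].popleft()

# Alternative 2: original-order scheduling.  A master list holds the
# packages in the order they were sent; deliver from the front and
# delete as we go.
def original_order(master):
    return master.popleft() if master else None
```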

Using a master queue, advantages and disadvantages

Many implementations maintain a separate queue for each auton/inport. Using a master queue is arguably both more efficient in storage use and better for system integrity.

The reason is quite simple. When separate queues are used space must be allocated for the maximum number of packages expected to be in each queue. When a master queue is used we need only allocate space for the maximum number of packages in the system.

Obviously using a master queue will require less storage, though whether the savings are significant depends both on the implementation and the application. Configuring applications is simpler when just the master queue must be sized than when each separate queue has to be sized.

Using a single master queue simplifies design because the need for “suspend on send” blocking is obviated. The root cause for flooded queues is processes that produce packets at a rate greater than the rate that the rest of the system can handle. The cure is to bound the output rate of said processes.

Stack based scheduling

In stack based scheduling the scheduler pushes emitted packages onto a stack. When it selects the next package to be delivered it pops one off the stack. Stack based scheduling is efficient in its use of space; however it can violate order constraints.

For example, suppose auton X sends packages to Y and Z in that order. The scheduler delivers a package to Z which in turn sends packages to W and Y. The stack now contains a package for Y, one for W, and a second, later one for Y. The scheduler will now deliver the later one before the earlier one.
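The ordering violation in this example can be reproduced with a short sketch (the auton names follow the text):

```python
# Stack-based delivery, reproducing the example in the text.
stack = []
delivered = []

# X sends packages to Y and Z, in that order.
stack.append(("Y", "from X"))
stack.append(("Z", "from X"))

# The scheduler pops the top of the stack: Z's package goes first.
delivered.append(stack.pop())
# Z responds by sending packages to W and Y.
stack.append(("W", "from Z"))
stack.append(("Y", "from Z"))

# Drain the stack: LIFO order delivers the later package for Y first.
while stack:
    delivered.append(stack.pop())

# Y receives the later "from Z" package before the earlier "from X"
# package -- a violation of the consistent ordering constraint.
```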

Another problem with stack based scheduling is that the application can get trapped in an infinite loop cycling through part of the network.

Nonetheless, there are situations in which restricted stack based scheduling can be useful.

One such is “follow the packet scheduling”. This can be used when all packets are documents (FBP: Information packets). The idea is that when an IP enters an auton it only has one path out; therefore it can be followed until it is discarded or otherwise exits the system. This strategy requires keeping a list of all documents in the system.

A second situation is recursion. Stack based scheduling is appropriate when recursion is effected by spawning subsidiary autons.

Deferred delivery

In simulations we may want to defer delivery of a package until some later time, either in simulation pseudo-time or in real time. A common way to handle deferred delivery is to build a deferred delivery heap into the scheduler.

An alternative is to use an auton that receives a package containing a message and a send time. The auton responds by doing a time check on each clock tick. When the send time is reached it packages the message and sends it on.
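A deferred delivery heap can be sketched with Python's heapq; the class and method names here are illustrative, and the sketch assumes the scheduler calls release() once per clock tick:

```python
import heapq

class DeferredDelivery:
    """Deferred-delivery heap: packages are held until their send
    time and released in time order."""
    def __init__(self):
        self._heap = []
        self._seq = 0              # tie-breaker preserves send order

    def defer(self, send_time, package):
        heapq.heappush(self._heap, (send_time, self._seq, package))
        self._seq += 1

    def release(self, now):
        """Return all packages whose send time has arrived."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

The timer-auton alternative described above amounts to moving release() out of the scheduler and into an ordinary auton's clock-tick response.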

Blocked ports

When each auton or each inport has its own separate queue it is fairly easy to create blockages by filling queues. Packages cannot be sent to the filled queues. If the sending auton does not take action to handle the failure to send it becomes blocked with a “suspend on send”. The auton remains suspended until the blocked queue is opened up. This condition can propagate upstream. There is a fair amount of discussion in “Flow Based Programming” about dealing with this condition.

There also is “suspend on receive”. There are two cases to consider. One is normal; it occurs when the input queue is empty. The other is algorithmic; the code within the auton places a block on an inport that is not lifted until some condition is met.

The situation is quite different when there is a single master queue. There are no “suspend on send” blockages. It is possible for processes to emit excessive packages and thereby flood the master queue. This is easily detected and it is quite straightforward to fix.

Normally the only blockages seen in the single master queue scenario are blocks placed on an inport by the auton for algorithmic reasons. For example, suppose an auton is interleaving inputs from ports A and B. When it reads data from port A, the auton blocks port A so that the next input must come from port B.

When an inport is blocked it is convenient for its queue to be separate from the master queue.
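The interleaving example can be sketched as follows; this is an illustrative Python sketch with invented names, in which receive() returns False to tell the scheduler to hold the packet on a separate queue:

```python
class InterleavingAuton:
    """Algorithmic inport blocking: after reading from port A the
    auton blocks A so the next input must come from B, and vice
    versa."""
    def __init__(self):
        self.blocked = {"A": False, "B": True}  # start by accepting A
        self.log = []

    def receive(self, port, packet):
        if self.blocked[port]:
            return False                # scheduler must hold the packet
        self.log.append((port, packet))
        other = "B" if port == "A" else "A"
        self.blocked[port] = True       # block the port just read
        self.blocked[other] = False     # unblock the other port
        return True
```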


This page was last updated April 1, 2011.
