RFCAS - The San Data Flow and Messaging Engine

home

table of contents

Comp. Sci.

February 2004

email

(RFCAS = Request For Comments And Suggestions)
Among other things, the proposed San programming language is a data flow language. A San program sits on top of a data flow engine that handles buffering, message queues, etc. What I am looking for are comments and suggestions about the architecture of the engine and about issues to consider. This posting is divided into two parts. The first part is a description of the relevant features of the San language. The second part is a discussion of issues and questions.
Part I – Overview
A San program (execution process) is divided up into an executive and one or more agents. Each agent is a separate, independent thread of control. Unlike threads in many thread implementations, agents do not share data [1]. One can, if one likes, think of an agent as being a mini-process. Agents are event driven. An agent may (but need not) have a single background thread. In addition to the optional background thread, it can have an indefinitely large number of event handlers called “on units”.
Inter-agent communication
Agents communicate with each other in one of two ways, by pipes or by messages.
Pipes
Each agent has an unboundedly large number of input and output ports [2]. Ports are connected by pipe statements that have the form:
<agent> <out-port-no>'|'<in-port-no> <agent>
where each port number is either an integer or nil (that is, omitted), nil being shorthand for port 0. Pipes can be concatenated on a single line. For example:
foo 1|2 bar | bagle
says that output port 1 of foo is connected to input port 2 of bar, and that output port 0 of bar is connected to input port 0 of bagle. A port can be connected to more than one pipe by using separate connection statements. For example a “tee” is effected with
foo | bar
foo | bagle
(Comments on this syntax and suggestions for alternatives are solicited.)
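To make the connection rules concrete, here is a minimal Python sketch of how a pipe statement might be broken down into individual connections. It is not San code and not a proposed implementation: the function name `parse_pipe_statement` and the tuple layout are assumptions made for illustration, and the sketch assumes that agent names start with a letter and that an omitted port number means port 0.

```python
import re

# One token: a port number, an agent name (hyphens allowed), or the '|' symbol.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z][\w-]*)|(\|))")

def parse_pipe_statement(line):
    """Return a list of (src_agent, out_port, in_port, dst_agent) tuples."""
    line = line.rstrip()
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected text at column {pos}: {line[pos:]!r}")
        num, name, bar = m.groups()
        if num is not None:
            tokens.append(("port", int(num)))
        elif name is not None:
            tokens.append(("agent", name))
        else:
            tokens.append(("bar", bar))
        pos = m.end()

    i = 0

    def take(kind):
        """Consume the next token if it has the given kind, else return None."""
        nonlocal i
        if i < len(tokens) and tokens[i][0] == kind:
            value = tokens[i][1]
            i += 1
            return value
        return None

    src = take("agent")
    if src is None:
        raise ValueError("a pipe statement must start with an agent name")
    connections = []
    while i < len(tokens):
        out_port = take("port")          # None when the port is omitted
        if take("bar") is None:
            raise ValueError("expected '|'")
        in_port = take("port")
        dst = take("agent")
        if dst is None:
            raise ValueError("expected an agent name after '|'")
        connections.append((src, out_port or 0, in_port or 0, dst))
        src = dst                        # concatenation: dst feeds the next pipe
    return connections

print(parse_pipe_statement("foo 1|2 bar | bagle"))
# [('foo', 1, 2, 'bar'), ('bar', 0, 0, 'bagle')]
```

Under this reading, the tee is nothing special: parsing the two separate statements simply yields two connections that share the same source agent and output port.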
Data is sent to pipes by “emit” commands. The emitted data can be in the form of a character stream, a stream of words (strings), or a list of strings. In addition, blocking markers can be inserted into the stream with “mark” commands. Emit commands optionally contain a port number and a mode; the defaults are port 0 and character stream mode. For example
emit-0-char foo
sends the characters in the string bound to foo [3] to port 0 as a stream of characters. [5]
On the receiving end the arrival of data activates an on unit that accepts data from the designated input port. Here is a small example:
begin on-$0-char
    .count += 1
end
This particular on unit does nothing more exciting than count the number of characters received in port 0.
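For readers who prefer to see the idea in a conventional language, the following Python sketch imitates an agent with one on unit that counts characters. Everything in it is hypothetical: the `Agent` class, the `on` decorator and the `deliver` method are stand-ins for whatever the data flow engine would actually do, and dispatch here is a plain synchronous call rather than a separate thread.

```python
from collections import defaultdict

class Agent:
    """Hypothetical stand-in for a San agent: private state plus handlers."""

    def __init__(self, name):
        self.name = name
        self.vars = defaultdict(int)   # agent-local variables such as .count
        self.handlers = {}             # (port, mode) -> handler function

    def on(self, port, mode):
        """Register a handler, loosely mirroring `on-$0-char`."""
        def register(func):
            self.handlers[(port, mode)] = func
            return func
        return register

    def deliver(self, port, mode, item):
        """Called by the imagined engine when data arrives on an input port."""
        handler = self.handlers.get((port, mode))
        if handler is not None:
            handler(self, item)

counter = Agent("counter")

@counter.on(port=0, mode="char")
def count_chars(agent, ch):
    # Equivalent in spirit to `.count += 1` in the on unit above.
    agent.vars["count"] += 1

for ch in "hello":
    counter.deliver(0, "char", ch)

print(counter.vars["count"])   # prints 5
```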
Messages
A significant feature of piped data is that the originating agent does not know where the data is going to, and the receiving agent does not know where the data is coming from. Messages, on the other hand, contain within them “from” and “to” fields. For example, a message might be sent with code like:
msg to=foo, type=warning, text="No more boojums"
and received with an on unit in agent foo that looks like this:
begin on-msg
    switch $type
        warning: emit-1 text
        else: invoke-terminator
end
In this bit of code foo is an error logging agent. One other feature of message handling worth noting is the rsvp tag. When a message is tagged with the rsvp tag, the thread sending the message waits until a reply is received.
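The rsvp behaviour can be pictured with ordinary threads and queues. The Python sketch below is only an illustration under stated assumptions: the `send` helper, the reply format, and the idea of carrying a private reply queue inside the message are all invented here, and the timeout is an addition of the sketch (the description above only says that the sender waits).

```python
import queue
import threading

def logger_agent(inbox):
    """Hypothetical receiving agent: answers every rsvp message it gets."""
    while True:
        msg = inbox.get()
        if msg is None:            # sentinel used only to stop this sketch
            break
        reply_to = msg.get("rsvp")
        if reply_to is not None:
            reply_to.put({"type": "ack", "text": "logged: " + msg["text"]})

def send(inbox, rsvp=False, timeout=1.0, **fields):
    """Send a message; with rsvp=True, block until a reply arrives."""
    if not rsvp:
        inbox.put(dict(fields))
        return None
    reply_box = queue.Queue(maxsize=1)         # private reply channel
    inbox.put(dict(fields, rsvp=reply_box))
    return reply_box.get(timeout=timeout)      # raises queue.Empty on timeout

foo_inbox = queue.Queue()
worker = threading.Thread(target=logger_agent, args=(foo_inbox,))
worker.start()

reply = send(foo_inbox, rsvp=True, type="warning", text="No more boojums")

foo_inbox.put(None)
worker.join()
print(reply["text"])   # prints: logged: No more boojums
```

One design question the sketch makes visible is what the sender should do if the reply never comes; here that surfaces as a timeout error in the sending thread.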
In addition to pipes and messages there are also events and exceptions; they will be left for some other discussion.
Part II – Questions and Issues
Q: Agents are analogous to processes in a *nix system; is it feasible or desirable to reuse process management code from a Linux or FreeBSD kernel? Similarly, agents are analogous to Ada tasks. Is it feasible or desirable to reuse (or use as a model) Ada task management code?
Q: How analogous are San pipes to *nix inter-process pipes? Is it worthwhile looking at the pipe management code in *nix?
Q: Emit commands push data into a pipe. The dataflow management engine is responsible for buffering pipe data and passing it on to the target. Several things can go wrong here. The target may be dead or be non-responsive. The buffer may be full. The way I am envisioning handling this is that a failure in an emit command (which pushes data into a pipe) throws an exception. If the problem is simply that the buffer is full because the receiving agent is not yet ready to accept more data then the originating agent gets a “buffer-full” exception. It can decide to wait for the emit command to complete or it can abort the emit and continue. If the pipe is broken, i.e., if the receiving agent is dead or non-responsive, then an exception is thrown at the executive level. At that level the pipe can be redirected or other repair action taken.
The questions here are: Is this scheme sufficient or is there something serious being overlooked? Further, is there a better way than the proposed scheme?
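To help commenters reason about the scheme, here is a small Python sketch of the two failure paths. It is a model of the proposal, not engine code: the `Pipe` class, the `BufferFull` and `BrokenPipe` exception names, and the `agent_emit` and `executive_run` functions are assumptions made for this sketch, and redirection to a spare pipe is just one possible repair action.

```python
from collections import deque

class BufferFull(Exception):
    """Delivered to the emitting agent; it may wait or abort the emit."""

class BrokenPipe(Exception):
    """Receiver is dead or non-responsive; meant for the executive level."""

class Pipe:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.receiver_alive = True

    def emit(self, item):
        if not self.receiver_alive:
            raise BrokenPipe("receiving agent is dead or non-responsive")
        if len(self.buffer) >= self.capacity:
            raise BufferFull("receiver has not drained the buffer yet")
        self.buffer.append(item)

    def receive(self):
        return self.buffer.popleft()

def agent_emit(pipe, item, on_full="abort"):
    """Agent-level policy: handle BufferFull locally, let BrokenPipe escalate."""
    try:
        pipe.emit(item)
        return True
    except BufferFull:
        if on_full == "abort":
            return False       # abort this emit and continue
        raise                  # a real engine would wait for completion instead

def executive_run(pipe, spare, items):
    """Executive-level policy: redirect to a spare pipe when the pipe breaks."""
    delivered = []
    for item in items:
        try:
            delivered.append(agent_emit(pipe, item))
        except BrokenPipe:
            pipe = spare       # repair action chosen for this sketch
            delivered.append(agent_emit(pipe, item))
    return delivered, pipe

p = Pipe(capacity=2)
print([agent_emit(p, c) for c in "abc"])   # [True, True, False]
```

The sketch leaves open exactly the points raised in the question, for instance what happens to data already sitting in the buffer of a pipe that breaks.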
Q: As a general rule, an exception that is not handled at the agent level throws an exception at the executive level. Exceptions not handled at the executive level are fatal. Is there a problem here that isn’t being addressed but should be?
Q: The current conception of messaging provides peer to peer messaging. Is there significant value in adding mailing lists? General broadcast? Bulletin boards? Is there an alternative that I am overlooking here?
Q: By design San is thread-safe between agents since there is no data sharing. Likewise, procedures and functions are thread-safe. Assignment within on units is nominally thread-safe in that all code statements are atomic at the code level. Is this enough?
Q: San does not have any inter-agent globals nor does it have globals within agents that extend across procedure/function invocations. It does distinguish between variables that are local to an on unit thread (these are not persistent) and variables that are local to the agent. The latter can be read and written to by threads. Are there boojums here that I am overlooking?
Q: Although San does not have globals as such one can emulate them by having an agent that stores the global values, and that recognizes “getval” and “setval” messages. Should there be some syntactical sugar for this?
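As a point of reference for this question, the emulation can be sketched in Python as follows. The `GlobalsAgent` class, the message field names `name` and `value`, the reply formats, and the two wrapper functions are all hypothetical; the wrappers stand in for whatever syntactic sugar might be chosen.

```python
class GlobalsAgent:
    """Hypothetical agent that owns the 'global' values and answers messages."""

    def __init__(self):
        self._values = {}

    def handle(self, msg):
        kind = msg["type"]
        if kind == "setval":
            self._values[msg["name"]] = msg["value"]
            return {"type": "ok"}
        if kind == "getval":
            if msg["name"] in self._values:
                return {"type": "value", "value": self._values[msg["name"]]}
            return {"type": "error", "text": "unknown name " + msg["name"]}
        return {"type": "error", "text": "unsupported message type " + kind}

# Possible "sugar": thin wrappers that hide the message traffic.
def set_global(agent, name, value):
    agent.handle({"type": "setval", "name": name, "value": value})

def get_global(agent, name):
    reply = agent.handle({"type": "getval", "name": name})
    if reply["type"] == "error":
        raise KeyError(name)
    return reply["value"]

store = GlobalsAgent()
set_global(store, "boojum-count", 3)
print(get_global(store, "boojum-count"))   # prints 3
```

Because every read and write goes through one agent, access is serialized by that agent's message handling, which is part of what any sugar would be hiding.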
Q: Should one have a “mailing address” system? That is, in messages, instead of having an agent name, one specifies a mailing address. One point here is that the mapping of agent to mailing address is then configurable.
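A mailing address system of this kind amounts to one level of indirection between the name in the message and the agent that receives it. The Python sketch below shows the idea only; the `Registry` class, the address `error-log`, and the use of plain lists as inboxes are assumptions made for the illustration.

```python
class Registry:
    """Hypothetical address table, e.g. owned by the executive."""

    def __init__(self):
        self._table = {}

    def bind(self, address, inbox):
        """Map (or remap) a mailing address to an agent's inbox."""
        self._table[address] = inbox

    def send(self, address, **fields):
        inbox = self._table.get(address)
        if inbox is None:
            raise LookupError("no agent bound to address " + repr(address))
        inbox.append(dict(fields))

registry = Registry()
foo_inbox, backup_inbox = [], []

registry.bind("error-log", foo_inbox)
registry.send("error-log", type="warning", text="No more boojums")

# Reconfigure the mapping; senders keep using the same address.
registry.bind("error-log", backup_inbox)
registry.send("error-log", type="warning", text="second warning")

print(len(foo_inbox), len(backup_inbox))   # prints: 1 1
```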
Q: I haven’t quite decided on the agent creation/termination rules. I could use a direct analog of fork and kill, but they don’t seem quite right. Are there any suggested alternatives?
Q: What other questions should I be asking?
Notes

[1] They are not perceived to share data at the language level. There may be considerable data sharing “under the hood”. However, this is not perceptible at the code level.
[2] There may be an implementation-dependent guaranteed number of ports per agent, e.g., 1024.
[3] It is best to think of variables [4] in San as being bound to values rather than being pointers to values or addresses of values.
[4] San variables are structured objects called morphs. However, this is not relevant to the present discussion.
[5] San names can have hyphens in them.

This page was last updated February 12, 2004.
