San threading engine implementation notes

Introduction
Sigils and References
Scheduling
Constraints on interface functions
Handling of inports
Inter agent data transfer – storage
Tracking source structs

[ top | prev | next | last ]
Introduction

This page contains various notes about the implementation of the San threading engine. Consult it if the code seems mysterious. The remarks here are relevant for the February 24, 2009 release. The notes will be updated with the next release, and before that if new material is added. See the page on the San Engine architecture for a more general discussion of the engine architecture and concepts.

Much of what appears to be overkill in data verification is necessary because the connectivity of the software can be altered dynamically during execution.

[ top | prev | next | last ]
Sigils and References

All user code is embedded within agents, either as initialization code, as input port response code, or as originary data source code. When user code is activated it is passed a sigil. This is a struct that acts as a password token.

User code interacts with the engine via interface functions. The first argument to an interface routine always is the sigil. A fundamental requirement is that the presented sigil be valid, i.e., the data in the sigil is legitimate.

In this implementation the sigil is not encrypted but it could be in a future implementation. None-the-less the sigil isolates user code from the engine code. Instead of containing pointers to engine data a sigil contains opaque references. The user visible references are:

CRFID            This is a size_t integer; it actually is an index into a
                 table of pointers to common resource frames; however the
                 table is not visible to the user.
AGENT_R          This is a struct containing two size_t integers.  It is a
                 robust handle for agents.
SOURCE_R         This is a struct containing an agent reference and a source
                 serial number.  It points to a struct defining a source, where
                 a source is code that autonomously generates output.

[ top | prev | next | last ]
Scheduling

This implementation uses a simple scheduling model. Each cycle of the scheduler loop represents a unit of time called an epoch. In each epoch the emissions from the previous epoch are processed and each source is executed once. The main loop has three major sub loops. They are:

A loop to process transmissions made during the previous epoch.
A loop to generate emissions from sources.
A loop to convert emissions into deliverables.

The scheduler maintains two lists, a list of items to be processed in the current cycle and a list of items to be processed in the next cycle. The lists contain instances of a struct called a delivery that looks like this:

DELIVERY {
    AGENT_R           agent_r;          /* Agent receiving delivery        */
    PORT_NO           port;             /* Port receiving delivery         */
    size_t            seqno;            /* Expected inport serial number   */
    INPORT          * inport;           /* Pointer to agent inport struct  */
};

The agent_r field is a reference for the receiving agent; port is the input port number. Seqno is a verification sequence number. Finally, inport is a point to a structure in the agent that holds the input port. The queue of items being delivered is in a queue maintained in inport. The items in the queue are structures called packages that look like this;

PACKAGE {
    size_t            len;              /* Size of package                 */
    char            * data;             /* Contents of package             */
    EMITSTG         * xpool;            /* Storage pool supplying space    */
};

Len and data specify the serialized data being transmitted; xpool specifies the storage pool being used.

Loop 1
Loop 1 loops over the current deliveries list. There is one delivery per input port receiving input; however the delivery can deliver more than one package to the port. If the delivery is verified the current version of the exec_import function processes all of the packages in the queue in the agent’s input port. That is, if port 1 has three items in its inport queue all three items will be processed with three successive calls to the inport response function. However the code provides for the possibility of not handling all items in the queue.

The emissions created are added to list crf->emissions. Here is the definition for an emission:

EMISSION {
    PACKAGE           pkg;              /* Data package emitted            */
    DELIVERY          d;                /* Delivery package data           */
};

Pkg contains the data to be transmitted. D contains the delivery struct as it was at the time of emission; nothing is loaded into the inport queue at this time.

Loop 2
Loop 2 loops through the sources in crf->source_list. See the section on source tracking for details about management of sources. The outputs of the user defined sources is added to crf->emissions.

Loop 3
Loop 3 is encapsulated in update_deliveries. It checks that the delivery component of each emission is valid, i.e., that it is a current delivery specification for that inport. If all is okay, the package is moved to the deliveries inport queue.

[ top | prev | next | last ]
Constraints on interface functions

All of the interface functions require a valid sigil as an argument. The sigil has embedded within it a reference for the agent containing the user code currently being executed. The containing agent is called the self agent.

The sigil is the only argument for the following functions:

san_create_child:     Create a child agent
san_create_sibling:   Create a sibling agent
san_get_agent_data:   Gets user visible agent data

There are two functions that have a data array index argument; they return an agent datum by index from an array. Each returns a reference; if the index is out of range they return a null reference. The functions are:

san_get_child_no:     Gets child ref for a given index  
san_get_source_no:    Gets source ref for a given index

There are two deletion functions. Each has a argument specifying the entity (source or agent) to be deleted. The specified agent (or containing agent if a source is being deleted) must either be the agent itself or a child agent. The functions are:

san_delete_agent:     Deletes an agent        
san_delete_source:    Deletes a source element

Thre are three load functions that associate a user supplied function with an agent. The arguments for each is the sigil, the agent being supplied with a user function, and the user function. Again the specified agent must either be the agent itself or a child. The supplied function pointer cannot be a null function pointer. The san_load_init function also requires that the agent not be initialized. The functions are:

san_load_init:        Loads agent initialization codee          
san_load_inport:      Loads an execution function into an inport
san_load_source:      Adds a source element to an agent

There are two connection management functions, one to create a connection. Each has source and destination agents as arguments along with the output and input port numbers. The source and destination agents must satisfy one of the following cases:

The source and destination agents are both children of the self agent.
The two agents are the self agent and a sibling of the self agent.
The two agents are the self agent and a child of the self agent.
Both agents are the self agent.

When a connection is being made it can’t already exist; similarly when a connection is being deleted it must exist. The functions are:

san_connect:          Creates a data flow connection    
san_disconnect:       Eliminates a data flow connection

The san_emit command emits data out an output port. The port must be a valid port; i.e., it must be part of a data flow connection. The san_rpt_agent writes an agent report to the report file; it must be passed a valid agent. These functions are:

san_emit:             Emits a data packet     
san_rpt_agent:        Creates an agent report

[ top | prev | next | last ]
Handling of inports

Each agent has a list of input port structures called inports. An inport looks like this:

INPORT {
    PORT_NO           port;              /* Input port number              */
    LISTHDR           queue;             /* List of queued input data      */
    void            * globals;           /* Ptr to agent globals           */
    size_t            n_feeds;           /* No. of connections to port     */
    inport_exec_td    exec;              /* Function to execute            */
};

Conceptually connection tables belong to the parent agent; as a matter of convenience they are stored with the emitting agents. Entries in the connection tables look like this:

CONNEX {
    PORT_NO           outport;
    PORT_NO           port;
    AGENT_R           agent_ref;
    INPORT          * dest;
};

When a connection is made an inport is created for the receiving agent and port if none already exists. When an emitter generates an emission it scans the emitter’s connection table to find all entries for that output port. It checks the agent reference to make sure that the agent still exists. If so, it appends the emission to the inport queue. It also appends the packet address to a central emissions list that is used by the scheduler.

The n_feeds fields in the INPORT struct holds the number of connections to the agent input port. The field is incremented each time a connection is made to the port and decremented each time a connection to the port is disconnected.

When the count reaches zero during a disconnect the inport struct is move from the agent inport list to resource frame inport free list. The disconnect routine also removes the connection struct (CONNEX) from the CONNEX table in the emitting agent.

When an agent is deleted its list of inports are added to the resource frame inport free list. The corresponding entries in the CONNEX table are not deleted at that time. They are deleted later when an attempt is made to send data to the deleted agent.

[ top | prev | next | last ]
Inter agent data transfer – storage

The San engine uses a copy data protocol for inter agent data transfer rather than a data reference protocol. In other words when an agent emits data the emit code makes a copy of the data for each port to which the data will be delivered. Transfer by regence is faster than transfer by copy, but it is dangerous unless it can be guaranteed that (a) the emitter will not alter the data after sending it, and (b) the receiver(s) do not alter the data after receiving it.

The code uses a double storage pool technique so as to avoid excessive storage allocation and deallocation. The pools hold queued data that has not yet been processed. There are two pools, old and current. No space is taken from the old pool; all space for emitted items is taken from the current pool. When space taken from a pool is released the count of items in the pool is decremented. When a pool count reaches 0 the pool is cleared. At the end of a scheduling cycle the pool count for the old pool is checked. If it is 0 the old is cleared, the current pool becomes the old pool, and a new current pool is created.

[ top | prev | next | last ]
Tracking source structs

A source (in the sense of sources and sinks) is an autonomous generator of data streams. An agent can host several sources. A source is represented by a SOURCE struct defined as follows:

    SOURCE {
        SIGIL            sigil;   /* Sigil for the host agent          */
        source_exec_td   exec;    /* Code to be executed by the source */
        size_t           index;   /* Index in the long list array      */
        size_t           srcno;   /* Id # relative to the host agent   */
        SOURCE_R         s_ref;   /* Reference used in user code       */
        SOURCE         * link;    /* Link used in linked lists         */ 
    };

Each agent that hosts sources has a linked list of SOURCE structs corresponding to the hosted sources. These lists are the only locations where SOURCE structs are stored.

In addition to the source structs themselves there is an array of pointers to the entire suite of source structs. This array is used internally by the scheduler to schedule source activation.

The source structs are not visible to user code. User code accesses sources via SOURCE_R references. The array of struct pointers is not user accessible.

When a new source is added to an agent the storage for the struct is taken from a free list. Each agent maintains a counter that supplies a unique id for each source struct. The new source struct is linked into the existing agent source list and a pointer to the struct is added to the end of the pointer array.

When a source is deleted from an agent things are slightly more complicated. The agent’s list of sources is scanned and the source is snipped from the list. (The assumption is that agent source lists will be small.) The deleted source is added to the free list of sources. The list of pointers is divided into two segments, processed and unprocessed when the list is being scanned. If the deleted source is unprocessed the last pointer in the array replaces the deleted pointer and the index in the struct is changed to point to the new location. If the deleted source is processed two replacements are done; the last processed replaces the deleted pointer, and the last of the array replaces the last processed. The index fields in the structs pointed to by the moved pointers is updated to reflect the new location.

Richard Harter’s World

Site map

February 2009

Comp Sci

San Project

email