San threading engine implementation notes
Introduction
[ top
| prev
| next
| last
] This page contains various notes about the implementation of the San threading engine. Consult it if the code seems mysterious. The remarks here are relevant for the February 24, 2009 release. The notes will be updated with the next release, and before that if new material is added. See the page on the San Engine architecture for a more general discussion of the engine architecture and concepts. Much of what appears to be overkill in data verification is necessary because the connectivity of the software can be altered dynamically during execution.
[ top
| prev
| next
| last
] All user code is embedded within agents, either as initialization code, as input port response code, or as originary data source code. When user code is activated it is passed a sigil. This is a struct that acts as a password token. User code interacts with the engine via interface functions. The first argument to an interface routine always is the sigil. A fundamental requirement is that the presented sigil be valid, i.e., the data in the sigil is legitimate. In this implementation the sigil is not encrypted but it could be in a future implementation. None-the-less the sigil isolates user code from the engine code. Instead of containing pointers to engine data a sigil contains opaque references. The user visible references are: CRFID This is a size_t integer; it actually is an index into a table of pointers to common resource frames; however the table is not visible to the user. AGENT_R This is a struct containing two size_t integers. It is a robust handle for agents. SOURCE_R This is a struct containing an agent reference and a source serial number. It points to a struct defining a source, where a source is code that autonomously generates output.[ top | prev | next | last ] Scheduling This implementation uses a simple scheduling model. Each cycle of the scheduler loop represents a unit of time called an epoch. In each epoch the emissions from the previous epoch are processed and each source is executed once. The main loop has three major sub loops. They are:
The scheduler maintains two lists, a list of items to be processed in the current cycle and a list of items to be processed in the next cycle. The lists contain instances of a struct called a delivery that looks like this: DELIVERY { AGENT_R agent_r; /* Agent receiving delivery */ PORT_NO port; /* Port receiving delivery */ size_t seqno; /* Expected inport serial number */ INPORT * inport; /* Pointer to agent inport struct */ };The agent_r field is a reference for the receiving agent; port is the input port number. Seqno is a verification sequence number. Finally, inport is a point to a structure in the agent that holds the input port. The queue of items being delivered is in a queue maintained in inport. The items in the queue are structures called packages that look like this; PACKAGE { size_t len; /* Size of package */ char * data; /* Contents of package */ EMITSTG * xpool; /* Storage pool supplying space */ };Len and data specify the serialized data being transmitted; xpool specifies the storage pool being used.
Loop 1 The emissions created are added to list crf->emissions. Here is the definition for an emission: EMISSION { PACKAGE pkg; /* Data package emitted */ DELIVERY d; /* Delivery package data */ };Pkg contains the data to be transmitted. D contains the delivery struct as it was at the time of emission; nothing is loaded into the inport queue at this time.
Loop 2
Loop 3
[ top
| prev
| next
| last
] All of the interface functions require a valid sigil as an argument. The sigil has embedded within it a reference for the agent containing the user code currently being executed. The containing agent is called the self agent. The sigil is the only argument for the following functions: san_create_child: Create a child agent san_create_sibling: Create a sibling agent san_get_agent_data: Gets user visible agent dataThere are two functions that have a data array index argument; they return an agent datum by index from an array. Each returns a reference; if the index is out of range they return a null reference. The functions are: san_get_child_no: Gets child ref for a given index san_get_source_no: Gets source ref for a given indexThere are two deletion functions. Each has a argument specifying the entity (source or agent) to be deleted. The specified agent (or containing agent if a source is being deleted) must either be the agent itself or a child agent. The functions are: san_delete_agent: Deletes an agent san_delete_source: Deletes a source elementThre are three load functions that associate a user supplied function with an agent. The arguments for each is the sigil, the agent being supplied with a user function, and the user function. Again the specified agent must either be the agent itself or a child. The supplied function pointer cannot be a null function pointer. The san_load_init function also requires that the agent not be initialized. The functions are: san_load_init: Loads agent initialization codee san_load_inport: Loads an execution function into an inport san_load_source: Adds a source element to an agentThere are two connection management functions, one to create a connection. Each has source and destination agents as arguments along with the output and input port numbers. The source and destination agents must satisfy one of the following cases:
san_connect: Creates a data flow connection san_disconnect: Eliminates a data flow connectionThe san_emit command emits data out an output port. The port must be a valid port; i.e., it must be part of a data flow connection. The san_rpt_agent writes an agent report to the report file; it must be passed a valid agent. These functions are: san_emit: Emits a data packet san_rpt_agent: Creates an agent report
[ top
| prev
| next
| last
] Each agent has a list of input port structures called inports. An inport looks like this: INPORT { PORT_NO port; /* Input port number */ LISTHDR queue; /* List of queued input data */ void * globals; /* Ptr to agent globals */ size_t n_feeds; /* No. of connections to port */ inport_exec_td exec; /* Function to execute */ };Conceptually connection tables belong to the parent agent; as a matter of convenience they are stored with the emitting agents. Entries in the connection tables look like this: CONNEX { PORT_NO outport; PORT_NO port; AGENT_R agent_ref; INPORT * dest; };When a connection is made an inport is created for the receiving agent and port if none already exists. When an emitter generates an emission it scans the emitter’s connection table to find all entries for that output port. It checks the agent reference to make sure that the agent still exists. If so, it appends the emission to the inport queue. It also appends the packet address to a central emissions list that is used by the scheduler. The n_feeds fields in the INPORT struct holds the number of connections to the agent input port. The field is incremented each time a connection is made to the port and decremented each time a connection to the port is disconnected. When the count reaches zero during a disconnect the inport struct is move from the agent inport list to resource frame inport free list. The disconnect routine also removes the connection struct (CONNEX) from the CONNEX table in the emitting agent. When an agent is deleted its list of inports are added to the resource frame inport free list. The corresponding entries in the CONNEX table are not deleted at that time. They are deleted later when an attempt is made to send data to the deleted agent.
[ top
| prev
| next
| last
] The San engine uses a copy data protocol for inter agent data transfer rather than a data reference protocol. In other words when an agent emits data the emit code makes a copy of the data for each port to which the data will be delivered. Transfer by regence is faster than transfer by copy, but it is dangerous unless it can be guaranteed that (a) the emitter will not alter the data after sending it, and (b) the receiver(s) do not alter the data after receiving it. The code uses a double storage pool technique so as to avoid excessive storage allocation and deallocation. The pools hold queued data that has not yet been processed. There are two pools, old and current. No space is taken from the old pool; all space for emitted items is taken from the current pool. When space taken from a pool is released the count of items in the pool is decremented. When a pool count reaches 0 the pool is cleared. At the end of a scheduling cycle the pool count for the old pool is checked. If it is 0 the old is cleared, the current pool becomes the old pool, and a new current pool is created.
[ top
| prev
| next
| last
] A source (in the sense of sources and sinks) is an autonomous generator of data streams. An agent can host several sources. A source is represented by a SOURCE struct defined as follows: SOURCE { SIGIL sigil; /* Sigil for the host agent */ source_exec_td exec; /* Code to be executed by the source */ size_t index; /* Index in the long list array */ size_t srcno; /* Id # relative to the host agent */ SOURCE_R s_ref; /* Reference used in user code */ SOURCE * link; /* Link used in linked lists */ };Each agent that hosts sources has a linked list of SOURCE structs corresponding to the hosted sources. These lists are the only locations where SOURCE structs are stored. In addition to the source structs themselves there is an array of pointers to the entire suite of source structs. This array is used internally by the scheduler to schedule source activation. The source structs are not visible to user code. User code accesses sources via SOURCE_R references. The array of struct pointers is not user accessible. When a new source is added to an agent the storage for the struct is taken from a free list. Each agent maintains a counter that supplies a unique id for each source struct. The new source struct is linked into the existing agent source list and a pointer to the struct is added to the end of the pointer array. When a source is deleted from an agent things are slightly more complicated. The agent’s list of sources is scanned and the source is snipped from the list. (The assumption is that agent source lists will be small.) The deleted source is added to the free list of sources. The list of pointers is divided into two segments, processed and unprocessed when the list is being scanned. If the deleted source is unprocessed the last pointer in the array replaces the deleted pointer and the index in the struct is changed to point to the new location. If the deleted source is processed two replacements are done; the last processed replaces the deleted pointer, and the last of the array replaces the last processed. The index fields in the structs pointed to by the moved pointers is updated to reflect the new location.
This page was last updated February 24, 2009. |