Richard Harter's World

The San Programming Language

__________________________________
1. Introduction
2. San File Structure
3. Character set and lexemes
4. Basic Syntax
5. Data Types
6. San objects
7. Code segment structure
8. Expressions
9. Assignment
10. Flow Control
11. Procedures and Functions (procs)
12. Agents
13. Sequents
14. Mathematical operations
15. Exception handling
16. Configuration Segments
Appendix I - Indentation choices
Appendix II - Keywords
Appendix III - Sequent operators
Detailed table of contents
Revision history
This document contains a preliminary specification for the San programming language. It supersedes all language descriptions prior to October, 2008. The specification is informal; it doesn't contain a formal grammar for the language, although there are grammar fragments present. However, the specification should suffice for implementing an interpreter for the language.

The San language is named after the San people of Africa, who have one of the most distinct language groups in the world. The suggestion to use the name San is due to Suford Lewis, to whom many thanks.

The October 1, 2008 revision is a major revision. Major changes include reordering and renumbering the sections, changing the syntax of keywords, and rewriting the flow control section. In addition there are numerous minor revisions. See the revision document for details.

1. Introduction

San falls into the general category of Command Line Interpreter (CLI) languages aka scripting languages. Popular languages in this category include PERL, Python, Ruby, TCL, REXX, and the UNIX shell languages. CLI languages typically provide builtin functionality for implementing OS administrative tasks, and for manipulating the contents of text files. Modern CLI languages commonly provide a diverse variety of canned functionality based upon builtin domains of expertise.

It is generally the case that CLI language programs are smaller and take less time to create than the equivalent 3GL programs. On the other hand, CLI programs typically run more slowly than equivalent 3GL programs, CLI programs being either interpreted or byte compiled.

Since CLI languages occupy the same niche, one might expect that there is room for only one or two such languages. This is not the case. CLI languages are quite diverse in style and approach to programming technique, and in the domains of expertise that are supported. Major features of the San language include:



2. San File Structure

2.1 Style statements
2.2 Segments
2.3 Comment lines and blank lines

San files are divided into an optional prolog, and optional sections called segments. The prolog consists of white space, comments, and style statements.

2.1 Style statements

San source files begin with an optional suite of style statements that specify the coding style choices used in the file. Style choices include: Here is an example of a style section:
        Language:    English
        Indentation: tab
In this particular example the keywords will be in English and tabs will be used for indentation. All style choices are translatable into each other by simple replacement.


2.2 Segments

The remainder of a San source file consists of major sections called segments. There are three types of segments: program segments (also called executive segments), configuration segments, and code segments. They must appear in that order, i.e., the program segments appear first, then the configuration segments, and finally the code segments.

A segment consists of a segment initiation statement followed by an indented body. Each segment statement has two items, the segment type, and the segment label. Here is an example of a segment:

Program: testsort
    Entry: sortstuff#testsort
    Configuration: rev17
    End segment

Code segments contain the actual code. In addition to executable code within a code segment, there may also be dictionary statements that specify which dictionaries to use to satisfy external references.

The program segment specifies which agent will be used as the main program and which configuration segment will be used to specify the configuration. The agent name can be a symbolic name; the selected configuration segment translates the symbolic name into an actual file system path, code segment, and named agent. Symbolic names can be used within a configuration segment, and a configuration segment can reference another configuration segment.


2.3 Comment lines and blank lines

Comment lines are lines starting with an octothorp "#". The initial comment character can be preceded by white space characters (space and tab.) Blank lines are lines that are all white space characters.

Comment lines and blank lines can be placed between segments.


3. Character set and lexemes

3.1 The San character set
3.2 Special characters and their uses
3.3 Identifiers

3.1 The San character set

The San character set consists of: In other words the character set consists of the keys on a standard keyboard.


3.2 Special characters and their uses

All of the special characters have special meanings except the backquote character and the backslash character. The underscore is only used in identifiers. The use of each special character is given in the following table:

Category            Char  Description
Paired delimiters   {}    Braces are used to delimit the lexical character
                          substitution operator.
Paired delimiters   []    Brackets are used in two distinct ways. One way is to
                          encapsulate array indices. The other is to delimit
                          sequents (iteration generators).
Paired delimiters   <>    Angle brackets are used to delimit function
                          invocations. In San the function name is the first
                          thing inside the angle brackets.
Paired delimiters   ()    Parentheses are used for grouping. They force the
                          order of interpretation.
Paired delimiters   "     Paired double quotes are one of the two ways to
                          delimit strings within a line, the other being count
                          format strings.
Boolean operator    &     The ampersand is used as the infix operator for the
                          logical and.
Boolean operator,   |     The vertical bar has three uses, as the infix operator
pipe, text prefix         for the logical or, as a pipe, and in text blocks as a
                          prefix character.
Boolean operator    ~     The tilde is used as the prefix operator for logical
                          negation.
Arithmetic operator +     The + character is the infix addition operator. It can
                          also appear in numeric literals.
Arithmetic operator -     The - character is the infix subtraction operator. It
                          can also appear in numeric literals, and in array
                          index expressions.
Arithmetic operator *     The * character is the infix multiplication operator.
Arithmetic operator /     The / character is the infix division operator. It can
                          also be used in file path names.
Arithmetic operator ^     The ^ character is the infix exponentiation operator.
Comment character   #     The octothorp (pound sign) is used to start a comment
                          in a line. It is also used within path names to refer
                          to file components.
Identifier          .     The period is used to separate identifiers. It is also
separator                 used within numeric literals.
Identifier          '     The single quote separates object aspects from the
separator                 object identifier.
Assignment          =     The equals sign, =, is used for assignment.
Assignment, text    :     The colon, :, is used in declarative assignments, as a
blocks, case              prefix in text blocks, in case statements, and as
statements, range         range delimiters.
delimiters
Lexical separator   ,     The comma is used to separate items in a list,
                          typically an argument list.
Lexical separator   ;     Semicolons separate multiple statements on a line.
Infix conversion    !     Appending ! to a function identifier makes it an infix
                          operator.
Infix conversion    ?     Appending ? to a function identifier makes it a
                          boolean valued infix operator.
Pronoun, alias      @     A naked @ sign is a pronoun; an identifier prefixed
                          with an @ sign is an alias.
Edit commands       %     The percent character, %, is used as a prefix
                          character for text editing commands.
Formal arguments    $     The currency character, $, is used as a prefix
                          character for formal arguments, e.g., $1, $2, $3 etc.



3.3 Identifiers

3.3.1 Simple identifiers
3.3.2 Arrays
3.3.3 Qualified identifiers
3.3.4 Local identifiers
3.3.5 Vector of fields
3.3.6 Special identifiers

3.3.1 Simple identifiers

Simple identifiers are identifiers that can be defined in code and bound to objects. A simple identifier consists of an initial lower case letter followed by a sequence of characters from the set {'a'-'z','0'-'9','_'}. The final character may not be an underscore. For example foo_bar09 is a simple identifier. San is a case sensitive language. Simple identifiers are lower case only; San uses case to distinguish between simple identifiers and special identifiers such as symbols and keywords.


3.3.2 Arrays

San objects have an infinite number of dimensions with each index having a range from minus infinity to plus infinity. The infinite array has a finite kernel consisting of a finite number of dimensions with each dimension having a finite range. All cells outside the kernel have default values. The syntax for an array expression is:

        <Simple identifier>[<indexing expression>]
An indexing expression describes a slice of the array. There are two types of indexing expressions, definite and indefinite.

Definite indexing expressions consist of comma separated index list specs, with one index list spec per dimension. For example foo[2,3] uses a definite indexing expression. An index list spec can be any of the following:

Here is an example:
        b[1:3] = 0    # b is a 1 dimensional array, indexed 1 to 3,
                      # all cells set to 0
        a[0,] = b     # a is a 2 dimensional array, with the first
                      # dimension indexed 0 to 0, and the second
                      # indexed 1 to 3 (inherited from b) with all
                      # cells set to 0, copied from b.
        c[2] = a[0,]  # c is a 1 dimensional array, indexed 2 to 2,
                      # its single cell set to 0
        c[3] = 1      # c now is indexed 2 to 3, with cell 2 set to 0
                      # and cell 3 set to 1
Array dimensions are not declared in declarations. Instead they are established dynamically in usage. In the example the statement that cells are set to some value is a simplification. Each cell is an object in its own right. For example, c[2] and c[3] are distinct objects.
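The idea of an unbounded array with a finite, dynamically established kernel can be modelled outside San. The following Python sketch is only an illustration, not part of the specification: the class name SanArray and its methods are hypothetical, cells are stored sparsely, and every index is assumed to have the same number of dimensions.

```python
class SanArray:
    """Sparse model of a San array: only assigned cells are stored."""

    def __init__(self, default=None):
        self.default = default   # value reported for every unassigned cell
        self.cells = {}          # maps an index tuple to the cell's value

    @staticmethod
    def _as_tuple(index):
        # Python passes c[2] as 2 and a[0, 1] as (0, 1); normalise both.
        return index if isinstance(index, tuple) else (index,)

    def __setitem__(self, index, value):
        self.cells[self._as_tuple(index)] = value

    def __getitem__(self, index):
        return self.cells.get(self._as_tuple(index), self.default)

    def kernel(self):
        """Return the (low, high) range of each dimension, or [] if empty."""
        if not self.cells:
            return []
        # zip(*keys) regroups the index tuples dimension by dimension.
        return [(min(axis), max(axis)) for axis in zip(*self.cells)]


c = SanArray()
c[2] = 0
c[3] = 1
print(c.kernel())   # [(2, 3)]
print(c[7])         # None
```

As in the example above, assigning to c[3] widens the kernel from 2:2 to 2:3 without any declaration.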

There are four versions of the indefinite indexing expression. They are:

  1. <simple identifier>[]
  2. <simple identifier>[<dimension spec>,*]
  3. <simple identifier>[*,<dimension spec>]
  4. <simple identifier>[<dimension spec>,*,<dimension spec>]
The first form represents the entire array, regardless of the number of dimensions. The second form represents a (dimension preserving) slice of the array; the <dimension spec> specifies the restriction of the initial indices with the remaining indices being unaltered. The third form is the same as the second form except that the final indices are restricted instead of the first. Finally, in the fourth form the initial and final indices are specified, with the remaining central indices being unaltered. Here is an example:
        a[1:2,1:3,1:5,1:3,1:2] = 0
        b[] = a[ ,-,*,2]
The dimensions of the array b will be [1:2,1:5,1:3,2:2]. b inherits the first dimension of a, 1:2, the second is deleted, the third and fourth are inherited, and the fifth is restricted to 2:2.


3.3.3 Qualified identifiers

An identifier can be extended with a qualifier. The syntax is:
        <identifier><qualifier mark><extension>
There are two types of extensions, each with its own qualifier mark. The two qualifier marks are the apostrophe and the period; the two types of extensions are value extensions and object extensions. For example:
        foo'fun         # is a value extension
        foo.fun         # is an object extension
Value extensions are terminal; they cannot be extended. Object extensions can be further extended. They can also be combined with arrays. For example:
        foo[].bar[3,5].x'y
This identifier specifies an array of values named y. The array consists of the components of foo specified by bar[3,5].x.


3.3.4 Local identifiers

Local identifiers reference fields and aspects of the containing object. The keyword 'Self' is used to refer to the containing object. Example:
    Proc alpha
        Init Self = 0
        Self = Self + 1
        End
Each time alpha is called its current value is incremented.


3.3.5 Vector of fields

Qualifier marks can also introduce vectors of fields. The syntax is:
        <identifier><qualifier mark>[<index list spec>]

If the qualifier mark is an apostrophe the expression denotes a list of value fields; if it is a period it denotes a list of object fields. Example:
        foo[] = bar.[]

3.3.6 Special identifiers

There are three types of special identifiers, keywords, symbols, and pronouns.

Keywords are identifiers that are reserved words in the language. The first character of a keyword is an upper case letter; the remaining characters are a mixture of lower case letters, digits, and underscores. If there is more than one character in a keyword it is guaranteed to contain at least one lower case letter.

Symbols are atomic values. All letters in a symbol must be upper case. The first character must be an upper case letter; the remaining characters are from the set A-Z,0-9, and _. A symbol must contain at least two characters. The last character may not be an underscore.

Pronouns are simple identifiers prefixed with the "at" (@) character. They are variables that contain identifier component names.
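The lexical rules for simple identifiers, keywords, symbols, and pronouns are regular, so they can be checked with pattern matching. The Python sketch below is one possible reading of those rules, not a reference lexer; the function name classify and the "invalid" result are assumptions made for illustration. It classifies a token by its shape only and does not check whether a keyword-shaped token is actually reserved.

```python
import re

# Lower case start, then a-z, 0-9, _; the last character may not be "_".
SIMPLE = re.compile(r"[a-z](?:[a-z0-9_]*[a-z0-9])?")
# Upper case start, then a-z, 0-9, _; if longer than one character the
# rest must contain at least one lower case letter.
KEYWORD = re.compile(r"[A-Z](?:(?=[a-z0-9_]*[a-z])[a-z0-9_]+)?")
# At least two characters from A-Z, 0-9, _; upper case start, no final "_".
SYMBOL = re.compile(r"[A-Z][A-Z0-9_]*[A-Z0-9]")


def classify(token):
    """Return the identifier category suggested by the token's shape."""
    if token == "@" or (token.startswith("@") and SIMPLE.fullmatch(token[1:])):
        return "pronoun"
    if SIMPLE.fullmatch(token):
        return "simple"
    if KEYWORD.fullmatch(token):
        return "keyword"
    if SYMBOL.fullmatch(token):
        return "symbol"
    return "invalid"


for token in ["foo_bar09", "Foreach", "RED", "@x", "foo_"]:
    print(token, classify(token))
```

Because case alone separates the categories, the three patterns never overlap and the order of the tests does not matter.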


4. Basic syntax

4.1 Block structure
4.2 Indentation
4.3 Comments
4.4 Boolean literals
4.5 Numeric literals
4.6 The string substitution operator
4.7 Character string literals
4.8 Text blocks
4.9 Multi-line statements

The three types of segments, executive, configuration, and code, all share a common basic syntax. The syntax for the executive and configuration segments is a simplified specialization of the syntax used in the code segments. This section of the specification describes the common syntax.

San is an "end of line" (EOL) sensitive language. Each line of text within a San source file is a syntactical unit, typically a statement.

4.1 Block structure

San is a block structured language. Each block consists of a block initiator, a block body, and a block terminator.

The block initiator is a single line. The general rule is that the first token in the block initiator line is a keyword that specifies the type of block. The remainder of the line consists of zero or more qualifier fields.

The block body consists of single line statements and sub-blocks.

The block terminator also is a single line beginning with the word "End". The "End" token can stand alone or it can be followed by either the block type or the block label.


4.2 Indentation

All blocks in a San file must use the same consistent indentation style. The default indentation style is for each line to be indented the same fixed number of spaces (the number being determined by the first line in the block body) and for the block terminator to be indented to the same level as the block body. Here is an example of a block of code with a sub-block.

   Foreach x In [1:10]
      If car[x].color ceq? "red"
          Print car {x} is red
          End if
      End foreach


4.3 Comments

San uses # as a comment initiator. Comment lines are lines that start with a comment initiator at the current indentation point. (Comment lines must respect indentation.) Within a line a comment initiator (provided it is not within a character string) terminates the line. The comment initiator and the remainder of the line are discarded.

There is one exception to the use of # as a comment initiator. In pathnames within a configuration segment it specifies a component within a file.
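The rule that a comment initiator ends the line unless it lies inside a character string can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the helper name strip_comment is hypothetical, only double-quoted strings are recognized, and count format strings, text block bodies, and configuration pathnames are not handled.

```python
def strip_comment(line):
    """Drop a trailing # comment unless the # sits inside a "..." string."""
    in_string = False
    for pos, ch in enumerate(line):
        if ch == '"':
            # Quoted strings cannot span lines, so a simple toggle suffices.
            in_string = not in_string
        elif ch == "#" and not in_string:
            return line[:pos].rstrip()
    return line.rstrip()


print(strip_comment('x = "a#b"   # the first # is part of the string'))
```

A whole-line comment reduces to an empty string, which a caller could then treat like a blank line.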


4.4 Boolean literals

San recognizes two predefined boolean variables, True and False.


4.5 Numeric literals

The format for numeric literals is <sign><fixed-point-part><exponent>, where the sign and exponent are optional. The <fixed-point-part> can be any of

<integer>
<integer>.
.<integer>
<integer>.<integer>
The sign can either be the plus '+' character or the minus '-' character. If there is no sign the number is non-negative (zero or positive.) The exponent consists of the exponent type letter (e for powers of 10, b for powers of 2, and x for powers of 16), an optional sign, and an integer. The integer is the power of the base by which the fixed point part is multiplied. If the exponent type is binary or decimal, the exponent is expressed in base 10; if the type is hex, the exponent is expressed in base 16. Integers are bignum. For example, -3.417298777407e+219996 is a valid numeric literal.
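To make the literal format concrete, here is a small Python sketch that parses a numeric literal into an exact rational value. It is an interpretation of the rules above rather than a definitive implementation: the name parse_numeric is hypothetical, the exponent letters are assumed to be lower case only, and hex exponents are assumed to accept both upper and lower case digits.

```python
import re
from fractions import Fraction

LITERAL = re.compile(
    r"(?P<sign>[+-]?)"
    r"(?P<int>[0-9]*)(?:\.(?P<frac>[0-9]*))?"
    r"(?:(?P<kind>[ebx])(?P<esign>[+-]?)(?P<exp>[0-9a-fA-F]+))?"
)

# exponent letter -> (base that is raised, radix the exponent is written in)
BASES = {"e": (10, 10), "b": (2, 10), "x": (16, 16)}


def parse_numeric(text):
    """Parse a San-style numeric literal into an exact Fraction."""
    m = LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    int_part, frac_part = m["int"], m["frac"] or ""
    if not (int_part or frac_part):
        raise ValueError(f"no digits in fixed-point part: {text!r}")
    value = Fraction(int(int_part + frac_part), 10 ** len(frac_part))
    if m["kind"]:
        base, radix = BASES[m["kind"]]
        power = int(m["exp"], radix)   # ValueError if digits do not fit radix
        if m["esign"] == "-":
            power = -power
        value *= Fraction(base) ** power
    return -value if m["sign"] == "-" else value


print(parse_numeric("1.5e2"))   # 150
print(parse_numeric("3b4"))     # 48
```

Using Fraction keeps the result exact, in the spirit of the bignum integers mentioned above; a real interpreter might choose a different internal representation.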


4.6 The string substitution operator

Braces {} delimit the string substitution operator. The arguments within the braces may be variables, keywords, lists, literals, function invocations, and sequents. Each argument that is not a literal is replaced by its string value; the resulting suite of strings is concatenated together as a single string. For example the following loop:

    Foreach i In [1:10]
        x[i] = {"x" i}
        End
creates objects x[1] through x[10] and initializes them with strings "x1" through "x10".

Function invocation expressions are first evaluated and then the string aspect of the return replaces the function invocation expression. If the function invocation expression is a one-dimensional array expression (list) then the string aspects of each item in the list are used. Here is an example: Suppose that M.divqr returns the quotient and remainder of an integer division in a list and that L.sep inserts a separator string between each item in a list. Then:

    Print Q & R = {<L.sep " " <M.divqr 5 3>>}
produces the line:
    Q & R = 1 2     

Punctuation characters, symbols, infix operators, arithmetic operators, and boolean operators are treated as literals. Pronouns, e.g., @ and @foo, are replaced by their contained values.

The string substitution operator concatenates strings. For example {"foo", "bar"} produces the string "foobar". It also is useful for evaluating variables within quoted strings. Section 4.8 has an example of its use in boilerplate text.


4.7 Character string literals

San supports two different ways to represent character strings. They are:

  1. Strings using count format:
    Tokens beginning with an integer followed by the letter c (case insensitive) are strings consisting of the specified number of characters after the 'c'. Thus 3cfoo is the string "foo". All special characters including white space within the counted number of characters are treated literally. Thus the string 4c{" < is the four characters, {, ", space, and <. Count format does not respect the string substitution operator.
  2. Strings using "":
    Tokens enclosed in double quotation marks are literal strings. Thus "foo" is the string "foo". Quoted strings do respect the string substitution operator.
Character string literals cannot extend over line terminators. A quote mark not matched by a terminating quote mark within the same line is a syntax error. Similarly a count format string that is prematurely terminated by an EOL is a syntax error.
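The two string forms and their error cases can be illustrated with a short Python reader. This is a sketch under assumptions, not the specified lexer: the name read_string is hypothetical, the line is assumed to arrive without its line terminator, and string substitution inside quoted strings is not modelled.

```python
def read_string(line, pos=0):
    """Read one string literal at line[pos]; return (value, next_pos)."""
    if line[pos] == '"':
        end = line.find('"', pos + 1)
        if end == -1:
            raise SyntaxError("quote mark not matched within the line")
        return line[pos + 1:end], end + 1

    # Count format: <integer>c<exactly that many characters>
    digits_end = pos
    while digits_end < len(line) and line[digits_end] in "0123456789":
        digits_end += 1
    if digits_end == pos or digits_end == len(line) or line[digits_end] not in "cC":
        raise SyntaxError("not a string literal")
    count = int(line[pos:digits_end])
    start = digits_end + 1
    if start + count > len(line):
        raise SyntaxError("count format string terminated by end of line")
    return line[start:start + count], start + count


print(read_string("3cfoo"))      # ('foo', 5)
print(read_string('4c{" <'))     # ('{" <', 6)
```

Returning the position just past the literal lets a surrounding tokenizer continue scanning the rest of the line.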


4.8 Text blocks

Text blocks are an alternative method for specifying strings that is particularly useful for text running over more than one line. The format is:
    Text <name>
        <body>
where <name> is the name of the variable containing the text, and <body> is one or more lines of text. Each line has a leading character (after the indentation) that may either be | or :. Lines beginning with : are taken literally as is - no text substitution is performed. Text substitution is done in lines that begin with |. Text blocks do not require an End keyword. For example:
    Text blatifu
        |Dear {addressee}
        |
        |   Please find enclosed a check for {amount} lire.
        |
        :#Ralph 124c41+  {:-}
Text substitution is done on {addressee} and {amount} but not on {:-}. The comment initiator, #, is not recognized as starting a comment within the text of the text block.
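The difference between | lines and : lines can be sketched in Python as follows. This is a simplified model, not the San implementation: the name render_text_block is hypothetical, body lines are assumed to have had their indentation removed already, and only simple identifiers in braces are substituted (function invocations and sequents are left out).

```python
import re

# A brace pair containing a simple identifier, e.g. {amount}
PLACEHOLDER = re.compile(r"\{([a-z](?:[a-z0-9_]*[a-z0-9])?)\}")


def render_text_block(body_lines, values):
    """Render text block lines; substitute only in lines marked with |."""
    rendered = []
    for line in body_lines:
        marker, text = line[0], line[1:]
        if marker == "|":
            text = PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), text)
        elif marker != ":":
            raise SyntaxError(f"text line must start with | or : ({line!r})")
        rendered.append(text)
    # No final end-of-line mark is added; the caller supplies it.
    return "\n".join(rendered)


body = [
    "|Dear {addressee}",
    "|",
    "|   Please find enclosed a check for {amount} lire.",
    "|",
    ":#Ralph 124c41+  {:-}",
]
print(render_text_block(body, {"addressee": "Ralph", "amount": 100}))
```

The last line is marked with a colon, so both the # and the {:-} pass through untouched.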

Here is another example of using text blocks. In this case we have a routine that produces a version of "ninety-nine bottles of beer on the wall".

Proc beer
    Text verse
        |{n} bottles of beer on the wall
        |{n} bottles of beer
        |if one of those bottles should happen to fall
        |there would be <E n-1> bottle{s} on the wall
    Text final
        |one bottle of beer on the wall
        |one bottle of beer
        |if that bottle should happen to fall
        |there won't be any beer on the wall
    Foreach n In [99:3]
        Init s = "s"
        Print {verse}
        End
    n=2; s = ""
    Print {verse}
    Print {final}
String substitution is redone each time a text block is used. The text is not terminated with an EOL (end of line) mark; in this example the Print commands supply the final EOL. <E n-1> uses the E pseudo object. It evaluates an expression as its argument and converts it to the context type.


4.9 Multi-line statements

Ordinarily each statement occupies a separate line. Sometimes, however, a statement is so long that it is preferable to have it extend over several lines. San permits line continuation with incomplete expressions. If a line ends with a binary operator (not counting a terminating comment or terminating white space) the next line is treated as a continuation of the current line. The binary operators that force continuation are: , = & | + - / * and ^. Here is an example:
   distance =
      <M.sqrt x^2 + y^2 + z^2> /
      <M.sqrt 1. - c^2>
In styles where indentation is required continuation lines must be indented with respect to the first line in the statement.
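The continuation rule can be expressed as a small line-joining pass. The Python sketch below is illustrative only: the name join_continuations is hypothetical, trailing comments are assumed to have been removed beforehand, and the lines are assumed not to belong to a text block body (where a trailing | would be ordinary text).

```python
# Binary operators that force continuation when they end a line.
CONTINUATION_OPERATORS = (",", "=", "&", "|", "+", "-", "/", "*", "^")


def join_continuations(lines):
    """Merge lines that end with a binary operator with the line after."""
    statements = []
    pending = ""
    for line in lines:
        # Keep the first line's indentation; strip it from continuations.
        piece = line.strip() if pending else line.rstrip()
        pending = f"{pending} {piece}" if pending else piece
        if not pending.endswith(CONTINUATION_OPERATORS):
            statements.append(pending)
            pending = ""
    if pending:
        raise SyntaxError("last statement ends with a dangling operator")
    return statements


source = [
    "   distance =",
    "      <M.sqrt x^2 + y^2 + z^2> /",
    "      <M.sqrt 1. - c^2>",
    "   n = 2",
]
for statement in join_continuations(source):
    print(statement)
```

The three physical lines of the distance statement come out as one logical statement, followed by the unrelated assignment to n.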


5. Data Types

5.1 Scalar Values
5.2 Executable elements
5.3 Sequents
5.4 Objects
5.5 Symbols
5.6 Pronouns and aliases
5.7 Expression types

The San language has five categories of data elements, scalar values, executable elements, symbols, sequents, and objects.

5.1 Scalar Values

In principle all scalar values are strings. Some strings are also numbers, e.g., the string, "123.4", is also the number 123.4. And some strings are also boolean values, e.g., the string "true" is the boolean value true. Single characters are also considered to be strings.

Whether scalar values are treated as simple strings, as numbers, or as boolean values depends on the usage context. There are three usage contexts, string, numeric, and boolean, corresponding to the three usage modes.


5.2 Executable elements

There are three varieties of executable elements, procs, sequent operators, and agents.

The term "proc" covers both procedures and functions. A proc can be invoked as a procedure or as a function. When a proc is invoked as a procedure, arguments are passed by copy down, copy back; i.e., in the invoking element the arguments are copied to the argument vector, are altered by the proc, and are then copied back into the invoking element. When a proc is invoked as a function, the arguments are copied down but not back. Here is an example:

        foo bar         <# procedure call, foo can change bar>
        <foo bar>       <# function call, bar is unchanged>
Procs do not hold persistent data.
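The difference between the two invocation styles can be modelled in Python. This is a sketch of the copy down, copy back idea only; the helper names call_as_procedure and call_as_function, the dictionary of caller variables, and the sample proc foo are all hypothetical.

```python
import copy


def call_as_procedure(proc, variables, arg_names):
    """Copy arguments down, run the proc, then copy them back."""
    argv = [copy.deepcopy(variables[name]) for name in arg_names]  # copy down
    result = proc(argv)
    for name, value in zip(arg_names, argv):                       # copy back
        variables[name] = value
    return result


def call_as_function(proc, variables, arg_names):
    """Copy arguments down and run the proc; nothing is copied back."""
    argv = [copy.deepcopy(variables[name]) for name in arg_names]
    return proc(argv)


def foo(argv):
    """A sample proc that increments its first argument."""
    argv[0] = argv[0] + 1
    return argv[0]


caller = {"bar": 1}
print(call_as_function(foo, caller, ["bar"]), caller["bar"])    # 2 1
print(call_as_procedure(foo, caller, ["bar"]), caller["bar"])   # 2 2
```

In both cases the proc works on its own argument vector; the only difference is whether the caller's variables are updated afterwards.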

Sequent operators only appear in sequents (see below). They expect an input stream from input port 1 and produce an output stream on output port 1.

Agents are the roots of independent mini-threads of control. Agents cannot be invoked; they can be sent messages, they can process input from input streams, they can handle events and exceptions, they can be activated, but they cannot be invoked. Agents can and typically do hold persistent data.


5.3 Sequents

Sequents are un-named list expressions. The expression consists of a source and zero or more list operators. Sources generate lists; operators accept lists as inputs and produce an output list. The sequent expression represents the list of values that would be generated by first producing the source list and then successively applying the operators.

Sequents are delimited by brackets, i.e., []; the source and the operators are separated by vertical bars, i.e., |. Here are some sequents:

        [1:n]                      <# integers from 1 to n>
        [1:n | Q.step 2]           <# odd integers from 1 to n>
        [1:n | Q.reverse]          <# integers from n to 1>
        [source | op1 | op2 ]      <# general form with two ops>
Predefined operators, e.g., Q.step and Q.reverse, are built-in library routines and are prefixed with the pseudo-object Q. In addition, user defined sequent operators can also be defined. Sequents are discussed in Section 13.
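A sequent behaves much like a pipeline of list transformers. The Python sketch below mimics the examples above with ordinary iterators; it is an analogy, not the San implementation. The names span, step, reverse, and sequent are hypothetical stand-ins, and step is assumed to keep every n-th item starting with the first, which matches the "odd integers" example.

```python
from itertools import islice


def span(first, last):
    """Source: the integers from first to last inclusive, like [first:last]."""
    return iter(range(first, last + 1))


def step(n):
    """Operator: keep every n-th item, starting with the first."""
    return lambda items: islice(items, 0, None, n)


def reverse():
    """Operator: emit the items in reverse order."""
    return lambda items: iter(reversed(list(items)))


def sequent(source, *ops):
    """Apply each operator in turn to the stream produced by the source."""
    stream = source
    for op in ops:
        stream = op(stream)
    return list(stream)


print(sequent(span(1, 9), step(2)))      # [1, 3, 5, 7, 9]
print(sequent(span(1, 5), reverse()))    # [5, 4, 3, 2, 1]
```

Each operator reads one input stream and produces one output stream, which mirrors the single input port and single output port described for sequent operators.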


5.4 Objects

San objects can be thought of as "swiss army knife" objects. A San object is a data element comprised of two parts. The first part is a suite of scalar values and executable elements; the second part is a suite of subobjects. Objects are discussed in detail in Section 6, San objects.


5.5 Symbols

An identifier with all letters in upper case that is at least two characters long is called a symbol. Symbols are values in their own right. Symbols can be used as values for attributes and as enumerated types. For example we might set the color of a car with:
        car.color = RED
Symbols can be used as indices for arrays. For example:
        employee[ID2304712].salary = 20000.00
Although symbols can be used as indices they are not integers.


5.6 Pronouns and aliases

The at sign, @, is a pronoun. In assignment statements it stands for the target of the assignment. Here are some examples:
    foobar = @+1           # increment foobar
    dumble = @*(@+1)       # multiply dumble by itself plus one
    list[] = @, alpha      # append alpha to list[]
An alias is a simple identifier prefixed with an at sign. When an alias appears within an object identifier, its current value is substituted into the object identifier. Aliases are not dimensioned. They can be created in two different ways, as an index variable in a loop, and in an assignment statement. Here are examples of each:
    Foreach @x In foo.[]
        Print foo.@x = {foo.@x}
        End
        
    If (a gt? b) @x = a
    Else         @x = b
    Print @x is the larger of a and b
The value of an alias is a name rather than a reference. In this code:
    @x =     a
    foo.@x = 1
    @x     = 2     
the two objects, foo.a and a are set to 1 and 2 respectively.


5.7 Expression types

San identifiers are bound to objects instead of being bound to variables. In consequence identifiers do not have an intrinsic type. Associated with an object is a vector of typed values called aspects. When an identifier appears in an expression the aspect that is used depends on the expression type, the expression type being determined by the syntax.

The expression types are:


6. San objects

6.1 Usage rules
6.2 Type aspects
6.3 Dimension specification aspects
6.4 User defined aspects
6.5 Anonymous objects
6.6 The self object
6.7 System pseudo objects
6.8 Object signatures

In San all identifiers are bound to objects, and, conversely, all objects (except anonymous objects, see section 6.5) are bound to identifiers. Objects only exist for the duration of their corresponding identifiers. All data in a San object is public; i.e., it can be read and altered by functions/procedures (aka procs) within the object and by procs having access to the object. Data within San objects can be public because the scope within which the object exists is limited.

We shan't go far wrong if we think of a San object as a particular species of object, comprised of values and sub-objects. The values are called aspects; the sub-objects are called fields. The components of an object can be referenced with qualified names; the separator between parts is a single quote for aspects and a period for fields. Thus foo'a is an aspect and foo.b is a field. A field is not a different kind of object; rather it is an object that is part of another object. Fields can have their own fields in turn, so that an object can potentially be a tree of objects.

So now we know what an object looks like - it is comprised of a vector of values and a tree of sub-objects.

Objects can be arrays; the number of dimensions can be arbitrarily large. Each element in the array has the same shape, i.e., the same number of aspects with the same names, and the same number of sub-objects also with the same names.

6.1 Governing rules of usage

  1. Identifiers always refer to objects. Exception: identifiers qualified with the single quote (') refer to object aspects.
  2. When evaluated, expressions are always comprised of values, i.e., scalar values (strings, numbers, or booleans) and executable elements.
  3. During evaluation of expressions, identifiers bound to objects are always replaced by the object's aspect that corresponds to the usage context.
  4. During assignment, values produced in expressions are bound to the default aspect of the target object that corresponds to the usage context.

    For example, in processing the statement

    		x = x + 1
    
    we first determine that (x + 1) is an arithmetic expression, establishing that the usage context is arithmetic. The default scalar value of x is interpreted as a number. The literal, 1, is also interpreted as being a number. The two numbers are added together to obtain a resulting value. The resulting value is then bound to the scalar value aspect of the object, x.

    In traditional procedural languages one can think of variables decaying to values during evaluation and being promoted to variables during assignment. The same thing on a larger scale occurs in San:

    	evaluation: object -> aspect -> value
    	assignment: value -> aspect -> object
    
  5. Identifiers and objects exist within (are bound to) invocation instances (activation records). In other words the scope of identifiers is local; the objects they are bound to exist only during an activation instance.

  6. The contents of an object are visible and alterable from (a) the invocation instance within which they exist, and (b) the invocation instances of the executable elements that are aspects of the object.
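
The two-way flow described in rule 4 can be modeled in a few lines of Python. This is only a sketch of the idea, not San code; the class name SanObject, the method names, and the default aspect values are assumptions made for the illustration.

```python
class SanObject:
    """Hypothetical model of a San object as a bag of named aspects."""

    def __init__(self):
        # Assumed defaults: empty string, zero, and false.
        self.aspects = {"string": "", "numeric": 0, "bool": False}

    def evaluate(self, context):
        # evaluation: object -> aspect -> value
        return self.aspects[context]

    def assign(self, context, value):
        # assignment: value -> aspect -> object
        self.aspects[context] = value


x = SanObject()
x.assign("numeric", 41)

# x = x + 1 : the "+" establishes an arithmetic (numeric) usage context,
# so the numeric aspect is read, incremented, and written back.
x.assign("numeric", x.evaluate("numeric") + 1)
print(x.evaluate("numeric"))
```

The usage context is passed explicitly here; in San it is inferred from the expression.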


6.2 Type aspects

Default value aspects contain the values to be used when an object identifier appears in a context that calls for a value. They are:
        'string  contains the current value of the string aspect
        'numeric contains the current value of the numeric aspect
        'bool    contains the current value of the boolean aspect
        'vproc   contains the current value of the proc value aspect
        'xproc   contains the current value of the proc executable aspect
        'seqop   contains the current value of the sequent operator aspect
        'agent   contains the current value of the agent aspect
        'symbol  contains the current value of the symbol aspect
        'type    contains the current value of the type aspect
        'chars   contains the string aspect as an array
        'nchars  contains the number of characters in the string
The numeric and bool aspects can also be represented as strings. When an assignment is made to the numeric aspect the string aspect is set to the formatted number, the conversion only being done when needed. A similar conversion is done when an assignment is made to the boolean aspect. The string aspect can be accessed as an array of characters ('chars); the length of the array is specified by ('nchars).

When an assignment is made to the string aspect, the numeric and boolean aspects are set to the string if it represents a number or a boolean value. If the string is not suitable for conversion the numeric aspect is set to zero and the boolean aspect is set to false. Symbols can be converted to strings by explicitly assigning the symbol to a string aspect, e.g.

    vehicle'string = OMNIBUS
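
The string, numeric, and boolean conversion rules described above can be sketched in Python. Several details are assumptions, since the specification does not give them here: Python float syntax decides what "represents a number", the spellings True and False decide what represents a boolean, numbers are formatted with the "g" format, and conversion is done eagerly rather than only when needed.

```python
def parse_number(text):
    """Return the number text represents, or None (assumed: float syntax)."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_bool(text):
    """Return the boolean text represents, or None (assumed spellings)."""
    return {"True": True, "False": False}.get(text)


class Value:
    """Hypothetical holder for the string, numeric, and bool aspects."""

    def __init__(self):
        self.string, self.numeric, self.bool = "", 0, False

    def set_string(self, text):
        # Assigning the string aspect also sets numeric and bool,
        # falling back to zero and false when the text does not convert.
        self.string = text
        number = parse_number(text)
        self.numeric = 0 if number is None else number
        flag = parse_bool(text)
        self.bool = False if flag is None else flag

    def set_numeric(self, number):
        # Assigning the numeric aspect sets the string to the formatted number.
        self.numeric = number
        self.string = format(number, "g")  # assumed formatting
```

For instance, set_string("12.5") leaves the numeric aspect at 12.5, while set_string("OMNIBUS") leaves it at zero.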

The value in the symbol aspect can be restricted to a specified suite of symbols. The set of valid symbols is specified by the aspect array symbol_set. If the array is empty all symbols are valid. In this example

    americanflag'symbol_set[] = RED, WHITE, BLUE
we can say
    americanflag = RED
However
    americanflag = GREEN
would raise an exception.
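
A small Python model of the symbol_set check may help. The class name SymbolAspect is hypothetical, symbols are modeled as plain strings, and ValueError stands in for whatever exception San actually raises.

```python
class SymbolAspect:
    """Hypothetical model of a symbol aspect guarded by symbol_set."""

    def __init__(self, symbol_set=()):
        self.symbol_set = list(symbol_set)  # empty means all symbols are valid
        self.symbol = None

    def assign(self, symbol):
        # Reject symbols outside a non-empty symbol_set.
        if self.symbol_set and symbol not in self.symbol_set:
            raise ValueError(f"{symbol} is not in the symbol set")
        self.symbol = symbol


americanflag = SymbolAspect(["RED", "WHITE", "BLUE"])
americanflag.assign("RED")  # accepted
```

Calling americanflag.assign("GREEN") would raise, while an aspect created with an empty symbol_set accepts any symbol.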

The vproc, xproc, seqop, and agent aspects hold executable elements. San procs are executable elements that may either be used as procedures or as functions; procs are described in section 11. There are two proc aspects, vproc and xproc. The vproc is the value returned when a proc is expected; the xproc is the proc to be used when the object is executed. Agents are mini-threads of control; agents are described in section 12. Sequent operators are used in sequent expressions as described in section 13.

The type aspect is used to determine which type to use when the context does not determine the type. For example in a simple assignment

    x = y
a value must be selected for the right hand side, but there is no well defined context type. In such cases the type aspect specifies the value type (string, numeric, or boolean). The type aspect is always set to the last usage type that was explicitly defined for the object.

Assignments can be made explicitly to type aspects. For example,

    foo'string = bar
sets the string aspect of foo to the same value as the string aspect of bar. As a side effect of the assignment, the type aspect of foo is set to string. We could also have written
    foo = bar'string
and the effect would be the same.


6.3 Dimension specification aspects

San objects can be dimensioned arrays. The dimensioning is described by one aspect specifying the number of dimensions, and four arrays indexed on dimension number that collectively specify each dimension's bounds and step. The dimension specification aspects are:
    'dnum       contains the number of dimensions
    'dfirst[]   contains the first index for each dimension
    'dlast[]    contains the last index for each dimension
    'dstep[]    contains the index step size for each dimension
    'dspec[]    contains the spec string for each dimension
A dimension spec string either has the form F:L:S or the form F:L where F is the first index, L is the last index, and S is the step size. If S is missing the step size is 1.
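
Parsing a dimension spec string is mechanical, as this Python sketch shows. The function name parse_dspec is hypothetical, and the sketch assumes that indices and steps are integers.

```python
def parse_dspec(spec):
    """Split an F:L or F:L:S spec string into (first, last, step)."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"bad dimension spec: {spec!r}")
    first, last = int(parts[0]), int(parts[1])
    # A missing S means a step size of 1.
    step = int(parts[2]) if len(parts) == 3 else 1
    return first, last, step


print(parse_dspec("1:10"))
print(parse_dspec("0:8:2"))
```

The three returned numbers correspond to one entry each of 'dfirst[], 'dlast[], and 'dstep[].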

Dimension specification aspects cannot be directly altered even though San arrays are resizable.


6.4 User defined aspects

User defined aspects can be defined by explicit or implicit assignment. For example, the code

    bagel'sort = sort'xproc
creates (if it doesn't already exist) a user defined aspect of type proc for object bagel, and sets the value of that aspect to be the value of the proc aspect of sort.

The type of a user defined aspect is set when the aspect is defined. Once defined it cannot be altered.


6.5 Anonymous objects

An anonymous object is an object that is not bound to an identifier. Anonymous objects are implicitly created when an object is needed but only a value is available. The main situation where anonymous objects are needed is when arguments to a function are literals or are the result of infix operators. For example, consider
    <some_function n+1>
The proc, some_function, expects an object as an argument (see section 11.3 for proc argument passing) but n+1 is a numeric value. The conflict is resolved by creating an anonymous object to hold the value created by n+1 and passing that to some_function.

A similar problem occurs when an aspect is passed to a proc. Here we need an anonymous object to have the passed aspect. For example, consider

    <some_function fumble, bumble'sort>

The aspects of anonymous objects all have default values except for the aspect being set. When an anonymous object is created for a value the aspect corresponding to the value type is set to the value. When one is created for an aspect the aspect in the source object is copied to the anonymous object.


6.6 The Self object

The "Self" identifier refers to the current object. For example consider:

    Proc foo
        Self'bool = <bar>'bool;
        End
Whenever foo is invoked its boolean aspect will be determined by invoking bar and returning bar's boolean aspect.


6.7 System pseudo objects

Some keywords are system pseudo objects. By this we mean that they appear to be qualified identifiers. For example the emit command can be optionally qualified with a port number and a formatting type. In addition to individual commands there is a small suite of pseudo objects that are used to group system commands without polluting the name space. These include:

   C    # configuration supplied elements
   D    # dimensionality changes
   E    # eval and convert type
   F    # file system commands
   G    # general language functions 
   L    # list functions
   M    # math functions
   Q    # sequent operator commands
   S    # character string functions
   T    # table operations


6.8 Object signatures

The object signature is the suite of aspect values and the vector of field names. Two objects have the same signature if they have the same suite of aspects with each aspect having the same value, and they have the same vector of field names. The objects bound to the field identifiers need not have the same signature.
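
The definition can be made concrete with a short Python sketch. The function name signature and the dictionary representation of aspects are assumptions; the sketch also assumes that the order of field names matters (it is a vector) while the order of aspects does not (it is a suite).

```python
def signature(aspects, field_names):
    """Build a comparable signature from aspect values and field names."""
    # Aspects are compared as a set of (name, value) pairs,
    # field names as an ordered vector.
    return (tuple(sorted(aspects.items())), tuple(field_names))


a = signature({"numeric": 1, "string": "1"}, ["left", "right"])
b = signature({"string": "1", "numeric": 1}, ["left", "right"])
c = signature({"numeric": 1, "string": "1"}, ["right", "left"])
print(a == b, a == c)
```

Note that only the field names take part; the objects bound to those names are not inspected.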


7. Code segment structure

7.1 Dictionary statements
7.2 Initialization statements
7.3 Initialization blocks
7.4 Definition blocks
7.5 Visibility, initialization, and scope
7.6 Statements

The top level of a code segment does not contain executable code as such. Rather, the top level consists of initialization statements, initialization blocks, dictionary statements, and definition blocks.

7.1 Dictionary statements

Dictionary statements have the form
        dictionary: <dictionary name>
Dictionary statements specify external locations to search for definitions not contained within the segment. See the configuration segments section for more details.


7.2 Initialization statements

Initialization statements within blocks (all blocks containing executable code) are executed when the block is activated. Code segments are activated when an agent within the segment is activated as a main program. They are only executed once.

Initialization statements have the form

        Init <statement>
The <statement> may be any legal single statement. The normal usage, however, is to create default values for objects. Consider the following code fragment:
        Code foobar
             Init bageldorf = C.framistan
             ...
             Proc doodle
                  Arguments: arg
                  Return arg + bageldorf
                  End
             ...
             End
Whenever doodle is called, it returns the sum of its first argument and the value of the configuration variable, C.framistan.


7.3 Initialization blocks

Initialization blocks are blocks of initialization code. They are more general than simple initialization statements because they can contain conditional code.

Initialization blocks have the form

    Init 
        <initialization>
        End init
Here is an example of an initialization block
    Init
        If (Eexecmode eq? Shell)
            console_editing = True
            keep_history    = True
            End if 
        End


7.4 Definition blocks

Definition blocks describe the executable code. There are five kinds of definition blocks. These are:

The <name> can either be a simple identifier or a qualified identifier. The qualifier may either be a field or an aspect (exception: objects cannot be defined as aspects).

<definition_arguments> are objects supplied by the defined code. Within the defined object they are initialized to have the same values as they have in the defining code at the time the definition is executed.

Definition blocks always create object definitions. A proc definition block for a proc named foo defines an object named foo with the xproc aspect set to the proc named foo. For example the following three definitions are equivalent:

    Proc foo
        Return 0
        End
        
    Proc foo'xproc
        Return 0
        End
and
    Object foo
        Proc 'xproc
            Return 0
            End
        End
This rule holds for all definitions of executable elements, i.e., procs, sequent operators, and agents. The essence is that the definition does two things; one is to create an anonymous definition of an executable element, and the other is to bind the executable element to an identifier.


7.5 Visibility, initialization, and scope

When a code segment is referenced the initialization code and the definitions are executed. Execution of a definition establishes the parse tree and the identifiers referenced in the object being defined. However no code is executed; in particular initialization within a definition is not executed. The identifier named in the definition is defined but not active. It becomes active when a reference is made to the identifier. Activation creates an instance of the object being defined.

When an object is defined all identifiers within the object are tentatively marked as being uninitialized. Once they are initially marked, identifiers passed in as definition arguments are bound to copies of the objects in the parent code. For example in this code:

    Proc beta y
        Return x+y
        End
The definition establishes two identifiers, x and y. Object y is initialized to be a copy of y from the enclosing execution environment. Object x is marked as being uninitialized, i.e., it is marked as being a copy of the empty object.

When the object is activated by a reference it is initialized. The initialization process is governed by the fundamental rules of visibility. Objects in the enclosing execution environment are visible for the purpose of initialization at the time of initialization; what happens is that objects in the enclosing execution environment can be copied into the object being initialized. For example consider:

    Proc rho
        Init x = 1
        Init y = 1
        Proc sigma
            Return x+y
            End
        Return <sigma>
        End
When rho is defined x, y, and sigma are not initialized; that only happens when rho is activated, presumably when it is first invoked. When rho is activated the init statements are executed and x and y are set equal to 1. Init statements (and init blocks) are only executed once, i.e., during the activation phase.

Execution of rho then begins. A definition for sigma is made; however it is not yet initialized. Sigma is a local object within rho; it is not visible from the outside of rho or within the fields of rho. However it is visible within the executable aspects of rho.

The next statement, the return statement, invokes the executable proc aspect ('xproc) of sigma. This causes sigma to be initialized. Sigma's x and y were marked as uninitialized at definition time. During initialization of sigma the enclosing environment (rho and its parents) is searched for objects named x and y. They are found in rho; the x and y objects in sigma are initialized to be copies of rho's x and y.
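
The copy-on-activation behaviour of rho and sigma can be mimicked in Python. This is a loose model, not San semantics in full: the class Activation and its methods are hypothetical, None stands in for the empty object, and the list of referenced names is supplied by hand rather than derived from a parse tree.

```python
import copy


class Activation:
    """Hypothetical activation record: local objects plus an enclosing record."""

    def __init__(self, parent=None):
        self.parent = parent
        self.objects = {}

    def lookup(self, name):
        # Search this record, then the enclosing records, for the name.
        record = self
        while record is not None:
            if name in record.objects:
                return record.objects[name]
            record = record.parent
        return None  # stands in for the empty object

    def activate_child(self, referenced_names):
        # Initialization copies visible objects into the new record.
        child = Activation(parent=self)
        for name in referenced_names:
            child.objects[name] = copy.deepcopy(self.lookup(name))
        return child


rho = Activation()
rho.objects["x"] = 1                     # Init x = 1
rho.objects["y"] = 1                     # Init y = 1
sigma = rho.activate_child(["x", "y"])   # first reference to sigma
result = sigma.objects["x"] + sigma.objects["y"]   # Return x+y
print(result)
```

Because sigma holds copies, a later change to rho's x does not alter the x that sigma sees.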


7.6 Statements

Text blocks contain text. Blocks other than text blocks contain statements and/or sub-blocks. San has four basic types of statements: assignments, commands, flow control statements, and sequent statements. Assignment statements have the form

    <destination object(s)> = <value expression(s)>
where <value expression(s)> is one or more complete expressions yielding values. Commands have the form
                  
    <command>  <value_expression_list>
where <value_expression_list> is a list of value_expressions or a sequent. For example,
     2, 3, 5
is a value_expression_list.


8. Expressions

8.1 Type rules for infix operators
8.2 Resolving chains
8.3 Type conversion

San supports scalar, list, and array expressions. For the sake of convenience we shall illustrate them using arithmetic expressions. Here is a scalar expression:

        x + 1               # add one scalar value to another 
It says, simply enough, that the value of the expression is the sum of the scalar value of x and the literal number 1. List and array expressions are more interesting. Here are the two basic forms of array expression:
        foo[] + 1            # Add 1 to each component of foo
        foo[] + bar[]        # Adds corresponding elements of foo and bar
When lists and arrays are being combined they must be conformable, i.e., all of the dimensions must be the same. If they aren't, an exception, array-comformation-error, will be raised. Object field vectors can be used in dimensioned expressions provided that they are conformable. Thus we can say:
        foo[] + bar.[] + 1
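
The scalar and array forms of addition, together with the conformability check, can be sketched in Python. Nested Python lists stand in for San arrays, the function names shape and add are hypothetical, and ValueError stands in for the San exception.

```python
def shape(a):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0] if a else None
    return tuple(dims)


def add(a, b):
    """Add scalars, broadcast a scalar over an array, or add two arrays."""
    if not isinstance(a, list) and not isinstance(b, list):
        return a + b
    if not isinstance(b, list):
        return [add(x, b) for x in a]      # foo[] + 1
    if not isinstance(a, list):
        return [add(a, y) for y in b]
    if shape(a) != shape(b):
        raise ValueError("array-comformation-error")
    return [add(x, y) for x, y in zip(a, b)]  # foo[] + bar[]


print(add([1, 2, 3], 1))
print(add([1, 2], [10, 20]))
```

Adding a two-element list to a three-element list raises the error, since the dimensions differ.
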
Value_expressions are typed. The basic types are numeric, boolean, string, executable, and indeterminate. The indeterminate type signifies that there is insufficient context to establish the type. (See section 6.2 for details.) Value_expressions are either atomic or composite. The atomic value_expressions are:

Composite value_expressions are formed using unary prefix and binary infix operators operating on value_expressions. For example, x+y is a composite expression with + as the binary operator operating on x and y.

Once binary operators are permitted in a language there must be associated parsing rules and type conversion rules. For example, how do we interpret x+y&z? The important thing to keep in mind is that infix operators are syntactic sugar for function calls. Thus x+y&z is equivalent to one of these two expressions depending upon how it is parsed:
    <+ x <& y z>>       #  i.e., x +(y&z)
    <& <+ x y> z>       #  i.e., (x+y) & z
The function that an infix operator represents can be a user defined function, a San intrinsic function, or a language defined infix operator. The intrinsic functions are the numeric comparison functions, eq, ne, gt, lt, ge, and le, and the string comparison functions, ceq, cne, cgt, clt, cge, and cle. The language defined infix operators are the boolean operators, &, and |, and the arithmetic operators, -, +, *, /, and ^.

8.1 Type rules for infix operators

Both arguments to an infix operator must be of the same type or must be convertible to the same type. The arguments for infix operators created from user defined functions are treated as being of unspecified type. The value of the infix expression is unspecified if the operator was formed using the exclamation point (!) and boolean if the operator was formed using the question mark (?). In other words

    (x func! y)  is the same as  <func x y> and  
    (x func? y)  is the same as  <func x y>'bool

Both arguments to the numeric comparison operators must either be numbers or convertible to numbers. The result will be a boolean truth value. Similarly the arguments to the string comparison operators must be strings or convertible to strings.

The boolean infix operators (&, |, and ^) expect boolean arguments and return a boolean value. Similarly the arithmetic operators (-, +, *, /, and **) expect numeric arguments and return a numeric value. As with comparison operators the arguments must either be of the proper type or convertible to the proper type.


8.2 Resolving chains

The default rule is that an infix expression must be enclosed in parentheses. Thus (x+(y&z)) is a valid composite expression. Under certain circumstances parentheses may be removed. When they are the result is a chain of infix operations. For example, x+y&z is an infix chain. Chains are permissible when there is an appropriate resolution rule. The resolution rules are:


8.3 Type conversion

Composite expressions can contain type mismatches that can be resolved by type conversion. For example, in the expression (x+y)&z, the subexpression (x+y) is a number. It must be converted to a boolean so that it can serve as an input for the "and" operation. Here are the conversion rules.


9. Assignment

9.1 Assignment syntax
9.2 Evaluation order
9.3 Assignment of executable elements
9.4 Object cloning
9.5 Declarative assignment
9.6 Alias assignment

San has a variety of forms of assignment statements. The major categories, which are syntactically distinct, are direct assignment and scatter assignment.

Within these categories there are variants depending on the mode of the target and of the assignment expression. San recognizes three basic modes of assignment expression - scalar, vector, and array.

9.1 Assignment syntax

9.1.1 Scalar assignment
9.1.2 List assignment
9.1.3 Array assignment

The basic form of the assignment statement is:

    <destination object(s)> = <value expression(s)>
The left hand side (LHS) of the assignment can be any of
    (a) A single object       (scalar) 
    (b) A list of objects     (list)
    (c) An array              (array)
The right hand side (RHS) of an assignment can be any of
    (a) a single expression   (scalar)
    (b) list of expressions   (list)
    (c) an array  expression  (array)
Only five of nine possible combinations of LHS and RHS are valid. They are:
    Single object    = single expression
    List of objects  = single expression
    Array of objects = single expression
    List of objects  = list of expressions
    Array of objects = array of expressions

Some examples:

     x    = y + 1            # Scalar = scalar
     a,b  = 0                # List   = scalar
     u[]  = 0                # Array  = scalar 
     a[]  = x,y,z            # List   = list
     u[]  = v[]              # Array  = array
 

9.1.1 Scalar assignment

When the RHS is a scalar expression all items on the LHS are set to the value of the scalar expression. Examples:
    a[]   = 1                # Each element of a is set to 1
    u[1,] = 0                # All elements of row 1 of u are set = 0


9.1.2 List assignment

List assignment differs from scalar and array assignment because there is more than one identifier in the left hand side and more than one expression on the right hand side.

The expressions in the right hand side can be a mixture of vectors, single elements, and sequents. The left hand side is a list of scalar objects optionally terminated with an open array expression. The values in the right hand side are assigned one by one to the objects on the left side. If there are fewer values in the source list than there are objects in the destination list the surplus objects are set to default values. If there are more values in the source list and the last object is not an open array object the surplus values are discarded. If the last object is an open array object it receives the surplus values as a vector. Here are examples:

    a, b, c, d = 1, 2, 3          # Set a=1, b=2, c=3, and d=0 (default)
    a, b       = 1, 2, 3          # Set a=1 and b=2. Value 3 is discarded
    a, b[]     = 1, 2, 3          # Set a=1, and b a vector containing 2 and 3
    a, b, c[]  = 1, 2             # Set a=1, b=2, c=0 (convert to scalar)
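
The distribution rules for list assignment can be traced with a short Python function. The name list_assign is hypothetical, a trailing target written with "[]" marks the open array, and 0 is assumed as the default value. The case of exactly one surplus value is not covered by the examples above; this sketch keeps it as a one-element vector.

```python
def list_assign(targets, values, default=0):
    """Distribute values over targets; a trailing 'name[]' is an open array."""
    result = {}
    open_array = bool(targets) and targets[-1].endswith("[]")
    scalars = targets[:-1] if open_array else targets
    for i, name in enumerate(scalars):
        # Surplus targets receive the default value.
        result[name] = values[i] if i < len(values) else default
    if open_array:
        # The open array receives the surplus values as a vector,
        # or the default (as a scalar) when nothing is left over.
        surplus = list(values[len(scalars):])
        result[targets[-1][:-2]] = surplus if surplus else default
    return result


print(list_assign(["a", "b", "c", "d"], [1, 2, 3]))
print(list_assign(["a", "b[]"], [1, 2, 3]))
```

Without an open array the surplus values are simply never read, which matches the "discarded" rule.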

9.1.3 Array assignment

In array assignment the dimensions of the target (LHS) are adjusted to conform to the dimensions of the dimensioned expression on the right hand side (RHS). See section 3.4.2 for details of array assignment.


9.2 Evaluation order

The fundamental rule is that evaluation order does not matter. To illustrate consider the following code fragment:
    Proc foo
        Self = Self + 1
        Print {Self " "}
        End
    foo = 1
    <foo> = <foo> + foo ; Print {foo}
What does it do, and what are the rules governing the evaluation? There are three rules of evaluation that cover the behaviour:

Here is what happens and the order in which it occurs. The definition of proc foo creates an object named foo and foo's executable proc aspect. The numeric aspect of foo is 0; all unspecified aspects get their default values. Each time foo's executable proc is executed the numeric aspect of foo is incremented and then the numeric aspect is printed, followed by a space.

Next we set the numeric aspect of foo to be 1.

The line, "<foo> = <foo> + foo ; Print {foo}", consists of two statements

    <foo> = <foo> + foo
    Print {foo}
In the first of these statements there are three copies of foo; each is evaluated and executed separately.

The first thing to be evaluated is <foo> in the RHS. In this copy foo'xproc is executed. This sets foo'numeric to 2 and prints "2 ". Finally, the value of <foo> is determined to be 2, foo's numeric aspect.

The second thing to be evaluated is the final "foo". Since this is a separate copy, its numeric aspect is still 1; this is its value. The two values, 2 and 1, are added together to get the value of the RHS, 3.

Next the LHS is evaluated. Again foo'numeric is set to 2 and "2 " is printed. Finally foo'numeric is set equal to 3 by the assignment. The Print statement completes the printed line, which is:

2 2 3
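
The walkthrough can be replayed in Python. This sketch rests on one assumption drawn from the narrative above: each occurrence of foo in the statement works on its own copy, taken before the statement runs. The class Foo and the list out (which collects printed text) are hypothetical.

```python
import copy


class Foo:
    """Hypothetical stand-in for the object foo and its executable proc."""

    def __init__(self, numeric=0):
        self.numeric = numeric

    def xproc(self, out):
        # Proc foo: Self = Self + 1, then print Self followed by a space.
        self.numeric += 1
        out.append(f"{self.numeric} ")
        return self.numeric


out = []
foo = Foo()
foo.numeric = 1                      # foo = 1

# <foo> = <foo> + foo : three occurrences, three separate copies.
rhs_call = copy.copy(foo)            # <foo> on the RHS
rhs_plain = copy.copy(foo)           # plain foo on the RHS
lhs_call = copy.copy(foo)            # <foo> on the LHS
rhs_value = rhs_call.xproc(out) + rhs_plain.numeric
lhs_call.xproc(out)
foo.numeric = rhs_value              # the assignment binds the RHS value

out.append(str(foo.numeric))         # Print {foo}
line = "".join(out)
print(line)
```

The order in which the three copies are processed does not change the printed line, which is the point of the fundamental rule.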


9.3 Assignment of executable elements

Executable elements can be copied (assigned) and swapped in much the same way as scalar values are, with the proviso that a function aspect be identified. For example:
        foo'sort = sort
copies the returnable function aspect ('vproc) of proc sort into the sort aspect of proc foo.


9.4 Object cloning and assignment copying

Assignment statements copy values into the data fields and specified aspects of an object. To do a complete copy (cloning) one must use the set command, to wit:
        set foo = bar
The set command can be used with any kind of assignment with the effect that the assignment operates on whole objects rather than values. For example, the statement:
        set foo[] = alpha
initializes each foo object in the foo array to be the alpha object. When lists or arrays are being combined or distributed their object signatures must be consistent. For example, the statement:
        set foo[] = @, alpha
appends the object alpha to the list of objects, foo. This statement will raise a run time exception if the field vectors of foo and alpha are not the same, or if their aspect suites are inconsistent.

Object oriented programming can be encompassed by using cloning and the object template model.


9.5 Declarative assignment

A declarative assignment is an assignment of a value that is made as a declaration. It is not an executable statement as such. The form is:
    <target>: <value_list>
The <value_list> typically is a single value, except in the case of the arguments: declaration, where it is a calling sequence list. Style statements are declarative assignments. In program segments the entry and configuration declarations are declarative assignments. In configuration segments program parameters are specified with declarative assignments. Within code segments declarative assignments can be used as initialization statements.


9.6 Alias assignment

Alias assignments are used to create aliases. The left hand side of an alias assignment is an alias, i.e., a simple name prefixed with an at (@) sign. The right hand side (RHS) can either be an identifier or an identifier valued expression. Here is a simple example:

    @x = alpha
The identifier on the RHS can be a subscripted qualified identifier. Here are some examples:
    @x = a.b
    @y = c[]
    @z = d[i+3]
When an alias is used it is replaced at run time by the identifier that it represents. For example, the statement
    @z.@x = @*@
squares the numeric content of d[i+3].a.b

The RHS identifier can only be a function invocation if the function is one that can deliver an identifier. There are two system procs that can do this, G.cond, and G.select. The first of these, G.cond, has three arguments; the first is a boolean value and the second and third are expressions. The G.cond function returns the second argument if the boolean value is true and the third when it is false. In ordinary usage the G.cond function returns a value that in due course is promoted to an object. However the RHS in an alias assignment cannot be an anonymous object. The implication is that the choices (the 2nd and 3rd arguments) in G.cond have to be identifiers. Thus we can say

    @x = <G.cond (x lt? y) x y>
but not
    @x = <G.cond (x lt? y) 2 3>
The situation is similar with G.select. The first argument is an index into the remaining arguments. That is, when the first argument is one, the first of the remaining arguments is returned, etc. In an alias assignment the arguments must be identifiers.
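
The selection behaviour of G.cond and G.select can be sketched in Python. The functions cond and select are hypothetical stand-ins, identifiers are modeled as strings so that the result names an object rather than holding a value, and the range check in select is an assumption (the text does not say what happens for an out-of-range index).

```python
def cond(flag, if_true, if_false):
    """Return the second argument when flag is true, else the third."""
    return if_true if flag else if_false


def select(index, *choices):
    """Return the choice at a one-based index into the remaining arguments."""
    if not 1 <= index <= len(choices):
        raise IndexError("select index out of range")  # assumed behaviour
    return choices[index - 1]


env = {"x": 3, "y": 7}
aliases = {}
# @x = <G.cond (x lt? y) x y> : the alias is bound to an identifier.
aliases["x"] = cond(env["x"] < env["y"], "x", "y")
print(aliases["x"], select(2, "a", "b", "c"))
```

Passing the literals 2 and 3 instead of names would yield a value with no identifier behind it, which is why that form is not allowed in an alias assignment.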


10. Flow Control

10.1 Exitif
10.2 Conditional code chains
10.3 Switches
10.4 Foreach loops
10.5 While loops
10.6 Simple loops

Ordinary code consists of a sequence of simple statements that are sequentially executed. Flow control statements alter the sequence in specified ways. San recognizes four basic categories of flow control statements. These are: escape constructs, block selection forms, loops, and exceptions. This section describes escapes, block selection, and loops. Exception handling is discussed in section 15.

In San escape statements are statements that transfer control out of the current code block being executed. San has four escape verbs. One is the Exitif construct, which escapes from the current block to the next statement after that block. The other three, Return, Fail, and Yield, terminate or suspend the execution of the current execution element; they are described in section 11.6.

Block selection forms are used to conditionally execute a subset of blocks from a set of blocks. San has two forms of block selection. The first is the alternation sequence, a more general form of the familiar if/elseif/else sequence. The second is the Switch, which uses the value of an expression to select which block of a set of blocks to execute.

Block selection constructs are used when we want to execute a few blocks of code out of a suite of blocks. Loops are used when we want to execute a single block many times. San provides three types of loops, the Foreach loop, the While loop and the "forever" Loop.

10.1 Exitif

The Exitif statement conditionally transfers control out of the current block if its argument is True. The syntax is:

    Exitif <boolean_expression>
where <boolean_expression> is an expression that can be interpreted as a boolean valued expression.

The most common use for Exitif is for exiting loops; however it can also be used with simple blocks. The Exitif statement cannot be used directly from a proc or agent body. Here is an example:

    Loop
        F.readline file line
        Exitif ~ F.readline
        Print {line}
        End
The above code reads lines from a file and prints them until the file is exhausted.
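
For readers more familiar with break-style loops, here is a rough Python counterpart of the loop above. It is an analogy only: an empty string returned by readline plays the role of a failed F.readline, and the function name print_lines is hypothetical. The lines are collected in a list rather than printed so that the result can be inspected.

```python
import io


def print_lines(file):
    """Collect lines until the file is exhausted, mirroring Loop/Exitif."""
    printed = []
    while True:                              # Loop
        line = file.readline()               # F.readline file line
        if line == "":                       # Exitif ~ F.readline
            break
        printed.append(line.rstrip("\n"))    # Print {line}
    return printed


lines = print_lines(io.StringIO("alpha\nbeta\n"))
print(lines)
```

As with Exitif, control leaves the loop body and resumes at the first statement after it.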


10.2 Conditional code chains

San extends the familiar If-Elif-Else conditional code chain with two additional commands, Andif and Orif. Andif and Orif enable more complex conditional chains. They enable splitting compound boolean expressions. They also make it possible to execute more than one block within a conditional code chain. Here is an example:

    If    (car.type   Eq? SPORTY)
    Andif (car.color  Eq? RED)
    Andif (driver.sex Eq? MALE)
    Andif (driver.age Eq? YOUNG)
        Print Fits the profile
    Orif  (car.speed  Gt? 100)
        write_ticket driver

10.2.1 Conditional code units

A conditional command and the code it controls is a conditional code unit. There are If units, Andif units, Orif units, Elif units, and Else units. All of the conditional commands except Else have a parentheses enclosed Boolean expression as an argument. The Else command has no argument. Here is an example of a conditional command:

    If (x Lt? y)
Conditional code units can be either in single line form or block form. In single line form a single simple statement is on the same line as the conditional command. Here is an example of the single line form:
    If (x Lt? min) min = x
In block style the conditionally executed code is in a block that follows the conditional command. Here is an example of block style:
    If (x Lt? min)
        Print min reduced from {min} to {x} 
        min = x
A conditional block is terminated either by an End statement or by a subsequent conditional block in the chain.

10.2.2 Structure of conditional code chains

A conditional code chain is a sequence of conditional code units. Each chain must begin with an If unit. The initial If unit is followed by a mixture of zero or more Andif, Orif, and Elif units in any order. There can be a final optional Else unit.

If a single line form is not followed by a conditional unit or is followed by an If unit the chain is terminated. A single line Else unit also terminates the chain. Otherwise the chain continues. Here are examples of the possible cases:

    If (x Lt? min) min = x
    Print New min = {min}         # Not a conditional unit, chain ends
    
    If (x Lt? min) min = x
    If (x Gt? max) max = x        # If unit starts a new chain
    
    If   (x Lt? min) min = x
    Elif (x Gt? max) max = x      # Elif does not terminate the chain
    
    If   (x Lt? min) min = x
    Else             max = x      # Else terminates the chain
End statements also terminate the existing chain. Here are examples:
    If (x Lt? min)
        min = x
        End                  # End terminates the chain and the unit
        
    If (x Lt? min)
        min = x
    Elif (x Gt? max)         # Chain continued 
        max = x              # Unit not yet terminated

    If (x Lt? min)
        min = x
    If (x Gt? max)           # Illegal - the prior if unit has not
        max = x              # been properly closed

10.2.3 Evaluation and execution rules

The conditional code units in a chain are processed in order. When the chain is being processed there is a current boolean value, the B-value, associated with the chain. The B-value is assumed to be false at the beginning of the chain.

There is a general rule that if a conditional code unit's boolean expression is evaluated then the B-value is set to the result. The table below gives the rules for each type of conditional code unit.

Keyword  B-value = True                            B-value = False
-------  ----------------------------------------  ----------------------------------------
If       N/A                                       Evaluate the boolean expression.
                                                   Execute the conditional code if the
                                                   result is true.
Andif    Evaluate the boolean expression.          Do not evaluate the boolean expression.
         Execute the conditional code if the       Do not execute the conditional code.
         result is true.
Orif     Do not evaluate the boolean expression.   Evaluate the boolean expression.
         Execute the conditional code.             Execute the conditional code if the
                                                   result is true.
Elif     Do not evaluate the boolean expression.   Evaluate the boolean expression.
         Exit the chain.                           Execute the conditional code if the
                                                   result is true.
Else     Exit the chain.                           Execute the conditional code and exit
                                                   the chain.
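
The rules in the table can be checked by modelling them outside San. The following is a minimal Python sketch, not San code; the representation of a unit as a (keyword, condition, action) triple and the names run_chain and profile_chain are assumptions made for illustration only.

```python
# Python model of the B-value rules; this is not San code.
# The (keyword, condition, action) unit layout is an assumption of the sketch.

def run_chain(units):
    """Process one conditional chain and return the actions that executed.

    Each unit is (keyword, condition, action). condition is a zero-argument
    callable returning a bool (None for Else); action is a label, or None
    for a unit that has no code.
    """
    b_value = False                      # the B-value starts out false
    executed = []
    for keyword, condition, action in units:
        if keyword == "If":
            b_value = condition()
        elif keyword == "Andif":
            if b_value:                  # evaluated only when B-value is true
                b_value = condition()
        elif keyword == "Orif":
            if not b_value:              # evaluated only when B-value is false
                b_value = condition()
        elif keyword == "Elif":
            if b_value:
                break                    # exit the chain
            b_value = condition()
        elif keyword == "Else":
            if not b_value and action is not None:
                executed.append(action)
            break                        # Else always ends the chain
        else:
            raise ValueError(f"unknown keyword: {keyword}")
        # For If, Andif, Orif and Elif the code runs exactly when the
        # B-value is true after the unit's rule has been applied.
        if b_value and action is not None:
            executed.append(action)
    return executed


def profile_chain(sporty, red, male, young, speed):
    """The car example from section 10.2 expressed as chain units."""
    return run_chain([
        ("If", lambda: sporty, None),
        ("Andif", lambda: red, None),
        ("Andif", lambda: male, None),
        ("Andif", lambda: young, "print profile"),
        ("Orif", lambda: speed > 100, "write ticket"),
    ])
```

In this model, a driver who fits the whole profile gets both actions even at low speed, because the Orif unit executes without evaluating its expression once the B-value is true.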

10.3 Switches

The Switch construct has an argument and a sequence of cases. The Switch argument is an expression with a value. Each case has a non-empty list of values and an action. The values in a case list can be numeric valued expressions, string valued expressions, or symbols.

The case value lists are tested in order. A case value list is satisfied if at least one value in the list is equal to the switch argument. When a satisfied case value list is found the corresponding action is executed and the switch is terminated, passing control to the continuation. The list of cases can be terminated by a default Else statement/block. The Else action is always executed if no matching case is found.

Here is the syntax:

     Switch <value> 
         <case_unit>
         ...
         <else_unit>
         End
         
A <case_unit> can either be a single statement or a block. The syntaxes are:
     Case <case_value_list>: <statement>
     Case <case_value_list>: 
         <block_body>
The <else_unit> is similar except that it lacks a case value list. The syntax alternatives are:
    Else <statement>
    Else Begin
        <block_body>
Switches follow the same block termination rule used by conditional code chains; a conditional code block is terminated by the next Case, Else, or End statement.

10.3.1 Fallthrough

Ordinarily execution of an action block ends when the block terminator is reached. If the last statement of the block is the Fallthrough keyword, execution continues on into the next statement or block. Here is a simple example:

    While (n Ne? 1)
        Switch 
            Case 1:
                n = 3*n+1
                Fallthrough
            Case 0: n = n/2
            End
        End
This code exploits the fact that if n is odd 3*n+1 is even and hence can go directly to the even case without calling the Mod function.
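
The effect of Fallthrough can be traced with a small Python model of the loop. This is a sketch under two assumptions that the San example does not state: the Switch argument is taken to be the remainder of n divided by 2, and n/2 is taken to be integer division. The function name collatz_cycles is hypothetical.

```python
# Python model of the Fallthrough example; this is not San code.
# Assumptions: the Switch argument is n mod 2, and n/2 is integer division.

def collatz_cycles(n):
    """Count While-loop cycles until n reaches 1."""
    cycles = 0
    while n != 1:                 # While (n Ne? 1)
        case = n % 2              # assumed Switch argument
        if case == 1:             # Case 1:
            n = 3 * n + 1
            case = 0              # Fallthrough: continue into the next case
        if case == 0:             # Case 0:
            n = n // 2
        cycles += 1
    return cycles
```

Starting from 3, the model passes through 5, 8, 4, 2 and 1, so an odd value is tripled, incremented and halved within a single cycle.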

10.4 Foreach loop

The syntax for a Foreach loop is:

    Foreach <var_spec>*
        <loop_body>
        End
where <var_spec>* represents one or more instances of a <var_spec>, where the var_spec syntax is
    <var> In <value_list>
where <var> is a variable ranging over the values specified in <value_list>. The <value_list> is a sequent. Some examples:
    Foreach i In [1:100]   
    Foreach age In [employee[].age]
In the first example the variable i sequentially assumes the values 1 through 100. In the second example the variable age sequentially assumes the value of the age field of each element of the employee array. When there are multiple var_specs they are nested, with the first var_spec being outermost. For example, here is a matrix multiplication loop:
    Foreach i In [1:3], j In [1:3]
        c[i,j] = <M.dotpr a[i,] b[,j]>
        End
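
For comparison, the nesting order of multiple var_specs can be spelled out as explicit nested loops. The following Python sketch is only a model of the loop above: matmul is a hypothetical name, indices are 0-based rather than San's 1-based ranges, and the dot product is written out instead of calling M.dotpr.

```python
# Python model of the two-variable Foreach; this is not San code.

def matmul(a, b):
    """Multiply matrices a (n x m) and b (m x p) given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):                # first var_spec: outermost loop
        for j in range(p):            # second var_spec: nested inside it
            row = a[i]                             # a[i,]
            col = [b[k][j] for k in range(m)]      # b[,j]
            c[i][j] = sum(x * y for x, y in zip(row, col))
    return c
```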

10.5 While loop

The syntax for a while loop is:

    While (<boolean_expression>)
        <loop_body>
        End
where <boolean_expression> is a boolean valued expression. The parentheses are necessary.

The expression is evaluated at the beginning of each cycle. If the expression is False the loop is terminated; if it is True the loop body is executed and the next cycle begins. Here is an example of a While loop:

    While (x Lt? y)
        x = @+1
        End

10.6 Simple loops

A forever loop (so called because the construct does not have a termination condition) is a simple Loop statement. The syntax is

    Loop
        <loop_body>
        End
The normal way to terminate a forever loop is with an Exitif statement. If the Exitif is the first statement in the loop body we have a while loop; if it is the last we have a repeat/until loop; and if it is in the middle we have a loop-and-a-half. Here is an example of a loop-and-a-half:
  
    Loop
        F.readline file line
        Exitif ~ F.readline
        Print {line}
        End
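The same loop-and-a-half shape exists in most languages as an unconditional loop with an exit in the middle. The following Python sketch mirrors the San example line by line; it is a model only, the function name read_all_lines is hypothetical, and the lines are collected in a list rather than printed so that the result can be inspected.

```python
# Python model of the loop-and-a-half; this is not San code.
import io

def read_all_lines(stream):
    """Collect lines from a text stream until it is exhausted."""
    lines = []
    while True:                          # Loop
        line = stream.readline()         #     F.readline file line
        if not line:                     #     Exitif ~ F.readline
            break
        lines.append(line.rstrip("\n"))  #     Print {line}
    return lines                         #     End
```

Moving the exit test to the first or last position in the body would give the while and repeat/until forms described above.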

11. Procedures and Functions (procs)

11.1 Proc definition
11.2 Proc structure
11.3 Argument Passing style
11.4 Specifying arguments in calling sequences
11.5 Variable length calling sequences
11.6 Returning from a proc
11.7 Infix operators as multi-argument functions

Functions and procedures are the workhorses of procedural languages. The two forms are closely related, and many languages fuse the two together. In practice the two forms are distinct: functions return values that can appear in expressions whereas procedures do not. As far as definition is concerned, San fuses the two together; functions and procedures are both instances of a general form called a proc. Procs can be used either as functions or as procedures.

Examples of function and procedure invocation:

        x = <M.cos theta>          # function invocation
        sort data                # procedure invocation
        <foo bar>                # function used as command
Function names appear inside the delimiting angle brackets, <>, in the function invocation. Parentheses, (), are not needed around the arguments in the procedure invocation.

11.1 Proc definition

Procedures and functions are defined using proc blocks. For example:
     Proc factorial
         Arguments: n
         If (n Le? 1) Return 1
         Else         Return n * <factorial n - 1>
         End proc
A proc can either be used as a procedure or as a function.

11.2 Proc structure

The structure of a proc is:
     Proc <proc_name> <definition_arguments>
          [<arg_stmt>|<interface_stmt>]
          [<initialization>]
          <proc_body>
          [<epilog>] 
          [<catch_blocks>] 
          End [proc | <proc_name>]      
<definition_arguments> are objects supplied by the defining code. They are initialized within the proc to have the same values as they have in the defining code at the time the definition is performed. See section 7.5 for a discussion of scope and initialization. The definition arguments can be overridden by calling sequence arguments. Thus, if x is a definition argument it creates a default object for x; if x also is a calling sequence argument (and is in the calling sequence) the object passed in replaces the default object.

Each line of the <arg_stmt> consists of the text "Arguments:" followed by a list of calling sequence arguments. There can be more than one line in the <arg_stmt>. Here is an example:

     Arguments: x       # x cartesian coordinate
     Arguments: y       # y cartesian coordinate
In this example the calling sequence vector is x, y. The alternative way to specify the calling sequence arguments is to reference an interface block. Interface blocks are defined at the top level of a code segment. They contain argument statements and default value statements. For example:
     Interface foo_args
         Arguments: x y title
         Init x = 0
         Init y = 0
         Init title =  "Data plotted on log-log graph paper."
         End foo_args
In the proc the interface block being used is specified with an interface statement consisting of the text "Interface:" followed by the interface name. For example:
        Interface: foo_args
Either argument statements or an interface statement can be used to specify the calling sequence but not both.

The syntax of init statements and init blocks is described in sections 7.2 and 7.3.

The <proc_body> is the procedure body proper. It consists of zero or more executable statements and blocks.

The optional <epilog> is executed when the procedure body code terminates, either because of a return statement or because there is no continuation. An epilog cannot contain any statements that branch out of the epilog other than raising an exception. Here is an example:

     Epilog
         If Self'failure Raise fooby
         End epilog
In this example the fooby exception is raised if the 'failure flag is set. Epilogs are principally used in procs which have multiple return statements.

<catch_blocks> can be placed anywhere in the proc after the initialization. They handle named exceptions that are raised. Here is an example:

    Catch fooby
        Emit.1 bageldorf
        End catch

11.3 Argument Passing style

San uses both call by copy and call by reference (actually call by copy-back) depending on whether a proc is used as a function or as a procedure. Function invocations use call by copy, i.e., the objects passed to the function are copies of the objects in the calling environment.

Procedure invocations use call by copy back, i.e., the objects passed to the procedure are copies of the objects in the calling environment. When the procedure returns control to the calling environment the revised call sequence objects are copied back.

Consider the following three alternatives:

        (1) x = <foo bar>        # bar will not be changed 
        (2) <foo bar>            # bar will not be changed
        (3) foo bar              # bar may be changed
In the first case bar is not altered but x may be. In the third case bar may be altered within foo. The second case behaves like a procedure call except that bar is not altered.
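
The difference between the two passing styles can be modelled in a few lines of Python. This is a sketch under stated assumptions, not a description of how a San engine works: San objects are stood in for by Python dicts, and the helper names call_as_function, call_as_procedure, and foo are hypothetical.

```python
# Python model of call by copy and call by copy-back; this is not San code.
# Assumption: San objects are represented here by dicts.
import copy

def call_as_function(proc, *args):
    """Call by copy: the proc sees copies; the caller's objects are untouched."""
    copies = [copy.deepcopy(a) for a in args]
    return proc(*copies)

def call_as_procedure(proc, *args):
    """Call by copy-back: the proc sees copies, and the revised copies are
    written back into the caller's objects when the proc returns."""
    copies = [copy.deepcopy(a) for a in args]
    proc(*copies)
    for original, revised in zip(args, copies):
        original.clear()
        original.update(revised)

def foo(bar):
    """Example proc that alters its argument and returns a value."""
    bar["count"] += 1
    return bar["count"]
```

With bar = {"count": 0}, call_as_function(foo, bar) returns 1 and leaves bar unchanged, while call_as_procedure(foo, bar) leaves bar["count"] equal to 1.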

11.4 Specifying arguments in calling sequences

When procedures and functions are defined the arguments are specified in an arguments statement. Arguments can have default values specified in init statements. For example:
    Proc foo
        Arguments: x, y
        Init x = 3.141510
        Init y = 180.
        ...
        End foo
San uses positional correspondence to supply arguments when invoking procedures and functions. For example
        foo a b
matches a to x and b to y.

11.5 Variable length calling sequences

The list of arguments passed to a proc does not have to contain the same number of objects as were declared in the arguments statement. When there are fewer objects passed than declared the objects not passed retain their prior states.

The length of the passed list is given in the system variable Argc; the passed list is contained in the system variable Argv. (These variables have the correct values for each separate proc invocation.) The only way surplus arguments can be detected and operated on is by checking Argc and accessing Argv.
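
The binding rules can be illustrated with a small Python model. It is a sketch only: bind_arguments is a hypothetical helper, the proc's environment is modelled as a dict, and the "prior states" of unpassed arguments are assumed to be the values set by Init statements.

```python
# Python model of positional binding with Argc and Argv; this is not San code.
# Assumption: the prior state of an unpassed argument is its Init value.

def bind_arguments(declared, prior, passed):
    """Return the proc environment after binding a calling sequence.

    declared: argument names from the Arguments statement, in order
    prior:    dict of prior states (for example Init values)
    passed:   list of objects in the calling sequence
    """
    env = dict(prior)
    # Positional correspondence; zip stops at the shorter list, so
    # unpassed arguments keep their prior states and surplus objects
    # are not bound to any name.
    for name, value in zip(declared, passed):
        env[name] = value
    env["Argc"] = len(passed)
    env["Argv"] = list(passed)
    return env
```

In this model a surplus argument is reachable only as env["Argv"][len(declared):], which corresponds to checking Argc and indexing Argv.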

11.6 Returning from a proc

11.6.1 Return
11.6.2 Fail
11.6.3 Yield

San has three "return" statements, Return, Fail, and Yield. The term "return" is a bit of a misnomer in San because San procs alter the value of their parent objects rather than returning.

A Return/Fail/Yield statement with arguments is equivalent to an assignment to Self followed by the statement without arguments. Thus:

    Return a,b
is equivalent to
    Self[] = a,b
    Return
Return and Fail are used for successful and unsuccessful returns. The only difference is that Return sets the boolean aspect of Self True whereas Fail sets it False. Each terminates the current invocation of the proc. The Yield statement is similar to the Return statement except that it suspends rather than terminates the execution of the invocation.

11.6.1 Return

The Return statement terminates the current invocation and returns control to the invoking routine. The syntax is

    Return <argument_list> 
where <argument_list> is an optional list of arguments.

The Return statement also sets the boolean aspect of Self to be True unless a boolean value is being returned. The boolean aspect thus serves as a return code signifying success or failure. The Fail statement (below) is equivalent to the Return statement except that the boolean aspect is set to False. In the example in section 10.1 the F.readline routine sets F.readline True if a line was read and False if it was not.

11.6.2 Fail

The Fail statement is the same as the Return statement except that the boolean aspect of the invoked object is set False. It is typically used in routines that attempt to obtain a resource when the attempt fails. The syntax is

    Fail <argument_list>
where <argument_list> is an optional argument list.

11.6.3 Yield

A Yield statement is similar to a Return statement with one critical difference: the current invocation is suspended and will be resumed when it is again called from the invoking routine. The syntax is

    Yield <argument_list>
where <argument_list> is an optional list. As with Return and Fail, an argument list is treated as though it were assigned to Self just before a naked Yield. The boolean aspect is set True unless there is a boolean valued argument.

Routines that yield control with a Yield statement can be used as generators and as coroutines. Here is an example of a proc that returns the next term in the Fibonacci sequence each time it is called.

    Proc fib
        Init x,y = 0,1
        Loop
            result,x,y = x,y,x+y
            Yield result
            End
        End
The yield statement must be in a loop; if it weren't, fib would terminate when it reached the proc end.
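
Readers who know generators from other languages may find a direct comparison useful. The following Python generator is a close analogue of the fib proc, given here only as a comparison; it says nothing about how San implements Yield.

```python
# Python analogue of the fib proc; this is not San code.
from itertools import islice

def fib():
    """Yield successive Fibonacci numbers, starting from 0."""
    x, y = 0, 1                          # Init x,y = 0,1
    while True:                          # Loop
        result, x, y = x, y, x + y       #     result,x,y = x,y,x+y
        yield result                     #     Yield result

first_eight = list(islice(fib(), 8))     # [0, 1, 1, 2, 3, 5, 8, 13]
```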

Creating coroutines is a bit more complicated. Here is an example. We have two routines, a producer and a consumer. The producer makes widgets, the consumer consumes them. The code looks like this:

Proc producer_consumer

    Proc producer
        Arguments: consumer
        Self.widget = 
        consumer Self
        Loop
            Self.widget = 
            Yield
            End
        End
        
    Proc consumer
        Arguments: producer
        Loop
            consume_widget producer.widget
            Yield
            End
        End
        
    producer
    End
This code relies upon the fact that procedure invocations use call by reference.

11.7 Infix operators as multi-argument functions

The arithmetic and boolean infix operators (-, +, *, /, ^, &, |) can be used as functions (but not procedures) with a variable number of arguments as long as there are at least two arguments. For example the factorial function n! can be expressed with:

        <* [1:n]>
The function values are computed by applying the operator left to right. For example the expression
 
        <op arg1, arg2, arg3, arg4> is evaluated as
        (((arg1 op arg2) op arg3) op arg4)
This matters for the subtraction and division operators. Thus <- 5, 4, 1> evaluates as ((5-4)-1)=0 rather than as (5-(4-1))=2.
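
Left-to-right application is what functional languages call a left fold. The following Python sketch models the rule with functools.reduce; apply_infix is a hypothetical name, and the check for at least two arguments reflects the restriction stated above.

```python
# Python model of multi-argument infix operators; this is not San code.
from functools import reduce
import operator

def apply_infix(op, args):
    """Apply a binary operator left to right over two or more arguments."""
    args = list(args)
    if len(args) < 2:
        raise ValueError("at least two arguments are required")
    return reduce(op, args)              # (((a1 op a2) op a3) op a4) ...

difference = apply_infix(operator.sub, [5, 4, 1])      # ((5-4)-1) = 0
factorial_5 = apply_infix(operator.mul, range(1, 6))   # 1*2*3*4*5 = 120
```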

12. Agents

12.1 Agent structure
12.2 Agent activation and termination
12.3 Connection networks
12.4 Inter-agent communication
12.5 Print and Puts statements
12.6 Sources and sinks

A San program (execution process) consists of agents that are separate, independent threads of control. Unlike threads in many implementations of threads, agents do not share data. One can, if one likes, think of an agent as being a mini-process.

A San program has three levels of agents, the supervisory agent, principal agents, and subordinate agents. The supervisory agent is at the top level. It is created by the San execution engine to supervise the execution process and to act as a parent to the principal agents. The principal agents are directly below the supervisory agent. Each principal agent is the master for a thread of control. Principal agents can have subordinate agents; these in turn can have their own subordinates, etc. Subordinate agents are always defined within other agents. Each subordinate agent represents a subthread within its parent's thread.

Agents are event driven. An agent may (but need not) have a single background thread. In addition to the optional background thread it can have an indefinitely large number of event handlers, called on units.

A San program can consist of a single active agent; however San is designed to support dataflow programming. In dataflow programming programs are viewed as being a collection of black boxes (data flow elements) connected by pipes (streams). A data flow element is a routine with 0 or more input ports and 0 or more output ports. Input ports accept data from input streams; output ports emit data to output streams. Data flow elements are activated whenever there is input on an input port. Block markers to delimit blocks can be accepted and emitted.

Data flow elements that have outputs but no inputs are called sources; sources include external devices, external files, and generators such as random number generators.

Data flow elements that have inputs but no outputs are called sinks; sinks include external devices, e.g., printers, and files that are written to.

In San input and output ports are labelled 1, 2, etc. (Output port 1 is distinct from input port 1.) The current value in an input port is referenced by the port number, e.g., $1. The content of the remainder of a block is referenced by appending brackets, e.g., $1[].

Reading the contents of an input port advances the input. The Emit.1 command sends data to output port 1; Emit.2 sends data to output port 2, etc. Emitting the content of an object clears the content.

In San streams can be blocked. The block separators, called marks, are not special elements in streams; they must be emitted and detected using separate commands distinct from the commands used to read and write streams.

12.1 Agent structure

An agent is an executable code element (routine) that is an independent thread of control, i.e., a mini-thread. Procedures (and functions) are invoked. Agents cannot be invoked; they can be sent messages, receive input on input ports, and handle events and exceptions.

Agents have two types of code - initialization (setting of parameters) and response blocks (On blocks). Response blocks are activated when a specific condition arises. One or the other of the two types can be omitted. Response block initiators have the form

    On <type> 
        <response_block_body>
        End
where <type> is the type of event being handled. The types include:
Type       Description
---------  -----------
Stream     The keyword "Stream" can be qualified with a port number. For example,
                   On Stream.1
           is activated when there is stream input in streaming input port 1. There are two types of input ports: those for streaming input and those for blocked input. In streaming input the Stream response block is activated whenever new input appears. If there is no qualifying port number the response block is activated if there is new input in any port.
Block      The keyword "Block" can be qualified with a port number. For example,
                   On Block.1
           is activated when there is a block of input in block input port 1. If there is no qualifying port number the response block is activated if there is new input in any port.
Reply      The keyword "Reply" can be qualified with a port number. For example,
                   On Reply.1
           is activated when there is a reply in response to data sent on port 1. Replies are requested when an rsvp qualifier is used in an emit command. If there is no qualifying port number the response block is activated if there is a new reply on any port.
Exception  The keyword "Exception" can be followed by zero or more exception objects. For example,
                   On Exception foo bar
           is executed whenever an exception creating either foo or bar is raised within the agent and is not handled locally. If there are no arguments all exceptions raised within the agent are handled.
Message    The Message response block is activated whenever there is a message from the parent agent. There are no arguments.

The execution of an on unit continues to the lexical end of the on unit. It can be prematurely terminated with a done statement. As an example here is a fragment from an on unit that processes one line of San source code.

            If (line ceq? "")  done  # flush empty lines
            If <match line "#*"> done  # comments are flushed

12.2 Agent activation and termination

The primary "root" agent is specified in the Program segment with an Entry statement. Activating the program activates it. Active agents can contain definitions of agents; however a defined agent is not active until an activation command is issued. Here is an example:

    Agent parent
        Agent child
            ...
            End agent
        Activate child
        ...
        End
The child need not be defined within the parent agent; it can be defined within an enclosing lexical scope. For example
    Code
        Agent prototype_child
            ...
            End
        Agent parent
            Foreach i In [1:100]
                children[i] = prototype_child
                Activate children[i]
                End
            ...
            End
        End
In this code the source code for the child agents is defined outside the parent agent. Within the parent 100 instances of the child agent are created and each is activated.

Active agents can be suspended or terminated by the parent agent. Suspended agents retain their internal data; they can be reactivated by the parent with a resume command. Terminated agents do not retain their internal data. The commands are:

    Suspend   <name>        # Suspends <name> 
    Resume    <name>        # Resumes <name> 
    Terminate <name>        # Terminates <name>
An agent activation command can contain arguments. The syntax for determining and using arguments within an agent is the same as that used in procs, i.e., agents can contain Arguments statements and Interface statements, and can reference Argc and Argv.

An agent cannot directly create a peer agent, e.g., a principal agent cannot create another principal agent. However an agent can send a message to its parent asking it to create a peer agent. Similarly, an agent can ask its parent to make a connection between it and a peer agent.

12.3 Connection networks

The children of a parent agent can be connected to each other via pipes. Pipes connect output from one agent to input of another. The connections established by a parent agent form a connection network. Connections are established with Connect statements; the syntax is:

        <connection> ::= Connect <pipe_spec> [, <label>]
        <pipe_spec> ::= <source> [<pipe_sym> <agent>]*  <pipe_sym> <sink>
        <pipe_sym> ::= [<port>]'|'[<port>]
where <connection> is a connection statement, <label> is an object to be used as a reference to a connection, and <agent> is the identifier for a child agent. Sources and sinks (see section 12.6) can either be agents or program elements that can act as sources and sinks. In particular previously defined connections can be used as sources and sinks.

The <source>, <sink>, and the <agent>s can have arguments; the arguments are accessed within an agent via the Argc, Argv mechanism.

Each agent has an unboundedly large number of input and output ports. Ports are specified by port numbers, which are either integers or nil, that being a shorthand for port 1. Pipes can be concatenated on a single line. For example:

        Connect foo 1|2 bar | bagel, label = e105
says that output port 1 of foo is connected to input port 2 of bar, and that output port 1 of bar is connected to input port 1 of bagel. A port can be connected to more than one pipe by using separate connection statements. For example a "tee" is effected with
        Connect foo | bar
        Connect foo | bagel
Connections can be chained together. For example connection e105 above can be extended with
        Connect e105 | dorf
which connects foo, bar, bagel, and dorf.

The suite of connection statements defines the connection network, which, in effect, is a directed graph with the agents being vertices and the pipes being edges. The graph can have multiple components and cycles.
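
The way pipe specifications turn into graph edges can be sketched in Python. This is a simplified model under several assumptions: the names parse_pipe_spec and build_graph are hypothetical, element arguments and the label clause are not handled, elements and pipe symbols are separated by white space, and an omitted port number means port 1.

```python
# Python model of pipe specifications as graph edges; this is not San code.
# Assumptions: no element arguments, no label clause, white space between
# elements and pipe symbols, and an omitted port meaning port 1.

def parse_pipe_spec(spec):
    """Return edges (source, out_port, destination, in_port) for one spec."""
    tokens = spec.split()
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError("expected: element pipe element [pipe element]...")
    edges = []
    for k in range(1, len(tokens), 2):   # pipe symbols sit at odd positions
        out_port, bar, in_port = tokens[k].partition("|")
        if bar != "|":
            raise ValueError(f"not a pipe symbol: {tokens[k]}")
        edges.append((tokens[k - 1], int(out_port or 1),
                      tokens[k + 1], int(in_port or 1)))
    return edges

def build_graph(specs):
    """Map each source element to the list of elements it feeds."""
    graph = {}
    for spec in specs:
        for source, _, destination, _ in parse_pipe_spec(spec):
            graph.setdefault(source, []).append(destination)
    return graph

edges = parse_pipe_spec("foo 1|2 bar | bagel")
# [('foo', 1, 'bar', 2), ('bar', 1, 'bagel', 1)]
```

The "tee" described above corresponds to build_graph(["foo | bar", "foo | bagel"]), in which foo has two outgoing edges.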

The connection network is dynamic rather than static, i.e., new connections can be made during execution and existing connections can be broken. Connections are removed with the Disconnect command. The arguments for the Disconnect command can either be a <pipe_spec> or a list of one or more connection labels. Thus we could say either of

        Disconnect foo 1|2 bar | bagel
        Disconnect e105
We can also partially break a connection. Thus, having made the above connection between foo, bar, and bagel we could remove the connection between foo and bar with
        Disconnect foo 1|2 bar
If we do so, bar and bagel would still be connected.

Agents, sources, and sinks can appear more than once in a network. The output from a given numbered port can be sent to multiple connection elements; however an input port can only receive input from a single connection element.

When the activate, suspend, resume, and terminate commands are issued without arguments they apply to all agents named in the connection network.

12.4 Inter-agent communication

12.4.1 Sending data through pipes
12.4.2 Receiving data from pipes
12.4.3 Handling replies
12.4.4 Sending data through messages
12.4.5 Receiving data from messages

Agents communicate with each other in one of two ways, by pipes or by messages. Pipes are used for communicating between subordinate agents at the same level. Messages are used for communicating between parent and child agents.

One of the general principles of the San language is that agents do not know who their peers are or who their parent agent is. On the other hand they do know who their subordinate agents are because they created them.

12.4.1 Sending data through pipes

Data is sent to pipes by "Emit" commands. The emitted data can either be in the form of a character stream, a stream of strings, a list of strings, or a serialized object (object). In addition, blocking markers can be inserted into the stream with "mark" commands.

Emit commands optionally contain a port number and a mode; the defaults are port 1 and character stream mode. For example,

        Emit.1.char foo
sends the characters in the string bound to foo to port 1 as a stream of characters. The permissible dot delimited subfields of the emit command are the port numbers, 1,2,... and the mode specifiers, char, string, list, and obj. The emit command can have one or more arguments.

If the mode is char (this is the default) the arguments are interpreted as strings; the emit command sends the concatenation of the argument strings. If the mode is string, the argument strings are sent packaged as separate strings. If the mode is list, the argument strings are sent packaged as a list of strings. Finally, if the mode is obj, each argument is serialized and sent.
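
The four modes differ only in how the arguments are packaged. The following Python sketch models the packaging as a list of items placed on the pipe; it is an illustration, not San's wire format. The function name emit_items is hypothetical, and JSON is used as a stand-in for San's object serialization, which is not specified here.

```python
# Python model of the emit modes; this is not San code.
# Assumption: JSON stands in for San's unspecified object serialization.
import json

def emit_items(mode, *args):
    """Return the list of items that one emit command places on the pipe."""
    if mode == "char":
        return ["".join(str(a) for a in args)]      # one concatenated string
    if mode == "string":
        return [str(a) for a in args]               # separate strings
    if mode == "list":
        return [[str(a) for a in args]]             # a single list of strings
    if mode == "obj":
        return [json.dumps(a) for a in args]        # each argument serialized
    raise ValueError(f"unknown mode: {mode}")
```

For the arguments "ab" and "cd", the char mode yields one item "abcd", the string mode yields two items, and the list mode yields one item that is a two-element list.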

12.4.2 Receiving data from pipes

On the receiving end the arrival of data activates an on unit that accepts data from the designated input port. Here is a small example:
        Init count = 0
        On Stream.1.char
            count = @ + 1
            End
This particular on unit does nothing more exciting than count the number of characters received in port 1.

12.4.3 Handling replies

Whenever an agent receives data on an input channel it can issue a reply. It does not need to know (and in fact does not know) where the input came from. The input can be tagged with an rsvp tag. If there is an rsvp tag a reply is requested but is not obligatory. When an on unit is handling input on an input port the 'rsvp aspect will be set true if the input was tagged with the rsvp tag and false otherwise. The Reply command is used to send a reply. Here is an example:
    On Input.1
        If (Self'rsvp) Reply.object "received"
        ...
        End
The syntax for a reply is the same as that of the emit command except that it is not qualified with a port number.

12.4.4 Sending data through messages

Message channels, like input and output ports, are designated with integer channel numbers. Channel 0 is reserved for the parent agent. The Sendmsg command is used to send messages to the parent agent and to the child agents. The first argument is an object containing the message. The remaining arguments are the channel numbers of the agents to which the message is to be sent. If there is only one argument, i.e., there are no recipient channel numbers, then the message is sent to the parent agent.

12.4.5 Receiving data from messages

Arriving messages are captured with On units. The keyword, Message, is qualified with the message channel number. There must be a single argument, the identifier for the object holding the received message. If the Message keyword is not qualified the On unit handles all messages not caught by on units with qualified keywords. The qualifier can be a sequent, in which case the On unit catches all channels specified by the sequent. Here is a simple example:
    On Message.0 text
        Print {text}
        End
In this example the on unit catches messages from the parent (channel 0) and prints them. Here is the same code except that it catches and prints messages from children 1, 2, and 3.
    On Message.[1:3] text
        Print {text}
        End

12.5 Print and Puts statements

Emit commands expect a list of objects as arguments. The Print command expects text as its argument; it sends the argument text as an EOL-terminated line to the designated output port. In short, it is an emit command with a different style of argument processing. For comparison, here are two equivalent statements.

        Emit "Hello world!" Eol
        Print Hello world!
A Print command expects one space after "Print"; spaces after the initial space are printed. The following example is from a quining program (one that prints its own source as output.)
        Print Agent quine
        Print    Text Q
        
which produces the output

        Agent quine
           Text Q  
String substitution enclosure {} and function operator <> are respected; however the function operator cannot be qualified. Here is another comparative example:
        Emit "sin(" x ") = " <sin x> Eol
        Print sin({x}) = <sin x>
The Puts command is used to enter text in the line buffer without terminating it with a newline and emitting it. The above example could have been written
        Puts sin({x}) =
        Print  <sin x>


12.6 Sources and sinks

The initial element in a connection sequence must act as a source, i.e., it must generate a sequence of outputs without having any inputs. An agent can be structured to serve as a source. Here is a simple example:

    Agent integers
        Init Emit [1:Inf]
        End
This agent generates an infinite sequence of integers, or, more precisely, it generates upon demand as much of the sequence as is demanded. There are two special sources, parent input, and input from a file. Here are examples of each:
    Connect $1 | lexer                # parent's input port 1 is used as a source
    Connect F.source infile | lexer   # source file infile is read and passed
                                      # character by character to lexer
"F.source" establishes a source (file to be read); F.sink establishes a sink (file to be written.) Each expects a single argument, the text path name of the file to be read or written. Each can have a qualifier, line. A source, "F.source.line", produces blocked data with each block of output being a line from the specified file. A sink, "F.sink.line", expects blocked data coming in; the input is written as lines (EOL marker added) to the sink file.

The terminal element in a connection sequence must act as a sink, i.e., it must accept input on its input ports but does not produce output on output ports. In principle any agent can be used as a sink, i.e., as a terminal element in a connection network. However the agent's output ports are not connected to anything, so any emitted output will be thrown away. Output in a sink agent can be sent to the parent agent via messages. The alternative to a sink agent is to use either a parent output port or a file as a sink. Here are examples:

    Connect $1 | lexer | $1                            # Source is parent's input port $1
                                                       # Sink is parent's output port $1
    Connect F.source infile | (lexer) | F.sink outfile # Source is from file infile,
                                                       # Sink is written to file outfile.


13. Sequents

13.1 A grammar for sequents
13.2 Use of sequents in San code
13.3 Primary sources
13.4 Predefined sequent operators
13.5 Boolean expressions as operators
13.6 Arithmetic expressions as operators
13.7 User defined operators
13.8 Sequent statements

The San language supports a programming construct called a sequent. Sequents can be thought of as mini-programs that produce lists (of scalar values). The general model is one of a list (produced by a source) modified serially by a suite of operators that accept an input list, modify it, and emit an output list.

A sequent consists of a sequence expression enclosed within brackets. Sequence expressions consist of a sequence source optionally followed by sequence operators. Sequence sources are recursively built up from a combination of primary sources and sequents.

13.1 A grammar for sequents

This is a quasi-formal description of the grammar for sequents, i.e., enough material so that one can specify a formal grammar.
Special characters:

lb:     '['     # Left bracket
rb:     ']'     # Right bracket
vb:     '|'     # Vertical bar

Terms and such:

SRC:    A list source
PS:     Primary source
SO:     Operator
SX:     Sequence expression
SQ:     Sequent

Production rules:

SQ  <- lb SX rb
SX  <- SRC | SX vb SO
SRC <- PS | SQ | SRC SQ | SQ SRC


13.2 Use of sequents in San code

Sequents can appear in loop specifiers, e.g., 

        Foreach i In [1:n]
        
They can also appear in assignments, e.g.,

        foo[] = [1:n]
        alpha, beta[] = [1:n]
        
The first sets the number aspects of foo[] to be the integers from 1 to n;
the second sets alpha to be 1, and the number aspects of 
beta to be the integers from 2 to n.


13.3 Primary sources

Primary sources are either simple lists or sequence generators. There are two major categories of primary sources, sequence special forms, and object vectors. In addition there are also some miscellaneous sources.

Simple lists

The simplest type of primary source is the simple list, e.g.,
[x, y, z]
Note that this list is a list of the (default) values of x, y, and z.
Special forms representing sequences
        (a) integer sequences, e.g., m:n
        (b) semi infinite integer sequences, e.g., m:Inf
        (c) Q.alphseq
        (d) Q.alphgen
The predefined Q.alphseq operator expects strings of the form (letter)-(letter), e.g., "A-Z". It returns a vector of letters from first letter to last letter for each string. For example,
        <Q.alphseq "a-c" "Z-X">[];
returns the sequence of letters a,b,c,Z,Y,X.

The Q.alphgen generator successively generates the lower case letters a...z, the letter pairs aa...zz, etc.
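
The behaviour of these two sources can be modelled outside San. The following Python sketch is only an illustration: the function names alphseq and alphgen are stand-ins, and the ordering of the letter pairs (aa, ab, ..., az, ba, ...) is an assumption, since the text above only says "aa...zz".

```python
from itertools import count, islice, product
from string import ascii_lowercase


def alphseq(*ranges):
    """Model of Q.alphseq: expand "A-Z"-style ranges, ascending or descending."""
    letters = []
    for spec in ranges:
        first, last = spec.split("-")
        step = 1 if ord(first) <= ord(last) else -1
        letters.extend(chr(c) for c in range(ord(first), ord(last) + step, step))
    return letters


def alphgen():
    """Model of Q.alphgen: a..z, then letter pairs, then triples, lazily."""
    for width in count(1):
        for combo in product(ascii_lowercase, repeat=width):
            yield "".join(combo)


print(alphseq("a-c", "Z-X"))            # ['a', 'b', 'c', 'Z', 'Y', 'X']
print(list(islice(alphgen(), 24, 28)))  # ['y', 'z', 'aa', 'ab']
```

Because alphgen is a generator, only the items actually requested are ever computed, which mirrors the lazy evaluation that semi-infinite sources require.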

object vectors
        (a) simple vector,  foo[]
        (b) array slice,    foo[i,]
        (c) aspect vector,  foo.[]
The blank field in an object vector may have a range, e.g., foo[[m:n]] refers to rows m through n only.
Special generators
        (a) random numbers, Q.randgen
Randgen generates a semi-infinite sequence of random numbers. Specs for randgen are to be determined.
Sources producing semi-infinite sequences
Some sources, e.g., Q.alphgen and Q.randgen, produce semi-infinite sequences, i.e., sequences that have an initial element but no terminal element. These sequence generators are necessarily evaluated lazily using "just in time" semantics.


13.4 Predefined sequent operators

Sequent operators can be divided into three classes, depending on what happens when they are presented with an unboundedly large input stream. These classes are (a) operators that are not well defined for unbounded input streams, (b) operators that are well defined but require O(n) space, and (c) operators that are well defined for all streams and require O(1) space.

The "Q.keep" operator is critical because it converts potentially unbounded streams into bounded streams.

A list of sequent operators is given in Appendix III.
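
To illustrate why a bounding operator matters, here is a small Python model. It assumes, purely for the sake of the sketch, that Q.keep passes the first n items of its input and then stops; the actual arguments of Q.keep are those given in Appendix III, and the names integers_from and keep are invented here.

```python
from itertools import count, islice


def integers_from(m):
    """Model of the semi-infinite source m:Inf; values are produced on demand."""
    return count(m)


def keep(stream, n):
    """Hypothetical stand-in for Q.keep: pass the first n items, then stop."""
    return islice(stream, n)


# Without keep, list(integers_from(1)) would never terminate.
bounded = keep(integers_from(1), 5)
print(list(bounded))  # [1, 2, 3, 4, 5]
```

An operator of this kind needs O(1) space and is well defined for every stream, so it falls in class (c) above.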


13.5 Boolean expressions as operators

An incomplete boolean relationship can be used as a filter; all entities satisfying the relationship are passed, and those not satisfying the relationship are deleted. An incomplete boolean relationship is one in which one of the pair of entities is missing. For example, the sequent

        [1:5 | Ne? 4]
produces as output the sequence {1, 2, 3, 5}.

Instead of using incomplete relationships one can use $1 to stand for the list item being filtered. Thus our example could have been written:

        [1:5 | $1 Ne? 4]
Similarly, incomplete boolean queries (a boolean query operator has only one argument) can be used as filter operators. For example,
        ["a","b",4 | S.number? ]
produces as output the sequence {4}. As with relationships the missing argument can be filled by $1; our example could have been written
        ["a","b",4 | S.number? $1]
Compound boolean expressions must have complete relationships and queries. For example:
        [1:13, "x", "y" | S.number? $1 & $1 lt? 4 ]
produces as output the sequence {1,2,3}
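
The three filters in this section can be checked against a small Python model. The helper names sequent_filter and is_number are invented for the sketch, and is_number is only a rough approximation of S.number?, whose exact definition is not given here.

```python
def sequent_filter(items, predicate):
    """Pass the items that satisfy predicate and drop the rest."""
    return [item for item in items if predicate(item)]


def is_number(item):
    """Rough stand-in for S.number?: true for ints and floats, not strings."""
    return isinstance(item, (int, float)) and not isinstance(item, bool)


# [1:5 | Ne? 4]
print(sequent_filter(range(1, 6), lambda item: item != 4))  # [1, 2, 3, 5]

# ["a","b",4 | S.number? ]
print(sequent_filter(["a", "b", 4], is_number))  # [4]

# [1:13, "x", "y" | S.number? $1 & $1 lt? 4 ]
source = list(range(1, 14)) + ["x", "y"]
print(sequent_filter(source, lambda item: is_number(item) and item < 4))  # [1, 2, 3]
```

In each lambda the parameter item plays the role of $1.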


13.6 Arithmetic expressions as operators

When arithmetic expressions are used as operators, the expression is evaluated for each input, and the evaluated expression is output. In the case of arithmetic expressions the input item must be represented by $1. For example, the sequent
        
        [ 1:4 | (3 * $1) + 1]
yields the sequence {4, 7, 10, 13}.
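
As a cross-check on the arithmetic, the same mapping can be written in Python. This is a model of the described semantics, not San code; sequent_map is a made-up helper name.

```python
def sequent_map(items, expression):
    """Evaluate expression once per input item and emit the results in order."""
    return [expression(item) for item in items]


# [ 1:4 | (3 * $1) + 1]
print(sequent_map(range(1, 5), lambda item: (3 * item) + 1))  # [4, 7, 10, 13]
```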


13.7 User defined operators

User defined sequent operators have two input ports, $1 and $2, and one output port, $1. Inputs are presumed to be blocked, i.e., they do not accept semi-infinite input. When invoked in a sequent, $1 comes from the piped input and $2 comes from the argument list. The emit command is used to send output to the output pipe.

Here is an implementation of quick sort as an example of sequent operator programming.

        
    Operator qsort
        pivot   =  $1    
        small[] = [$1 [] | lt? pivot | qsort]
        big[]   = [$1 [] | gt? pivot | qsort]
        same[]  = [$1 [] | eq? pivot        ]
        Emit [small[], same[], big[]]
        End qsort
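
For readers who want to trace the algorithm, here is the same three-way partition written as a Python function. It is a sketch that mirrors the structure of the operator above; the explicit guard for empty input is an assumption added for the Python version and says nothing about how San treats an empty $1.

```python
def qsort(items):
    """Quick sort with the same small/same/big split as the San operator."""
    items = list(items)
    if not items:  # guard for empty input (assumption; not in the San text)
        return []
    pivot = items[0]
    small = qsort([x for x in items if x < pivot])
    big = qsort([x for x in items if x > pivot])
    same = [x for x in items if x == pivot]
    return small + same + big


print(qsort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

The recursion terminates because small and big never contain the pivot, so each recursive call receives strictly fewer items than its caller.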
         
 


13.8 Sequent statements

A sequent can be used as a statement. A sequent statement is a no-op unless the sequent contains an operator with side effects. Commonly the side effect is either (a) an output command, e.g., Emit or Print, or (b) an operator that alters its object content. Here are examples of each:
        [a[] | Q.step 2 | Emit $1]  # emits every other element of a[]
                                  #
        [a[] | user_op]           # the user_op operator sets an aspect
        y = x + user_op           # of the user_op object
Sequent statements eliminate the need for "boiler plate" foreach loops. Here is an example:
        [a[] | func | Print]
instead of
    Foreach x In a[]
        Print <func x>
        End
which takes three lines instead of one, introduces an unnecessary dummy variable, and obscures the flow of data through functions.


14. Mathematical operations

14.1 Base selection
14.2 Numeric precision and range
14.3 Mathematical functions

Numbers in San arithmetic are represented by pairs of integers, e.g., (46,-2). The pair (u,v) represents the rational number u*B**v where B is the base. For example, the pair (46,-2) is 0.46 in decimal arithmetic.
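
The pair representation is easy to evaluate exactly. The Python sketch below uses the standard fractions module to compute u*B**v; the function name san_value is invented, and nothing here reflects how a San implementation would store numbers internally.

```python
from fractions import Fraction


def san_value(u, v, base=10):
    """Exact value of the pair (u, v), i.e. u * base**v."""
    return Fraction(u) * Fraction(base) ** v


print(san_value(46, -2))         # 23/50
print(float(san_value(46, -2)))  # 0.46
print(san_value(5, -1, base=2))  # 5/2
```

The last line shows the same kind of pair read in base 2: (5,-1) is 5 * 2**-1, i.e., two and a half.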

Most languages implement number types by a variable type system, i.e., each variable has its own individual (or implicitly derived) number type. San uses mode selection instead, i.e., statements that specify the kind of number system being used.

Mode selection is passed downwards but not back up. Thus, suppose that foo calls bar. The default is for bar to inherit the mode selections from foo. They can be changed within bar; however the changed settings are not passed back up. Mode default values can be set in the governing configuration segment.

14.1 Base selection

San supports binary (base 2), decimal (base 10), and hexadecimal (base 16) arithmetic. The choice of base is made with the base statement, e.g.,

    Base: 2
The default base is decimal arithmetic if there is no active base command. Executable elements inherit the base of their activator unless there is an explicit base command in the element.

The base command is a block wide operator. For example:

    Proc alpha
        Arguments: x y
        Base: 2
        z2 = x + y
        Begin
            Base: 10
            z = x + y - z2
            End
        Return z
        End
In this code x and y are added both in base 2 and base 10 and the results are compared. There can be a small loss of precision when converting from one base to another; the sample code tests for that.


14.2 Numeric precision and range

There are two commands for controlling the precision and exponent range of numbers, Precision and Exponent_range. Here is an example:

    Precision: 10
    Exponent_range: -2 32767
These commands say that there will be 10 significant figures retained in the chosen base and that the exponent can range from -2 to 32767. The Exponent_range command can have a single argument; if it does, the range will be from minus the argument to plus the argument.

The precision and the exponent_range can have Inf as an argument. Setting the range to Inf says that the range can be arbitrarily large. Setting the precision to Inf says that no digits will be dropped in multiplication, addition and subtraction. In division, the precision will be the maximum of the precision of the two arguments and 10.


14.3 Mathematical functions

The San math library includes:
the trigonometric functions, sin, cos, tan, sec, csc, cot;
the inverse functions asin, acos, atan, asec, acsc, acot, and atan2;
the log functions, ln, log2, and log10;
the exponent functions, exp, exp2, exp10, and pow;
the root functions, sqrt and cubrt.

Other functions and required accuracy are yet to be determined.


15. Exception handlers

15.1 User generated exceptions
15.2 Catching exceptions at the proc level
15.3 Handling exceptions with on units

There are two types of exceptions, system generated exceptions and user generated exceptions. System generated exceptions occur when an execution process attempts to perform an invalid operation. Examples include trying to divide by zero and trying to read from an unopened file. The names of system generated exceptions are fixed by the implementation.

Exceptions can be handled at either of two levels, either by a catch block at the level in which the exception is raised, or by an "on exception" unit at the top level of the agent in which the exception occurs.

15.1 User generated exceptions

User generated exceptions are created with the raise command. Here is an example of raising a user exception:
    If (n Le? 0) Raise not_positive
In this example not_positive is a user generated exception. The names of user generated exceptions can be any legal identifiers. Exceptions are bound to objects within the local scope. Assignments can be made to the exception name in advance. For example we might say
    If (n Le? 0)
        not_positive = n
        Raise not_positive
        End
In our example we could also use n as an exception name. Thus
    If (n Le? 0) Raise n
We could also have defined a proc for not_positive and invoked it in the raise command, e.g.,
    Proc not_positive
        Self.bad = Argv[1]
        End
    ...
    If (n Le? 0) Raise <not_positive n>
In this example invoking not_positive sets the .bad field of not_positive to be the value of n.


15.2 Catching exceptions at the proc level

Catch blocks are defined within procs. The definition for a catch block must list the exceptions that it will catch. There is no default that catches everything. If there is more than one catch block that can handle the exception, the first in lexical order has priority.

Typically a catch block will perform some sort of fixup and/or error logging. The last thing it does is to specify how control is to be resumed. The choices are:
Continue This says to continue on from the point at which the exception was raised.
Escape This says to escape the block in which the exception was raised. The escape keyword may have an argument as a label; if it does the labelled block is escaped.
Return This says to return from the current proc with the exception handled.
Parent This says to return from the current proc with the exception not handled.
Agent This says to pass the exception up to "on exception" units at the agent's top level.
The choice is made using the Action: command, e.g.,

    Catch not_positive
        errmsg "n is not positive in {'name}"
        Action: return
        End catch
Failure to provide a terminating action creates a "Mishandled_exception" exception, causing transfer to an on exception unit at the agent level.

Catch blocks are independent blocks of code; it follows that there can be no transfers out of a catch block other than proc invocations and raising exceptions. When an exception is raised within a catch block exception handling reverts to the agent level. This does not apply to exceptions raised within procs invoked within the catch block.


15.3 Handling exceptions with on units

On unit exception handlers are defined at the uppermost level of an agent, i.e., they are never active within procs. The syntax for an on exception unit is:
    On Exception <exception_list>
        <exception_handler_body>
        End
The <exception_list> lists the exceptions that the handler will catch. If the <exception_list> is empty the handler will catch all exceptions. This is the opposite of catch blocks, which have no catch-all default.

On unit exception handlers are similar to catch blocks in that they typically contain fixup code and error logging, with a terminal action command. However the permissible actions for an on unit exception handler are different. They are:
Continue This says to continue on from the point at which the exception was raised.
Return This says to return from the proc in which the exception was raised.
Terminate This says terminate execution of the response block in which the exception was raised.
Abort This suspends operation of the agent and sends a message to the parent agent. If there is no parent it initiates program shutdown.
If an exception is raised during the execution of an on unit exception handler an abort action is triggered. Likewise the default action is an abort action.


16. Configuration Segments

The objectives of a configuration section are (a) to provide feature based configuration management for the source code, and (b) to provide access to resources external to the source code, said resources being parameters and definitions.

Resources are delivered to code segments via blocks called dictionaries. In the beginning of a code segment there can be dictionary statements, for example:

    Code codicil
        Dictionary: fumble
        Dictionary: mumble
        ...
        #code definitions
        End segment
When a definition for an identifier is needed the following steps are taken: First the local definition block scope is searched for a definition. If none is found there, the outer level of the code segment is searched for a definition. If none is found there the dictionaries are searched in turn.
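
The lookup order just described can be sketched in Python as follows. This is a model only: the function name resolve, the sample scope contents, and the assumption that "in turn" means the order in which the Dictionary statements appear are all illustrative.

```python
def resolve(name, block_scope, segment_scope, dictionaries):
    """Return (where, definition) for name, searching in the order described."""
    if name in block_scope:
        return ("block", block_scope[name])
    if name in segment_scope:
        return ("segment", segment_scope[name])
    for dict_name, entries in dictionaries:  # assumed: declaration order
        if name in entries:
            return (dict_name, entries[name])
    raise KeyError(name)


# Hypothetical contents, for illustration only.
block_scope = {"x": "local x"}
segment_scope = {"helper": "segment-level helper"}
dictionaries = [
    ("fumble", {"sin": "#functions.trig"}),
    ("mumble", {"sin": "#other.trig", "gcd": "#functions.numeric"}),
]

print(resolve("x", block_scope, segment_scope, dictionaries))    # ('block', 'local x')
print(resolve("sin", block_scope, segment_scope, dictionaries))  # ('fumble', '#functions.trig')
print(resolve("gcd", block_scope, segment_scope, dictionaries))  # ('mumble', '#functions.numeric')
```

Note that sin resolves through fumble even though mumble also defines it, because fumble is searched first.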

The dictionary names in code segments are symbolic names having location addresses that are specified in the governing configuration section. Each such address specifies the location of a dictionary block. For example the governing configuration section might have a statement somewhere of the form:

        fumble: c:/san/syscon/math.san#config.dictionaries.trig
This is an example of a location address. The first part of the name is a file system path name. The second part after the # character is a component path name. The first part of the component path name is the name of a configuration segment. Subsequent parts are successive sub-blocks. In this case math.san looks something like this:
    ...
    Configuration, label=config
        ...
        Dictionaries
            ...
            Dictionary trig
                 ...
                 End trig
            ...
            End dictionaries
        ...
        End config
The contents of a dictionary are pairs of terms, the first term being an identifier and the second being the path where it can be found. In our example dictionary trig might look something like this:
    Dictionary label=trig
        ...
        sin: #functions.trig
        cos: #functions.trig
        ...
        End trig
What these entries say is that the definitions for sin and cos can be found by looking in the current file for a code block labelled "functions" and then within that for the definition of an object called trig, e.g., we would find some code like this in math.san.
    Code functions
        ...
        Object trig
            ...
            Proc sin
                ...
                End sin
            ...
            End trig
        ...
        End functions
This then is the general scheme: Code contains names of dictionaries; configurations supply path names to be used to locate the dictionaries; dictionaries contain terms; entries for terms contain the path of the code where the term is defined.

An apparent problem with this scheme is that path names are hardwired, a state of affairs that is generally considered to be undesirable. However we can avoid hardwiring path names by using the text substitution operator, which is delimited by braces, {}. Thus the trig dictionary could have been written:

    Dictionary trig
        ...
        sin: {functions}.trig
        cos: {functions}.trig
        ...
        End trig
The text substitution operator operates at the lexical level. The arguments can be any mixture of literals and variables. In configuration segments all identifiers are bound to variables containing strings. When the text substitution operator is applied to a list of arguments the variables are replaced by their contents (literals are not altered except that numeric literals are treated as strings) and the argument list is concatenated together. For example, we might have something like this:
        fixedprecision: math.fixedprecision
        functions: {fixedprecision ".functions"}
In our example the variable, fixedprecision, is replaced by its content, math.fixedprecision, and it and the literal ".functions" are concatenated together producing math.fixedprecision.functions. Instead of placing the literal inside the braces and quoting it, we could have placed it outside the braces as follows:
        functions: {fixedprecision}.functions
The rule here is that when text substitution operators are embedded in an identifier expression, the identifier string is determined by performing the substitution and then constructing the identifier.

Text substitution operations can be nested. The rule is that when the inner substitution has been performed the resulting string can be treated as an identifier. Here is an example:

        foo: bar
        bar: bagel
        samba: {{foo}}dorf 
In this example {foo} is replaced by bar and then {bar} is replaced by bagel; then bagel is concatenated with dorf to produce a value for samba of bageldorf.
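
Nested substitution can be modelled by repeatedly replacing the innermost brace pair. The Python sketch below handles only the simple case of a single variable name inside the braces; it does not model quoted literals or multi-argument lists such as {fixedprecision ".functions"}. The function name substitute is invented for the illustration.

```python
import re

# Matches a brace pair that contains no other braces, i.e. an innermost one.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def substitute(text, variables):
    """Repeatedly replace the innermost {name} by the content of name."""
    while True:
        match = INNERMOST.search(text)
        if match is None:
            return text
        value = variables[match.group(1)]
        text = text[:match.start()] + value + text[match.end():]


variables = {"foo": "bar", "bar": "bagel", "fixedprecision": "math.fixedprecision"}
print(substitute("{{foo}}dorf", variables))                 # bageldorf
print(substitute("{fixedprecision}.functions", variables))  # math.fixedprecision.functions
```

The first call passes through the intermediate string {bar}dorf, which matches the two-step description above.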

Chained substitution can be used to eliminate hardwired references and to avoid having multiple instances of the same data. However it does not provide a way to effect conditional code.

San configuration files supply and use configuration variables called selectors. A selector may represent a feature, a hardware platform, a vendor, a version of the software, or, in general, any selectable configuration decision.

Selectors can be turned on and off with set and unset statements. The settings under the control of a selector are specified in selection blocks. A selection block can contain set and unset statements and variable value assignments. Here is an example of a selection block for selector foo.

    Selection foo
        Set alpha
        Set beta
        Unset gamma
        pi: 3.14159265358979
        End foo
The statements within a selection block must not be in conflict, i.e., selectors may not be both set and unset in a block, nor may variables be given inconsistent values. Text substitution also applies here. For example, we might have,
    Selection {bar}
        Set {bar}bell
        End
The argument for the selection can be a boolean expression in multiple variables, where set is true and unset is false. Here is an example:
    Selection ~ (foo & bar)
        snark: boojum
        End selection
In this example the '&' is the logical "and" and '~' is logical negation. The selection is made if at least one of foo and bar is unset. The variable "snark" gets the contents of the variable "boojum".
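
The truth table for this selector expression can be listed with a few lines of Python, treating set as True and unset as False. The function name selection_applies is hypothetical; it models only the condition, not the assignment to snark.

```python
def selection_applies(selectors):
    """Condition of `Selection ~ (foo & bar)`: set is True, unset is False."""
    return not (selectors["foo"] and selectors["bar"])


for foo in (True, False):
    for bar in (True, False):
        print(foo, bar, selection_applies({"foo": foo, "bar": bar}))
```

Only the combination in which both foo and bar are set fails to make the selection.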

The thought, then, is to use the selection blocks to determine the values of configuration variables; in turn these determine where the dictionaries are. In turn the dictionaries point to where the code definitions are.

There are three kinds of configuration statements (not counting blocks). We have already discusse