table of contents
Computer Science
September 2003

Yet another scripting language - parts 5, 6, and 7

This article is a continuation of the discussion of "yet another scripting language", a scripting language that I am proposing to create. The ongoing presentation is broken up into sections, each section discussing some aspect of the proposed language.

The articles are not definitive specifications; rather, they are working documents exploring possibilities.

There are five prior threads in comp.lang.misc that are devoted to this projected language. The thread titles are:

Yet Another Scripting Language - Syntax thoughts
Yet another scripting language - arithmetic
Yet another scripting language - formalizing flow control
Yet another scripting language - Variables, function invocation, and some syntax
Yet Another Scripting Language - Reprise

There are three web pages that summarize my thoughts on said topics at the end of said threads. These pages are:

http://richardhartersworld.com/cri/2001/newlang01.html
http://richardhartersworld.com/cri/2001/newlang02.html
http://richardhartersworld.com/cri/2003/newlang03.html

This article has three sections, sections 5, 6, and 7. Section 5 is a potpourri of minor updates. Section 6 is a major rewrite of the material about procedures, functions, and other executable routines. Section 7 has initial material about file structure and some illustrative sample code.

The current interim name for this language is San. Suggestions for a better name are solicited.

5.0 Miscellaneous updates

5.0.1 Indentation

All blocks must begin with the keyword "begin". The content within the block must be indented. The block must be terminated with an "end" statement that, with the single exception of comment blocks, must also be indented.

Preferably (but not obligatorily) all statements within the block that are not part of an internal block should have the same indentation. However, no statement shall be indented deeper than any begin statement within the block.

The indentation characters within a block may be either spaces or tabs but not a mixture.

The "end" statement need not have an argument; if it does, the one and only argument may be the name of the block type, the name of an entity being defined, or a label supplied in the corresponding "begin" statement. For example the statement

        begin loop ; label = foo
may be terminated by any of these
        end loop
        end foo
Similarly the definition block
        begin function foo  
may be terminated by any of these
        end function
        end foo

5.0.2 Comments, comment function, and comment blocks

Putting the '#' character as the first non-white-space character in a line makes the entire line a comment. The '#' character is also the name of the comment function, which may appear anywhere in the program text. A comment block starts with 'begin' with either '#' or 'comment' as an argument, contains an indented block of comments, and ends with the corresponding end statement. For example:
        begin #
                This is a comment.
                It is only a comment.
                If it were code it would
                Still be a comment.
        end #
Note: The terminator for the comment block is NOT indented. This is the only exception to the indented end rule.

5.0.3 Character string representations

San supports four (!) different ways to represent character strings. They are:

(1) Single characters using ': A token consisting of two characters with the first character being a single quote is a literal representing the second character. Thus 'x is a literal representing a string consisting of the character x.

(2) Strings using ": Tokens with a leading double quotation mark are literal strings (no trailing " is needed.) Thus "foo is the string "foo". If the string is not enclosed in braces {} or angle brackets <> the string is terminated by white space.

If it is so enclosed white space is part of the text. Thus {"foo bar} is the string "foo bar".

(3) Strings using count format: Tokens beginning with an integer followed by the letter H (case insensitive) are strings consisting of the specified number of characters after the 'H'. Thus 3Hfoo is the string "foo". All special characters including white space within the counted number of characters are treated literally. Thus the string 4H{" < is the four characters, {, ", space, and <.

(4) Text blocks: Text blocks are useful for text running over more than one line. The format is:

    begin text NAME
        BODY
where NAME is the name of the variable containing the text, and BODY is one or more lines of text. Each line has a leading character (after the indentation) that may be either | or :. Lines beginning with : are taken literally as is; no text substitution is performed. Text substitution is done in lines that begin with |. For example:
    begin text blatifu
        |Dear {addressee}
        |   Please find enclosed a check for {amount} lire.
        :Ralph 124c41+  {:-}
Text substitution is done on {addressee} and {amount} but not on {:-}.
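The count format and the text block rules can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not San itself. The function names parse_counted and render_text_block, and the use of a dictionary to hold the substitution values, are assumptions made for illustration.

```python
import re

def parse_counted(token):
    """Decode a count-format token such as 3Hfoo into its literal string."""
    match = re.match(r"(\d+)[Hh]", token)
    if match is None:
        raise ValueError("not a count-format token: " + token)
    count = int(match.group(1))
    body = token[match.end():match.end() + count]
    if len(body) != count:
        raise ValueError("token is shorter than its count")
    return body

def render_text_block(lines, values):
    """Join text-block lines, substituting {name} only in lines led by '|'."""
    out = []
    for line in lines:
        lead, text = line[0], line[1:]
        if lead == "|":
            # Replace each {name} that has a known value; leave the rest alone.
            text = re.sub(r"\{(\w+)\}",
                          lambda m: str(values.get(m.group(1), m.group(0))),
                          text)
        elif lead != ":":
            raise ValueError("text-block lines must start with '|' or ':'")
        out.append(text)
    return "\n".join(out)

letter = render_text_block(
    ["|Dear {addressee}", ":Ralph 124c41+  {:-}"],
    {"addressee": "Hugo"},
)
```

Here letter holds the two lines "Dear Hugo" and "Ralph 124c41+  {:-}"; the second line is copied untouched because it starts with a colon.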

5.0.4 Infix operators

Functions with two arguments can be converted into an infix operation by appending a question mark to the function name. For example
        <lt x y>        <# Yields true if x is less than y>
is equivalent to
        (x lt? y)       <# Also yields true if x is less than y> 
In general (i.e., I haven't spelled out rules yet) missing arguments are filled out from the enclosing scope. The two principal cases, perhaps the only cases, are in switch blocks and in sequent operators. In switch blocks incomplete delimited relationship expressions are filled in with switch selectors. Example:
        begin switch x
                lt? y => action1 <# when x lt? y perform action1> 
                z gt? => action2 <# when z gt? x perform action2> 
                end switch
When relationships are used as sequent operators one element will be absent; the operator is a filter that passes elements satisfying the relationship. Example:
        x[] <- [1...n | gt? m]

5.0.5 Sequent operators

Sequents now look like this:
        [source | op ...]
For example,
        [1...100 | step 2 | lt? 10]
produces the sequence [ 1, 3, 5, 7, 9]

Logical operators act as filters; the step operator applies a comb function; arithmetic operators apply the operation to each element passed to them. Example:

        [1...5 | * 3 | + 1 ]
produces the sequence
        [4, 7, 10, 13, 16]
The resemblance to Unix pipes is not entirely accidental.
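The pipeline reading of sequents can be mimicked in Python with generators. This is a sketch under assumptions: the helper names step, keep, each, and sequent are invented here, and San's actual evaluation rules are not yet spelled out.

```python
from functools import reduce
from itertools import islice

def step(n):
    """Comb operator: pass every n-th element, starting with the first."""
    return lambda seq: islice(seq, 0, None, n)

def keep(pred):
    """Filter operator: pass only the elements that satisfy pred."""
    return lambda seq: (x for x in seq if pred(x))

def each(func):
    """Arithmetic operator: apply func to every element."""
    return lambda seq: (func(x) for x in seq)

def sequent(source, *ops):
    """Feed source through the operators from left to right."""
    return list(reduce(lambda seq, op: op(seq), ops, source))

# [1...100 | step 2 | lt? 10]
first = sequent(range(1, 101), step(2), keep(lambda x: x < 10))
# [1...5 | * 3 | + 1 ]
second = sequent(range(1, 6), each(lambda x: x * 3), each(lambda x: x + 1))
```

With these definitions first is [1, 3, 5, 7, 9] and second is [4, 7, 10, 13, 16], matching the two examples above.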

5.0.6 Concatenation

The caret (^) is the concatenation operator. It may be used either as an infix operator or as a function. For example, if
        foo := "bagel
        bar := "dorf
then
        blatifu := foo ^ bar
        blatifu := <^ foo bar>
are alternate ways to set the content of blatifu to bageldorf.

5.0.7 Three types of qualified names

There are three types of qualified names, with three different separating characters: the dollar sign ($), the period (.), and the exclamation point (!). San variables (morphs) have two types of fields, irregular and regular. The dollar sign is used to qualify irregular names, e.g., foo$nrow is the number of rows in the foo table. Likewise the period (dot) is used to qualify regular fields, e.g., foo[i].a is the table entry for row i, column a.

The exclamation point (bang) is specialized. The second name is taken as a procedure that accepts the first name as an argument. The fields of the procedure that would be set as the result of the call are set in the argument instead. For example, suppose procedure sort sorts the array of its argument and then sets its own array to be the sorted array. We could use the sort procedure as follows:

        sort data
        data[] <- sort[]
However we can do the same thing more compactly with

        data!sort
Bang separators can be concatenated.  Thus

        data!alpha!beta!gamma!delta
is equivalent to

        alpha data
        beta alpha
        gamma beta
        delta gamma
        data[] <- delta[]
Bang expressions take no argument. The regular fields of the final procedure are transferred back to the original morph.
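The chained expansion above amounts to threading an array through a series of procedures and storing the final result back in the original morph. A minimal Python model, assuming each procedure can be represented as a function from a list to a new list (the name bang is hypothetical):

```python
def bang(data, *procedures):
    """Apply procedures left to right, then store the final array back in data."""
    result = list(data)
    for procedure in procedures:
        result = procedure(result)
    data[:] = result  # models data[] <- delta[]
    return data

values = [3, 1, 2]
bang(values, sorted, lambda xs: [x * 10 for x in xs])
```

After the call values is [10, 20, 30]: the list is sorted, scaled, and written back in place.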

5.0.8 Null function

A pair of angle brackets without enclosed content, e.g., <>, is a null function. Thus the code
        x := <>
sets the value of x to be empty.

6.0 Delimited executable code aggregates (routines)

The term, delimited executable code aggregate, is certainly clumsy, but I don't know of a good alternative. Terms such as "procedure" and "function" have narrow connotations due to their use in mainstream programming languages. For the sake of convenience I will use the term "routine".

What, then, is a routine?

A routine is a block of code that can be bound to an associated identifier, and that can accept control (can be invoked, activated, or equivalent). Routines include functions, procedures, coroutines, tasks, event handlers, data flow components, and goto acceptance blocks.

Routines can be divided into those that are activated with the call/return scheme (traditional functions and procedures) and those that are event activated (event handlers, exception handlers, data flow elements). Likewise they can be divided into those that can be resumed and those that cannot. Finally, we can distinguish between those with persistent data and those without (non-reentrant and reentrant).

San recognizes four kinds of routines - functions, procedures, sequent (list) operators, and agents.

6.1 Functions and procedures

Functions and procedures are the workhorses of procedural languages. The two forms are closely related, and many languages fuzz the two together.

In practice the two forms are distinct; functions return values that can appear in expressions whereas procedures do not. An example of a function invocation is

        x := <cos theta>        <# San syntax>
        x  = cos(theta);        /* C syntax */
An example of a procedure invocation is

        sort data               <# San syntax >
        sort(data);             /* C syntax */
In San functions cannot alter variables in their calling environment, i.e., function invocations use call by copy. On the other hand procedures can alter variables in their calling environment, i.e., procedure invocations use call by reference. Having two different modes of argument semantics is a possible source of confusion; however call by reference (or call by address) is a practical necessity for procedures.
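The difference between the two argument modes can be illustrated in Python, where a deep copy stands in for call by copy. This is a sketch only; the wrapper names call_function and call_procedure are assumptions, not San constructs.

```python
import copy

def call_function(func, *args):
    """Model a San function call: the routine sees copies of its arguments."""
    return func(*[copy.deepcopy(arg) for arg in args])

def call_procedure(proc, *args):
    """Model a San procedure call: the routine sees the caller's own objects."""
    proc(*args)

def sort_in_place(items):
    items.sort()

data = [3, 1, 2]
call_function(sort_in_place, data)   # sorts a copy; data is unchanged
after_function = list(data)
call_procedure(sort_in_place, data)  # sorts the caller's list
after_procedure = list(data)
```

Here after_function is still [3, 1, 2], while after_procedure is [1, 2, 3].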

6.1.1 Handling of arguments in procedures and functions

When procedures and functions are defined the arguments are specified in an arguments statement. Arguments can have default values specified in init statements. For example:
        begin procedure foo
                arguments x, y
                init x = 3.14159
                init y = 180.
                end foo
There are two distinct ways to supply arguments when invoking procedures and functions. The two methods may not be mixed. The first method is positional correspondence. For example
        foo a b
The second method is to use name=value pairs, e.g.,
        foo x=a, y=b
When the name=value method is used order need not be preserved and arguments can be omitted, e.g.,
        foo y=b
The omitted arguments have the default values specified in the init statements.
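Python's own default and keyword arguments behave much like the scheme just described, except that Python allows the two styles to be mixed. The sketch below adds a hypothetical invoke wrapper that enforces the no-mixing rule; foo is a stand-in for the procedure defined above.

```python
def invoke(routine, *positional, **named):
    """Call routine with positional or named arguments, but never both."""
    if positional and named:
        raise TypeError("positional and name=value arguments may not be mixed")
    return routine(*positional, **named)

def foo(x=3.14159, y=180.0):
    """Stand-in for procedure foo; the defaults play the role of init statements."""
    return (x, y)

by_position = invoke(foo, 1, 2)  # foo a b
by_name = invoke(foo, y=5)       # foo y=b, with x taking its default
```

The first call yields (1, 2) and the second yields (3.14159, 5).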

6.1.2 Creation of procedures and functions

Procedures and functions are defined using proc blocks. For example:
        begin proc factorial
                arguments n
                n le? 1 => return 1
                else    => return n * <factorial n - 1>
                end proc
A proc can either be used as a procedure or as a function.

6.2 Dataflow programming

In dataflow programming, programs are viewed as a collection of black boxes (data flow elements) connected by pipes (streams). A data flow element is a routine with 0 or more input ports and 0 or more output ports. Input ports accept data from input streams; output ports emit data to output streams. Data flow elements are activated whenever there is input on an input port. Block markers to delimit blocks can be accepted and emitted.

Data flow elements that have outputs but no inputs are called sources; sources include external devices, external files, and generators such as random number generators.

Data flow elements that have inputs but no outputs are called sinks; sinks include external devices, e.g., printers, and files that are written to.

In San input ports are labelled $0, $1, etc., and output ports are labelled $out0, $out1, etc. The current value in an input port is referenced by the port name, e.g., $0. The content of the remainder of a block is referenced by appending brackets, e.g., $0[].

Emitting the contents of an input port advances the input. The emit0 command sends data to $out0; emit1 sends data to $out1, etc. Likewise emitting the content of a morph clears the content.

In San streams can be blocked. The block separators, called marks, are not special elements in streams; they must be emitted and detected using separate commands distinct from the commands used to read and write streams.

6.3 Sequent operators

Sequent operators are special cases of data flow elements that satisfy these conditions:

There are two input ports, $0 and $1, and one output port $out0. Inputs are presumed to be blocked. When invoked in a sequent, $0 comes from the piped input and $1 comes from the argument list. $out0 is sent to the output pipe.

Here are examples of sequent operator programming. The first routine is an implementation of quick sort; the second routine is an implementation of merge sort.

    begin operator qsort
        pivot   :=  $0
        small[] := [$0[] | lt? pivot | qsort]
        big[]   := [$0[] | gt? pivot | qsort]
        same[]  := [$0[] | eq? pivot        ]
        emit0 [small[], same[], big[]]
        end qsort

    begin operator msort
        begin operator merge
            begin switch
                <empty?  $0> => emit $1[]
                <empty?  $1> => emit $0[]
                $0 lt? $1    => emit $0
                else         => emit $1
                end switch
            end merge
        cut $0[] $0.$nrow/2
        emit [[cut.head[] | msort]| merge [cut.tail[] | msort]]
        end msort
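
For comparison, here are the same two algorithms in Python, with lists standing in for blocked streams. This is a sketch of the algorithms rather than a translation of San semantics. The explicit checks for empty and one-element input are an addition that the San listings do not show.

```python
def qsort(items):
    """Quick sort in the style of the qsort operator: split around a pivot."""
    if not items:
        return []
    pivot = items[0]
    small = qsort([x for x in items if x < pivot])
    same = [x for x in items if x == pivot]
    big = qsort([x for x in items if x > pivot])
    return small + same + big

def merge(left, right):
    """Merge two sorted lists, always emitting the smaller head element."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One side is exhausted; emit whatever remains of the other.
    return out + left[i:] + right[j:]

def msort(items):
    """Merge sort: cut the input in half, sort each half, merge the results."""
    if len(items) <= 1:
        return list(items)
    half = len(items) // 2
    return merge(msort(items[:half]), msort(items[half:]))
```

Both qsort([3, 1, 2, 3, 0]) and msort([3, 1, 2, 3, 0]) return [0, 1, 2, 3, 3].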

6.4 Agents

An agent is an executable code element (routine) that is an independent thread of control, i.e., a mini-thread. Procedures (and functions) are invoked. Agents cannot be invoked; they can be sent messages, receive input on input ports, handle events and exceptions, and be activated as programs or tasks.

Agents have two types of code - initialization (setting of parameters) and response blocks (on codes). Response blocks are activated when a specific condition arises. Response block initiators have the form on-TYPE where TYPE is the type of event being handled. The types include:

     on-activation      executed when the agent is first
                        activated.
     on-$N-MODE         executed when there is input on
                        channel N.  MODE specifies
                        whether the channel is being read
                        in char, word, or word list mode.
     on-markN           executed when there is a blocking
                        mark received on channel N.
     on-exception       executed when an exception is
                        raised within the agent.  An
                        on-exception block with arguments
                        handles exceptions named in the
                        argument list; a block without
                        arguments handles all exceptions.
     on-msg             executed for each received
                        message.
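
The response-block idea can be modelled in Python as a table of handlers keyed by event type. This is a deliberately simplified sketch: the class Agent and its methods on and deliver are hypothetical, and delivery here is synchronous, whereas a San agent is meant to be an independent thread of control.

```python
class Agent:
    """Toy agent: a table of response blocks keyed by event type."""

    def __init__(self):
        self.handlers = {}
        self.log = []

    def on(self, event_type, handler):
        """Register the response block for one event type."""
        self.handlers[event_type] = handler

    def deliver(self, event_type, payload=None):
        """Run the response block for event_type, routing errors to on-exception."""
        handler = self.handlers.get(event_type)
        if handler is None:
            return
        try:
            handler(self, payload)
        except Exception as exc:
            on_exception = self.handlers.get("on-exception")
            if on_exception is None:
                raise
            on_exception(self, exc)

echo = Agent()
echo.on("on-msg", lambda agent, msg: agent.log.append(msg.upper()))
echo.on("on-exception", lambda agent, exc: agent.log.append("error"))
echo.deliver("on-msg", "hello")
echo.deliver("on-msg", 42)  # an int has no upper(), so on-exception runs
```

After the two deliveries echo.log is ["HELLO", "error"].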

7.1 File Structure

A San file is divided into segments, each consisting of a segment statement followed by an indented body. The segment statement has two arguments, the type and the label. There are at least three types of segments: executives, configurators, and modules.

Modules contain actual code. In addition to executable code within a module, there may also be import and export statements. Export statements make the exported items visible outside the module. Import statements specify required external resources. The names in import statements are the names used locally within the module; connections between the module local names and the resources matched to the local name are specified in configurators.

The executive specifies which agent will be used as the main program and which configurator will be used to specify the configuration. The program name can be a symbolic name; the selected configurator translates the symbolic name into an actual path, module, and named agent. Symbolic names can be used within configurators, and a configurator can reference another configurator.

7.2 Sample Program: A simple desk calculator

The following code would be the contents of a file that implements a simple command line desk calculator. It illustrates the use of agents. Each line of input updates a running total; the first token in the line is an arithmetic operator, with the remaining tokens being numbers. For each number, the total is updated by applying the operator and the number to the total. A 'C' resets the total to 0. For example:
calc: + 2 3 5
calc: * 4 9
calc: / 15
calc: C

begin segment type = executive, label = exec
   program      = desk-calculator
   configurator = config
   end exec
begin segment type = configurator, label=config
   desk-calculator = <segpath . deskcalc desk-calculator>
   end config 
begin segment type = module, label = deskcalc
   init clear := 'C
   init ops[] := ['+, '-, '*, '/, clear]
   init wsp[] := [\t , \sp]
   begin agent desk-calculator
      msg to=print type=prompt
      $0 | getline | lex | validate | calc
   begin agent getline
      on-in0-char begin switch
         $0 ceq? \n   => mark0
         else         => emit0-char $0
   begin agent lex
      on-in0-char begin switch
         $0 in? wsp[] => emit0-word word
         else         => word := {word}{$0}
      on-mark0 mark0
   begin agent validate
      init okay  := true
      init new   := true
      on-in0-word begin switch $0
        ~ okay     => <>
        new        => begin
                        begin switch
                           $0 nin? ops[] => raise bad-command $0
                           else          => command := $0
                        new := false
        ~ number?        => raise not-a-number $0
        ne? 0            => append $0 to args
        command ceq? '/  => raise divide-by-zero 
        else             => append $0 to args
      on-mark0 begin
         okay    => emit0-list command, args[]
         okay   := true
         new    := true
      on-exception begin 
        begin switch $exception
          bad-command    => begin
            msg to=print type=string {S.{$arg} is not a command}
          divide-by-zero => begin
            msg to=print type=string {S.Can't divide by zero}
          not-a-number   => begin
            msg to=print type=string {S.{$arg} is not a number}
        msg to=print type=prompt
        okay := false
   begin agent calc
      init total := 0
      on-in0 begin
         $0[] -> command, args[] 
         begin switch
            command ceq? clear => total := 0  
            else               => begin loop value from args[]
               total := total {command} value
         msg to=print type=string {total}
   begin agent print
      on-msg begin switch $type
         ceq? string => puts {$arg}{\n}
         ceq? prompt => puts {"calc: }
   end deskcalc    
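
For readers who want something runnable, the same calculator logic can be sketched in ordinary Python. This is a model of the behaviour described above, not a rendering of the agent structure: the class DeskCalculator and its feed method are hypothetical names, the validate and calc stages are folded into one method, and replies are returned as strings rather than sent to a print agent.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

class DeskCalculator:
    def __init__(self):
        self.total = 0

    def feed(self, line):
        """Apply one input line to the running total and return a reply string."""
        tokens = line.split()
        if not tokens:
            return str(self.total)
        command, args = tokens[0], tokens[1:]
        if command == "C":
            self.total = 0
            return str(self.total)
        if command not in OPS:
            return command + " is not a command"
        numbers = []
        for token in args:
            try:
                value = float(token)
            except ValueError:
                return token + " is not a number"
            if command == "/" and value == 0:
                return "Can't divide by zero"
            numbers.append(value)
        # The whole line validated, so it is now safe to update the total.
        for value in numbers:
            self.total = OPS[command](self.total, value)
        return str(self.total)

calc = DeskCalculator()
replies = [calc.feed(line) for line in ["+ 2 3 5", "* 4 9", "/ 15", "C"]]
```

Feeding the four example lines gives the replies "10.0", "360.0", "24.0", and "0". As in the San version, a line that fails validation leaves the total unchanged.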

This page was last updated September 12, 2003.
